Steps to go through to find a solution
- Understand the problem
- Think about constraints, assumptions, and requirements
- Break the problem into subproblems (what algo can solve this kind of problem)
- Come up with simple examples and try to find a pattern in the way it's being solved
Asymptotic complexity
T(n) = aT(n/b) + O(n^d) where:
- a is the number of subproblems,
- b is the factor by which the problem size is reduced,
- d represents the exponent of the additional work done outside the recursive calls. Compare n^(log_b a) vs. n^d (a.k.a. a vs. b^d):
- Case 1: If d < log_b a (i.e., a > b^d), then recursion dominates: T(n) = O(n^(log_b a))
- Case 2: If d > log_b a (i.e., a < b^d), then non-recursive work dominates: T(n) = O(n^d)
- Case 3: If d = log_b a (i.e., a = b^d), then they are balanced, and we get an extra logarithmic factor: T(n) = O(n^d log n)
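For example, MergeSort's recurrence T(n) = 2T(n/2) + O(n) has a = 2, b = 2, d = 1; since a = b^d, Case 3 applies and T(n) = O(n log n). Binary search (a = 1, b = 2, d = 0) is also Case 3, giving T(n) = O(log n).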
Problem solving with MergeSort
Count inversions (a, b) with a ∈ A and b ∈ B, assuming A and B are sorted.
- Scan A and B from left to right.
- Compare a_i and b_j.
- If a_i < b_j, then a_i is not inverted with any element left in B.
- If a_i > b_j, then b_j is inverted with every element left in A.
- Append smaller element to sorted list C.
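A minimal Python sketch of this counting scheme, embedded in the usual MergeSort recursion (function names are illustrative):

```python
def sort_and_count(arr):
    """Return (sorted copy of arr, number of inversions in arr)."""
    if len(arr) <= 1:
        return arr, 0
    mid = len(arr) // 2
    left, inv_left = sort_and_count(arr[:mid])
    right, inv_right = sort_and_count(arr[mid:])
    merged, inv_split = merge_and_count(left, right)
    return merged, inv_left + inv_right + inv_split

def merge_and_count(a, b):
    """Merge sorted lists a and b; count pairs (x, y), x in a, y in b, x > y."""
    c, inversions = [], 0
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            c.append(a[i])            # a[i] is not inverted with anything left in b
            i += 1
        else:
            inversions += len(a) - i  # b[j] is inverted with every element left in a
            c.append(b[j])
            j += 1
    c.extend(a[i:])
    c.extend(b[j:])
    return c, inversions
```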
How to give a greedy algo
- Give the heuristic: We state the greedy choice, which is also called the greedy heuristic. A well-designed greedy algorithm is often phrased in just 2–3 sentences (see Lectures).
- Show the algorithm produces a valid output: We show that our greedy heuristic produces a valid solution. If the heuristic is well-phrased, this is often straightforward and often only takes 1–3 sentences.
- Argue that the greedy heuristic produces an optimal solution: We of course need to argue that the algorithm we proposed actually works. An algorithm works if it outputs an optimal solution on every valid input. We thus need to show that the value of the solution produced by our algorithm is no worse than the value of any valid solution. A canonical, clean, and insightful argument for this is typically a proof by contradiction.
- Give the complexity of the algorithm: This is typically straightforward and often only takes 2–3 sentences. Typical greedy algorithms can clearly be implemented in a polynomial number of steps, and often that suffices for our analysis of the complexity.
Greedy justification
- Let v_min be a vertex of smallest degree d_min in G. It is never a bad choice to let v_min be labelled 1.
- The reason is that some vertex in G must be labelled 1, and its outdegree in the directed graph Ĝ will be its degree in G, which is at least d_min. After removing v_min, we have a subgraph of G, and the value of a subgraph of a graph H is never worse than the value of the graph H itself.
Thought process:
- State your greedy heuristic in a strong manner that shows that every decision we take is the best decision we can possibly take.
- Give a clear argument for why each single choice is correct.
- "Our greedy algorithm picks a choice, and that choice leads to an optimal solution, so there is no solution that is better than our choice."
- "The green and purple edges jointly constitute an MST T. The green edge g11 costs at least as much as the red edge g8, since the red edge is the greedy choice, and our greedy algorithm picks a cheapest edge. If we swap edge g11 out for g8, we have a spanning tree T′, and the cost of T′ is no more than the cost of T. So T′ must also be an MST."
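To make the exchange argument concrete, here is a minimal Kruskal-style MST sketch in Python; it illustrates "our greedy algorithm picks a cheapest edge", with the function name and edge representation being our own choices:

```python
def kruskal_mst(n, edges):
    """edges: list of (cost, u, v) with vertices 0..n-1. Returns the MST's edges."""
    parent = list(range(n))

    def find(x):                      # union-find root lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst = []
    for cost, u, v in sorted(edges):  # greedy choice: always try a cheapest edge
        ru, rv = find(u), find(v)
        if ru != rv:                  # adding the edge creates no cycle
            parent[ru] = rv
            mst.append((cost, u, v))
    return mst
```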
C. Dynamic Programming (DP)
How to Formulate a DP Solution:
- Identify Overlapping Subproblems:
⢠If a recursive solution recomputes the same values, DP is likely applicable.
- Define the State:
⢠Determine what each DP table entry (or memoized function) represents. Example: For knapsack, dp[i][w] might represent the maximum value achievable with the first i items and a capacity w.
- Develop the Recurrence:
⢠Express the solution to a problem in terms of solutions to smaller subproblems.
- Determine Base Cases:
⢠Clearly define the simplest instances (e.g., dp[0][*] = 0).
- Choose an Implementation Strategy:
⢠Top-Down (Memoization): Recursively compute values and cache results. ⢠Bottom-Up (Tabulation): Iteratively fill in a DP table.
- Reconstruct the Solution (if necessary):
⢠Use auxiliary pointers or backtracking through your DP table.
Common DP Problems & Approaches:
⢠Knapsack Problem: State: dp[i][w] = maximum value using items 1 to i with capacity w. Recurrence: dp[i][w] = max(dp[i-1][w], value[i] + dp[i-1][w-weight[i]]).
⢠Longest Common Subsequence (LCS): State: dp[i][j] = length of LCS for prefixes X[1..i] and Y[1..j]. Recurrence: If X[i] == Y[j] then dp[i][j] = 1 + dp[i-1][j-1] else dp[i][j] = max(dp[i-1][j], dp[i][j-1]).
⢠Longest Increasing Subsequence (LIS): State: dp[i] = length of the longest increasing subsequence ending at index i. Recurrence: dp[i] = 1 + max{ dp[j] : j < i and array[j] < array[i] }.
LSS
In many divide-and-conquer (or segment tree) solutions to the Largest Subarray Sum problem, the recursive function returns a tuple of four values, here labeled best, prefix, suffix, and total, each of which encapsulates important information about the subarray. Here's what each one typically represents:
• best: The Maximum Subarray Sum
This is the best (largest) sum obtainable by any contiguous subarray within the current segment.
• prefix: The Maximum Prefix Sum
This is the largest sum of a subarray that starts at the beginning of the segment. In other words, it's the best sum you can get by taking a prefix of the current segment.
• suffix: The Maximum Suffix Sum
This is the largest sum of a subarray that ends at the end of the segment. It represents the best sum obtainable by taking a suffix of the segment.
• total: The Total Sum of the Segment
This is simply the sum of all the elements in the current segment.
Why These Four Values?
When you split the array into two halves, these four values allow you to efficiently merge the solutions from the left and right segments. For instance, if you have two segments with tuples (best_L, prefix_L, suffix_L, total_L) and (best_R, prefix_R, suffix_R, total_R), you can combine them as follows:
• Total Sum: total = total_L + total_R
• Maximum Prefix Sum: prefix = max(prefix_L, total_L + prefix_R)
• Maximum Suffix Sum: suffix = max(suffix_R, suffix_L + total_R)
• Maximum Subarray Sum: best = max(best_L, best_R, suffix_L + prefix_R)
This merging ensures that every possible subarray (entirely in the left segment, entirely in the right segment, or crossing the midpoint) is considered.
In Summary
⢠is the overall best (largest) subarray sum in the segment.
⢠is the best sum from the start of the segment to some position.
⢠is the best sum from some position to the end of the segment.
⢠is the total sum of the segment.
These values are crucial for building an efficient recursive (or segment tree) solution to the Maximum Subarray Sum problem.
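A minimal divide-and-conquer sketch in Python using exactly these four values and merge rules (the names best, prefix, suffix, total follow the labels used above):

```python
def lss(arr, lo=0, hi=None):
    """Return (best, prefix, suffix, total) for arr[lo:hi]; arr must be non-empty."""
    if hi is None:
        hi = len(arr)
    if hi - lo == 1:
        x = arr[lo]
        return x, x, x, x              # a single element is its own best/prefix/suffix/total
    mid = (lo + hi) // 2
    bL, pL, sL, tL = lss(arr, lo, mid)
    bR, pR, sR, tR = lss(arr, mid, hi)
    total = tL + tR
    prefix = max(pL, tL + pR)          # prefix stays left or crosses into the right half
    suffix = max(sR, sL + tR)          # suffix stays right or crosses into the left half
    best = max(bL, bR, sL + pR)        # best is left-only, right-only, or crossing
    return best, prefix, suffix, total

# Usage: lss([-2, 1, -3, 4, -1, 2, 1, -5, 4])[0] == 6  (subarray [4, -1, 2, 1])
```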
Below is a concise, high-level explanation of how one might reason about the problem, so you can see the main ideas that lead to the solution.
1. Realizing There Can Be Only a Few Frequent Elements
- Key Insight: At most 6 frequent elements
An integer x is called frequent if it appears at least ⌈n/6⌉ times in the sorted array A[1..n]. Observe that you cannot have more than 6 distinct frequent elements. Indeed, if you had 7 different elements each appearing ⌈n/6⌉ times, you would exceed n. Thus, at most 6 elements can qualify.
- Why sampling works
Because A is sorted, any frequent element x forms a contiguous block of at least ⌈n/6⌉ identical values. Such a block must "cross" one of a small number of carefully chosen positions.
2. Exploiting the Sorted Structure
- Why Sorting Helps: In a sorted array, all occurrences of a given value appear in one contiguous block. This means if an element appears ⌈n/6⌉ times, that block of identical values must be long enough to "cross" certain key positions in the array.
3. Pinpointing Candidates via Sample Positions
- Sampling Idea: Divide the array into 6 (approximately) equal parts. If an element appears in at least ⌈n/6⌉ positions, then its block of occurrences must overlap one of these 6 dividing points.
- Thus, you only need to look at the values at those dividing points (there are at most 6 such positions) as potential frequent elements.
4. Confirming Candidates with Binary Search
- Check Each Candidate: Once you have at most 6 candidate values, you can do two binary searches for each candidate:
- One to find where the candidate first appears.
- Another to find where the candidate last appears.
- The difference of those two indices (plus one) gives you the candidate's total frequency. If it meets or exceeds ⌈n/6⌉, it is truly frequent.
5. Time Complexity
- Sampling those 6 positions takes constant time.
- Checking each candidate with two binary searches takes O(log n) time per candidate, for a total of O(log n) across all candidates.
- Hence, the algorithm is both correct (it finds all frequent elements and only those) and efficient (runs in O(log n) time).
Why This Works
- No Missed Elements: A frequent element must appear at one of the 6 sampled positions, so you do not miss any.
- No Extra Elements: You verify each candidate by counting, so only truly frequent ones are reported.
This approach leverages the interplay between the frequency threshold (⌈n/6⌉) and the sorted nature of the array. It systematically narrows down the possibilities to at most 6 checks, each confirmed via fast lookups (binary search).
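A minimal Python sketch of the whole algorithm under the above assumptions (sorted input, threshold ⌈n/6⌉), using bisect for the two binary searches:

```python
import bisect
import math

def frequent_elements(A):
    """Return all values appearing >= ceil(n/6) times in the sorted list A."""
    n = len(A)
    if n == 0:
        return []
    k = math.ceil(n / 6)                        # frequency threshold
    # Sample positions spaced k apart (at most 6 of them): any block of
    # >= k equal values must cover at least one sampled index.
    candidates = {A[i] for i in range(k - 1, n, k)}
    result = []
    for x in candidates:
        first = bisect.bisect_left(A, x)        # index of first occurrence
        last = bisect.bisect_right(A, x) - 1    # index of last occurrence
        if last - first + 1 >= k:               # verify the candidate's count
            result.append(x)
    return result

# Usage: frequent_elements([1, 1, 1, 2, 3, 4, 5, 5, 5, 5, 6, 7]) -> [1, 5]
```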