Loading presentation...

Selection Problems and
Adversary Arguments

Rigorous Mathematical Analysis and Lower Bounds

TC3006 Analysis and Design of Algorithms
Salvador Hinojosa
salvador.hinojosa@tec.mx
Session 06 | March 24, 2025
Office hours: salvadorhinojosa.youcanbook.me

Based on course materials by Dr. Hugo Terashima Marín

Course Outline

Part I: Fundamentals

  • Selection Problems Overview
  • Binary Search Analysis
  • Finding Maximum and Minimum
  • Optimal Comparison Bounds

Part II: Tournament Method

  • Finding Second Largest Element
  • Tournament Tree Structure
  • Complexity Analysis
  • Formal Proofs of Optimality

Part III: Lower Bounds

  • Adversary Arguments
  • Information-Theoretic Analysis
  • State Transition Systems
  • Median-of-Medians Algorithm

Learning Objectives

By the end of this session, you will be able to:

1. Algorithmic Understanding

  • Analyze selection problems and derive complexity bounds
  • Apply tournament method for finding second largest element
  • Implement the median-of-medians algorithm for linear-time selection

2. Proof Techniques

  • Construct and apply adversary arguments to prove lower bounds
  • Use information-theoretic analysis to determine minimum comparisons
  • Prove optimality of algorithms through matching upper and lower bounds

3. Research Skills

  • Critically evaluate algorithm efficiency and optimality claims
  • Formalize intuitive arguments using state transition systems
  • Apply techniques to novel algorithmic problems in your research

Part I

Selection Problems

Fundamentals and Binary Search

Binary Search

Problem: Given a sorted array of n elements, find element x.

Key Insight

Since the array is sorted, we can eliminate half the elements with each comparison.

Algorithm:

  • Compare x with the middle element
  • If x equals middle: found!
  • If x < middle: search left half
  • If x > middle: search right half
  • Repeat until found or search space is empty

Binary Search: Visual Demonstration

Searching for 47 in a sorted array

3 0 7 1 12 2 18 3 23 4 29 5 34 6 41 7 47 8 53 9 61 10 68 11 75 12 82 13 91 14 L M R Target: 47 Comparisons: 0

⏸️ Pause and Predict

Critical Thinking: Complexity Analysis

Question: What is the worst-case time complexity of binary search on a sorted array of n elements?

Think about:

  • How much of the array is eliminated with each comparison?
  • How many times can you divide n by 2 before reaching 1?
  • What does this tell you about the growth rate?

Answer:

  • Each comparison eliminates half the remaining elements
  • Maximum comparisons = number of times we can halve n
  • This is exactly \(\lceil \log_2 n \rceil\)
  • Complexity: \(O(\log n)\)

Meta-comment: This logarithmic complexity makes binary search extremely efficient—searching 1 billion elements requires only ~30 comparisons!

Binary Search: Worst Case Analysis

Question: What is the worst case of binary search?

Recurrence Relation:

\(T(n) = T(\lfloor n/2 \rfloor) + 1\)

Solution:

T(n) = \(O(\log n)\)

Each recursive call halves the problem size, giving us logarithmic complexity.

Binary Search: Average Case

Question: What is the average case of binary search?

Assumptions:

  • \(n = 2^k - 1\) (complete binary tree)
  • There are n + 1 gaps between/around the elements
  • Probability is uniform: P(each position) = \(\frac{1}{2n+1}\)

Binary Search Tree Representation:

1 2 4 8 1 2 2 3 3 3 3 4 4 4 4 4 4 4 4 1 comp for 1 2×2 for 2 & 9 3×2² for 3,6,10,13 4×2³ for 4,5,7,8...

\(n = 2^k - 1\)

Total comparisons: \(\sum_{i=1}^{k} i \cdot 2^{i-1}\)

Binary Search: Average Case Derivation

Average complexity formula:

\[ A(n) = \frac{1}{2n+1} \left[ \sum_{i=1}^{k} i \cdot 2^{i-1} + k(n+1) \right] \]

if x is not in the array

Step 1: Simplify the summation

\[ \sum_{i=1}^{k} i \cdot 2^{i-1} = \frac{1}{2} \sum_{i=1}^{k} i \cdot 2^{i} = (k-1)2^k + 1 \]

Binary Search: Average Case (continued)

Step 2: Substitute \(n = 2^k - 1\)

\[ A(n) = \frac{(k-1)2^k + 1}{2n+1} + \frac{k(n+1)}{2n+1} \]
\[ = \frac{(k-1)2^k + 1}{2(2^k-1)+1} + \frac{k(2^k-1+1)}{2(2^k-1)+1} \]
\[ = \frac{(k-1)2^k + 1}{2^{k+1}-1} + \frac{k \cdot 2^k}{2^{k+1}-1} \]
\[ = \frac{k2^k - 2^k + 1 + k2^k}{2^{k+1}-1} \]

Binary Search: Average Case (conclusion)

Step 3: Final simplification

\[ A(n) = \frac{2k \cdot 2^k - 2^k + 1}{2^{k+1}-1} \]
\[ \approx \frac{2k \cdot 2^k}{2^{k+1}} = k - \frac{1}{2} \]
\[ = \lfloor \lg n \rfloor + \frac{1}{2} \quad \Rightarrow \quad \boxed{O(\log n)} \]

⭐ Key Result:

Binary search has \(O(\log n)\) complexity in both worst and average cases.

This makes it one of the most efficient search algorithms for sorted data!

What is Selection?

Definition: Given n elements, find the kth order statistic.

The kth order statistic is the element that would be at position k if the array were sorted.

Examples:

  • k = 1: minimum
  • k = n: maximum
  • k = \(\lceil n/2 \rceil\): median
  • k = 10: 10th smallest element

Why Not Just Sort?

Sorting

Tsort(n) = O(n log n)

Examples:

  • Mergesort: ~n log n comparisons
  • Heapsort: ~n log n comparisons
  • Quicksort: ~n log n average

Selection

Tselect(n) = O(n)

We can do better!

  • Don't need full order
  • Only care about one element
  • Linear time possible!

Concrete Example

Problem: Find the 5th smallest element

Array: [23, 17, 31, 12, 8, 19, 25, 14, 29, 11]

Approach 1 - Sorting:

Sort → [8, 11, 12, 14, 17, 19, 23, 25, 29, 31]

Answer: 17 (at index 4)

Cost: ~34 comparisons (n log n ≈ 10 × 3.32)

Approach 2 - Selection:

Use selection algorithm → Answer: 17

Cost: ~17 comparisons (linear in n)

50% fewer comparisons!

Finding Maximum

Mathematical Analysis

Finding Maximum: The Algorithm

function findMax(A[1..n]):
    max = A[1]
    for i = 2 to n:
        if A[i] > max:
            max = A[i]
    return max

How many comparisons?

Exactly n - 1 comparisons (one per iteration from 2 to n)

Why n-1 is Optimal: Proof

Lower Bound Proof

Claim: Any comparison-based algorithm must make at least n-1 comparisons.

Proof:

  1. For an element to be the maximum, all other elements must be proven smaller
  2. An element is "proven smaller" only if it loses at least one comparison
  3. We have n-1 elements that are not the maximum
  4. Each of these n-1 elements must lose ≥ 1 comparison
  5. Therefore, we need at least n-1 comparisons ∎

Our algorithm uses exactly n-1 comparisons → OPTIMAL!

Example: Finding Maximum

Array: [12, 7, 23, 5, 18]

Step Compare Result Current Max Comparisons
1 7 vs 12 7 < 12 12 1
2 23 vs 12 23 > 12 23 2
3 5 vs 23 5 < 23 23 3
4 18 vs 23 18 < 23 23 4

Result: Maximum = 23, Comparisons = 4 = n-1 ✓

Finding Both
Max & Min

Optimizing the Algorithm

Naive Approach

1 Find maximum using n-1 comparisons

2 Find minimum from remaining n-1 elements

This takes n-2 comparisons

Total Comparisons:

Tnaive(n) = (n - 1) + (n - 2) = 2n - 3

Optimized Approach: Process in Pairs

Key Insight: The maximum can never lose a comparison, and the minimum can never win.

So we can separate elements into two groups!

Algorithm:

1 Compare elements in pairs → separate into winners and losers

2 Find max among winners

3 Find min among losers

Detailed Example: Pairing Method

Array: [23, 17, 31, 12, 8, 19, 25, 14] (n = 8)

Step 1: Pair Comparisons

Pair Comparison Winner Loser
123 vs 172317
231 vs 123112
38 vs 19198
425 vs 142514

Comparisons so far: 4 = n/2

Example (continued)

Step 2: Find Max Among Winners

Winners: [23, 31, 19, 25]

Comparisons: 23 vs 31 → 31, then 19 vs 25 → 25, then 31 vs 25 → 31

Max = 31, Comparisons: 3 = n/2 - 1

Step 3: Find Min Among Losers

Losers: [17, 12, 8, 14]

Comparisons: 17 vs 12 → 12, then 8 vs 14 → 8, then 12 vs 8 → 8

Min = 8, Comparisons: 3 = n/2 - 1

Total:

4 + 3 + 3 = 10 comparisons

General Formula Derivation

For even n:

Pair comparisons = n/2

Max among winners = (n/2) - 1

Min among losers = (n/2) - 1

Total:

T(n) = n/2 + (n/2 - 1) + (n/2 - 1)
= n/2 + n/2 + n/2 - 2
= 3n/2 - 2

⏸️ Pause and Predict

Quick Check: Apply the Formula

Question: Using the pairing method we just derived, how many comparisons are needed to find both max and min for n = 10 elements?

Think about:

  • Is n = 10 even or odd?
  • What formula should you use?
  • Calculate step by step

✓ Answer:

n = 10 is even, so T(10) = 3(10)/2 - 2 = 15 - 2 = 13 comparisons

Breakdown: 5 pairs + 4 for max + 4 for min = 13

This formative assessment helps verify your understanding before moving forward

Formula for Odd n

For odd n:

Handle first element separately (acts as both initial max and min)

Remaining n-1 elements (even) use pairing method

Pair comparisons = (n-1)/2

Max among winners = (n-1)/2 (including initial element)

Min among losers = (n-1)/2 (including initial element)

Total:

T(n) = (n-1)/2 + (n-1)/2 + (n-1)/2
= 3(n-1)/2 = \(\lceil 3n/2 \rceil\) - 2

Comparison: Naive vs Optimized

n Naive (2n-3) Optimized (\(\lceil 3n/2 \rceil\)-2) Savings % Improvement
101713423.5%
203728924.3%
5097732424.7%
1001971484924.9%
10001997149849925.0%

As n grows, we save approximately 25% comparisons!

Lower Bound Proof: \(\lceil 3n/2 \rceil\) - 2 is Optimal

Theorem: Any comparison-based algorithm must make at least \(\lceil 3n/2 \rceil\) - 2 comparisons.

Proof (via adversary argument):

  1. To identify max: need n-1 elements to have lost ≥ 1 comparison each
  2. To identify min: need n-1 elements to have won ≥ 1 comparison each
  3. Overlap: n-2 elements must be in both sets (both won AND lost)
  4. Minimum "moves" needed: 2(n-2) + 2 = 2n-2
  5. Best case: comparing two "N" elements gives 2 moves per comparison
  6. Therefore: \(\lfloor n/2 \rfloor\) comparisons give \(2\lfloor n/2 \rfloor\) moves
  7. Remaining moves need individual comparisons: (2n-2) - \(2\lfloor n/2 \rfloor\)
  8. Total = ⌊n/2⌋ + (2n-2-\(2\lfloor n/2 \rfloor\)) = \(\lceil 3n/2 \rceil\) - 2 ∎

Finding the
Second Largest

Tournament Method Analysis

Why Tournament Method?

Key Theorem: The second largest element MUST have lost directly to the largest element.

Proof by Contradiction:

  1. Assume the 2nd largest element never compared with the largest
  2. Or, it compared but lost to some element X ≠ largest
  3. Then X > 2nd largest
  4. But largest > X > 2nd largest
  5. This means X is at least the 2nd largest, contradicting our assumption
  6. Therefore, 2nd largest MUST have lost directly to largest ∎

Tournament Tree Properties

Definition: A tournament tree is a complete binary tree where:

  • Leaves are the input elements
  • Internal nodes represent comparison winners
  • Root is the maximum element

Key Properties:

  • Height: h = \(\lceil \log_2 n \rceil\)
  • Total comparisons to build: n - 1
  • Elements compared with root: ≤ h = \(\lceil \log_2 n \rceil\)
  • Path length from leaf to root: ≤ h

Mathematical Analysis

Finding 2nd Largest - Step by Step:

1 Build tournament tree

Comparisons = n - 1

2 Count elements that lost to max

Max traveled from leaf to root = \(\lceil \log_2 n \rceil\) levels
Elements that lost to max = \(\lceil \log_2 n \rceil\)

3 Find max among those

Comparisons = \(\lceil \log_2 n \rceil\) - 1

Total Comparisons Formula

T(n) = Tbuild tournament + Tfind among losers

T(n) = (n - 1) + (\(\lceil \log_2 n \rceil\) - 1)
= n + \(\lceil \log_2 n \rceil\) - 2

Examples:

n \(\lceil \log_2 n \rceil\) n + \(\lceil \log_2 n \rceil\) - 2 Naive (2n-3) Improvement
8391331%
164182938%
6466812546%
1024101032204550%

Detailed Example with n=8

Array: [12, 7, 23, 5, 18, 9, 15, 11]

Tournament Rounds:

Round 1 (4 comparisons):

12 vs 7 → 12 | 23 vs 5 → 23 | 18 vs 9 → 18 | 15 vs 11 → 15

Round 2 (2 comparisons):

12 vs 23 → 23 | 18 vs 15 → 18

Round 3 (1 comparison):

23 vs 18 → 23 (MAX)

Total comparisons: 4 + 2 + 1 = 7 = n-1 ✓

Tournament Tree Visualization ⚡ Animated

12 7 23 5 18 9 15 11 12 R1 23 R1 18 R1 15 R1 23 R2 18 R2 23 MAX Losers to MAX: {5, 12, 18} 2nd Largest must be among these!

Path from 23 to root: 3 levels → \(\lceil \log_2 8 \rceil\) = 3 comparisons → 3 losers

Example (continued)

Elements that lost to MAX (23):

  • Round 1: 5 lost to 23
  • Round 2: 12 lost to 23
  • Round 3: 18 lost to 23

Losers to max: {5, 12, 18} (3 elements = \(\lceil \log_2 8 \rceil\))

Find max among {5, 12, 18}:

12 vs 18 → 18 (1 comparison)

5 vs 18 → 18 (1 comparison)

2nd Largest = 18

Total: 7 + 2 = 9 = 8 + 3 - 2 = n + \(\lceil \log_2 n \rceil\) - 2 ✓

Why \(\lceil \log_2 n \rceil\) Elements?

Proof: Why does the maximum compare against exactly \(\lceil \log_2 n \rceil\) elements?

  1. In a tournament tree, each element starts at a leaf
  2. To reach the root (height h), an element must win h comparisons
  3. For n elements, tree height = \(\lceil \log_2 n \rceil\)
  4. At each level, the maximum compares against exactly 1 element
  5. Maximum travels \(\lceil \log_2 n \rceil\) levels → makes \(\lceil \log_2 n \rceil\) comparisons
  6. Therefore, exactly \(\lceil \log_2 n \rceil\) elements lost to the maximum ∎

⏸️ Pause and Predict

Comprehension Check: Tournament Tree

Question: For a tournament with n = 16 elements, how many elements lose directly to the maximum?

Hint:

  • What is the height of the tournament tree for 16 elements?
  • How many rounds does the maximum win?
  • How many elements does it compare against per round?

✓ Answer:

\(\lceil \log_2 16 \rceil\) = 4 elements lose directly to the maximum

  • Tree height = 4 (since 2⁴ = 16)
  • Maximum wins 4 comparisons (one per level)
  • Therefore, 4 elements lost to max
  • Second largest must be among these 4 elements!

Understanding this relationship is key to proving optimality

Adversary Arguments

Proving Lower Bounds Through Adversarial Analysis

Information-Theoretic Approach

What is an Adversary Argument?

Definition: A proof technique that shows a lower bound by constructing an opponent (adversary) that forces any algorithm to do at least k operations.

The Game:

  • Algorithm: Asks "Is A[i] < A[j]?"
  • Adversary: Answers "yes" or "no" trying to maximize work
  • Constraint: Adversary must remain consistent (no contradictions)
  • Goal: If adversary can force k comparisons on ANY algorithm, then k is a lower bound

Key Concept: Information Units

Definition: An information unit is a fundamental piece of knowledge gained from a comparison that helps determine the solution.

Formal Definition:

For finding max and min, each comparison yields information about which elements can/cannot be max or min:

  • 1 information unit = Learning that one element cannot be max OR cannot be min
  • 2 information units = Learning both facts about different elements

Example:

Compare A[i] vs A[j], assume A[i] < A[j]:

  • Information gained: A[i] cannot be max, A[j] cannot be min
  • Information units: 2 (one fact about each element)

Why This Matters:

We need 2n - 2 information units to determine both max and min (eliminate n-1 candidates for max, n-1 candidates for min). The adversary strategy ensures we gain the minimum information per comparison.

Formal Lemma: Comparison Efficiency

Lemma: In the adversary game for max/min, each comparison yields at most 2 state transitions (moves).

Comparison Type State Changes Information Units
(N, N) N→W, N→L 2 moves (maximum)
(W, W) Loser: W→WL 1 move
(L, L) Winner: L→WL 1 move
(W, N) N→L (adversary makes W win) 1 move
(L, N) N→W (adversary makes L lose) 1 move
(W, L) None (outcome already determined) 0 moves
(WL, *) None (WL is terminal state) 0 moves

Proof of Lemma

Proof by case analysis on state transitions:

State Transition Constraints:

  • N (never compared) can transition to: W, L, or WL
  • W (won ≥1, lost 0) can only transition to: WL
    • Cannot → L (contradicts "lost 0")
    • Cannot remain W if it loses a comparison
  • L (won 0, lost ≥1) can only transition to: WL
    • Cannot → W (contradicts "won 0")
    • Cannot remain L if it wins a comparison
  • WL (won ≥1, lost ≥1) cannot transition (already eliminated as max/min candidate)

Maximum: 2 state transitions per comparison, achieved only when comparing two N elements (one becomes W, one becomes L). ∎

From Lemma to Lower Bound

Applying the Lemma:

Known: Total moves needed = 2n - 2

Proven: Moves per comparison ≤ 2

Therefore:

Number of comparisons ≥ (2n - 2) / 2
= n - 1

✓ Lower Bound Proven: Any algorithm finding both max and min requires at least n - 1 comparisons.

Our pairing method achieves \(\lceil 3n/2 \rceil\) - 2 ≈ 1.5n - 2 comparisons.

Gap factor: ~1.5× (optimal within a constant factor!)

State System for Max/Min

Four States Track What We Know:

State Meaning Could be Max? Could be Min?
N Never compared Yes Yes
W Won ≥1, Lost 0 Yes No
L Won 0, Lost ≥1 No Yes
WL Won ≥1, Lost ≥1 No No

Stopping Condition

Algorithm can stop when:

  • Exactly 1 element in state W (the maximum)
  • Exactly 1 element in state L (the minimum)
  • Exactly n-2 elements in state WL (neither max nor min)

State Equation:

|W| + |L| + |WL| + |N| = n
Stopping: |W|=1, |L|=1, |WL|=n-2, |N|=0

⏸️ Pause and Predict

Critical Thinking: Information Gained

Question: Which comparison gives the adversary MOST control (LEAST information)?

Options:

  • A. Two N elements (N vs N)
  • B. Two W elements (W vs W)
  • C. W with L element (W vs L)
  • D. All give equal information

Think about:

  • How many elements change state in each type?
  • What new information is gained about max/min?
  • Which comparison lets adversary "stall"?

✓ Answer: C (W vs L)

  • N vs N: Both change → 2 units (N→W and N→L)
  • W vs W: Only loser changes (W→WL) → 1 unit
  • L vs L: Only winner changes (L→WL) → 1 unit
  • W vs L: Already determined → 0 units (LEAST!)

Key: W vs L reveals no new information since the outcome is predetermined!

This understanding is crucial for proving the 2n-2 lower bound

State Transition Diagram

N Never Compared Can be Max or Min W Won ≥1, Lost 0 Can be Max L Won 0, Lost ≥1 Can be Min WL Won ≥1, Lost ≥1 Neither Max nor Min Win Lose Lose Win Compare N vs N → 2 moves! Goal: |W|=1, |L|=1 |WL|=n-2, |N|=0

Adversary Strategy Table

Comparison Adversary Response New States Info Units Explanation
N, N x > y W, L 2 Most efficient: 2 state changes
W, W x > y W, WL 1 One loses "max candidacy"
L, L x > y WL, L 1 One loses "min candidacy"
W, N x > y W, L 1 N element gets classified
L, N x < y L, W 1 N element gets classified
W, L x > y (forced) W, L 0 Outcome already known!
WL, W Consistent WL, W 0 No new information

Info Unit: A state transition (N→W, N→L, W→WL, or L→WL)

Counting Required Moves

How many "moves" to reach stopping condition?

Initial state: All n elements in state N

Final state: 1 in W, 1 in L, n-2 in WL

Moves needed:

For n-2 elements to reach WL: each needs 2 moves
→ 2(n-2) moves
For 1 element to reach W: needs 1 move (N→W)
For 1 element to reach L: needs 1 move (N→L)
Total moves required: 2(n-2) + 1 + 1 = 2n - 2

From Moves to Comparisons

Optimal Comparison Strategy:

Best case: N vs N comparisons give 2 moves each

Other comparisons give ≤ 1 move each

Minimizing comparisons:

Do \(\lfloor n/2 \rfloor\) comparisons of type (N,N)
→ Get \(2\lfloor n/2 \rfloor\) moves
Remaining moves needed: (2n-2) - \(2\lfloor n/2 \rfloor\)
These need individual comparisons (1 move each)
Total comparisons: ⌊n/2⌋ + [(2n-2) - \(2\lfloor n/2 \rfloor\)]

Final Formula Derivation

Case 1: n even

T(n) = n/2 + (2n - 2 - 2(n/2))
= n/2 + (2n - 2 - n)
= n/2 + n - 2
= 3n/2 - 2

Case 2: n odd

T(n) = (n-1)/2 + (2n - 2 - 2((n-1)/2))
= (n-1)/2 + (2n - 2 - (n-1))
= (n-1)/2 + n - 1
= (3n-3)/2 = \(\lceil 3n/2 \rceil\) - 2

Adversary Animation: Live ⚡ Animated

Watch states change as comparisons happen (n=6)

e₀ N e₁ N e₂ N e₃ N e₄ N e₅ N Comparing... State Counts W (Max candidates) 0 L (Min candidates) 0 WL (Neither) 0 N (Uncompared) 6

Visual Adversary Example: Step by Step

Tracking n=6 elements through adversary strategy

Initial: e₀ N e₁ N e₂ N e₃ N e₄ N e₅ N Count: W:0 L:0 WL:0 N:6 Step 1: e₀ > e₁ e₀ W e₁ L e₂ N e₃ N e₄ N e₅ N Count: W:1 L:1 WL:0 N:4 +2 moves! Step 3: (after pairs) e₀ W e₁ L e₂ W e₃ L e₄ W e₅ L Count: W:3 L:3 WL:0 N:0 6 moves total Final: (7 comparisons) e₀ e₁ e₂ e₃ e₄ e₅ Final Count: W:1 L:1 WL:4 N:0 ✓ MAX MIN

Adversary Example: n=6

Elements: [e₀, e₁, e₂, e₃, e₄, e₅]

Initial: All in state N

Step Compare Result States After W|L|WL|N Moves
1 e₀ vs e₁ e₀ > e₁ W,L,N,N,N,N 1|1|0|4 2
2 e₂ vs e₃ e₂ > e₃ W,L,W,L,N,N 2|2|0|2 2
3 e₄ vs e₅ e₄ > e₅ W,L,W,L,W,L 3|3|0|0 2
4 e₀ vs e₂ e₀ > e₂ W,L,WL,L,W,L 2|3|1|0 1
5 e₀ vs e₄ e₀ > e₄ W,L,WL,L,WL,L 1|3|2|0 1
6 e₃ vs e₅ e₅ > e₃ W,L,WL,WL,WL,L 1|2|3|0 1
7 e₁ vs e₅ e₁ > e₅ W,WL,WL,WL,WL,L 1|1|4|0 1

Total: 7 comparisons, 10 moves = 2(6)-2 ✓

Formula: \(\lceil 3(6)/2 \rceil\) - 2 = 9 - 2 = 7 ✓

Median-of-Medians Algorithm

Deterministic Linear-Time Selection

Optimal Pivot Selection and Recurrence Analysis

The Problem: Find kth Element

Goal: Find the kth smallest element in O(n) worst-case time

Why is this hard?

  • Quickselect: O(n) average, but O(n²) worst case
  • Problem: Bad pivot choice leads to unbalanced partitions
  • Solution: Guarantee a "good" pivot!

Algorithm Overview

1 Divide n elements into groups of 5

2 Find median of each group (6 comparisons per group)

3 Recursively find median of the medians → pivot X

4 Partition array around X into L and R

5 Recursively search in L or R based on X's position

Why Groups of 5? The Guarantee

Pivot Quality: How Many Elements ≤ X?

Step 1: Count group medians ≤ X

m = \(\lceil n/5 \rceil\) groups total
X is the median of m medians
→ At least \(\lfloor m/2 \rfloor\) group medians are ≤ X

Pivot Quality (continued)

Step 2: Count elements in those groups

For each group whose median ≤ X:

• The median itself ≤ X (1 element)
• 2 elements below the median (also ≤ X)
= 3 elements ≤ X per group

Step 3: Total elements ≤ X

≥ 3\(\lfloor m/2 \rfloor\) (3 elements from each of \(\lfloor m/2 \rfloor\) groups)
= 3⌊\(\lceil n/5 \rceil\)/2⌋
≥ \(3\lfloor (n/5)/2 \rfloor\) = \(3\lfloor n/10 \rfloor\)
≥ 3(n/10 - 1) = 3n/10 - 3

Pivot Quality: Final Bound

Result: The median-of-medians X satisfies:

X > at least 3n/10 - 6 elements

By symmetry (same argument for elements ≥ X):

X < at most 7n/10 + 6 elements

Key Insight: Both partitions have size ≤ 7n/10 + 6

This guarantees the recursion decreases by at least 30% each time!

T(n) ≤ T(n/5) + T(7n/10) + O(n)

Since n/5 + 7n/10 = 9n/10 < n, this recurrence solves to T(n) = O(n)

Median-of-Medians Visual

Groups of 5 with Medians Group 1 8 3 12 20 15 median: 12 Group 2 4 6 9 11 14 median: 9 Group 3 2 5 7 10 13 median: 7 Group 4 1 16 18 19 25 median: 18 Group 5 17 21 22 23 24 median: 22 Medians Array: [12, 9, 7, 18, 22] 12 9 7 18 22 Pivot X = 12 (median of medians) Elements ≤ X (guaranteed): • At least 2 groups have median ≤ 12 • Each group: 3 elements ≤ its median ≥ 2 × 3 = 6 elements ≤ X Elements ≥ X (guaranteed): • At least 2 groups have median ≥ 12 • Each group: 3 elements ≥ its median ≥ 2 × 3 = 6 elements ≥ X

Result: At least 30% of elements are on each side → Guaranteed partition quality!

Worst-Case Partition Size

Elements guaranteed ≤ X: ≥ 3n/10 - 6

Elements guaranteed ≥ X: ≥ 3n/10 - 6 (by symmetry)

Worst case partition size:

Elements > X (in worst case) ≤ n - (3n/10 - 6)
= n - 3n/10 + 6
= 7n/10 + 6

Similarly for elements < X

This guarantees at least 30% reduction each recursion!

Time Complexity Recurrence

Breakdown of Operations:

m = \(\lceil n/5 \rceil\) groups

Finding medians: 6m comparisons (6 per group of 5)

Partitioning: ≤ 2(m-1) comparisons (optimized)

Finding median of medians: T(m)

Recursive call on partition: T\((7n/10 + 6)\)

Recurrence relation:

T(n) ≤ 6m + 2(m-1) + T(m) + T\((7n/10 + 6)\)
T(n) ≤ 8m - 2 + T(\(\lceil n/5 \rceil\)) + T\((7n/10 + 6)\)

Solving the Recurrence

Simplified (ignoring floors/ceilings):

T(n) ≤ 8(n/5) + T(n/5) + T(7n/10)
= 1.6n + T(n/5) + T(7n/10)

Key observation:

Recursive call sizes: n/5 + 7n/10 = 2n/10 + 7n/10 = 9n/10
Since 9n/10 < n, recursion converges!

Guess T(n) ≤ cn for some constant c:

T(n) ≤ 1.6n + c(n/5) + c(7n/10)
= 1.6n + 0.2cn + 0.7cn
= 1.6n + 0.9cn

Finding the Constant

We need: 1.6n + 0.9cn ≤ cn

1.6n ≤ cn - 0.9cn
1.6n ≤ 0.1cn
1.6 ≤ 0.1c
c ≥ 16

Result:

T(n) = O(n) with constant ≈ 16-17

This means approximately 16-17n comparisons in worst case!

Why Not Groups of 3 or 7?

Group Size Guarantee Recursive Sizes Sum Result
3 ≥ 25% n/3 + 3n/4 13n/12 > n O(n log n) ❌
5 ≥ 30% n/5 + 7n/10 9n/10 < n O(n)
7 ≥ 37% n/7 + 5n/7 6n/7 < n O(n) ✓ (slower)
9 ≥ 44% n/9 + 5n/9 6n/9 < n O(n) ✓ (much slower)

Why 5 is optimal: Smallest group size that guarantees linear time with reasonable constant!

Numerical Example: Small Scale

Array (n=15): [3, 7, 2, 9, 1, 8, 5, 6, 4, 15, 11, 13, 10, 14, 12]

Find: 8th smallest

Step 1: Divide into groups of 5

G1: [3,7,2,9,1] → sorted: [1,2,3,7,9] → median: 3

G2: [8,5,6,4,15] → sorted: [4,5,6,8,15] → median: 6

G3: [11,13,10,14,12] → sorted: [10,11,12,13,14] → median: 12

Medians: [3, 6, 12]

Step 2: Find median of medians

Pivot X = 6 (median of [3, 6, 12])

Example (continued)

Step 3: Partition around X=6

L (< 6): [3, 2, 1, 5, 4] → 5 elements

X: [6] → position 6

R (> 6): [7, 9, 8, 15, 11, 13, 10, 14, 12] → 9 elements

Step 4: Determine which partition to search

Looking for 8th smallest, X is at position 6

8 > 6, so search for (8-6)=2nd smallest in R

Step 5: Recurse on R

Find 2nd smallest in [7, 9, 8, 15, 11, 13, 10, 14, 12]

Answer: 8

📊 Summary Table

Comparison Complexity Summary

Problem Comparisons Type Proven Optimal?
Find maximum n - 1 Exact ✅ Yes
Find minimum n - 1 Exact ✅ Yes
Find both max & min \(\lceil 3n/2 \rceil\) - 2 Exact ✅ Yes
Find 2nd largest n + \(\lceil \log_2 n \rceil\) - 2 Exact ✅ Yes
Find kth element O(n) Asymptotic ✅ Yes (time)
Find median ≥ 3(n-1)/2 Lower bound ❓ Gap exists

Open Problem: For median, best known algorithm uses ~17n comparisons, but lower bound is only 1.5n. Can we close this gap?

🎓 Key Takeaways

Mathematics

Precise analysis reveals optimal algorithms

Adversaries

Prove fundamental limits

Clever Design

Can achieve optimal bounds

Course Assignments and Resources

Homework Assignment

Assignment: HH-HWork-06

Due Date: March 31, 2025

Topics Covered: Selection algorithms, adversary arguments, lower bound proofs, median-of-medians analysis

⚠️ Important Note:

Median-of-medians algorithm was not covered in lecture due to time constraints.

Required Reading: Sara Baase Chapter 5, Section 5.5 before starting the assignment.

Office hours available for questions: salvadorhinojosa.youcanbook.me

Office Hours: Available for questions on all topics including median-of-medians

Contact: salvador.hinojosa@tec.mx

Questions and Discussion

Salvador Hinojosa
salvador.hinojosa@tec.mx

CS-4012: Advanced Algorithm Analysis
Session 06 | March 24, 2025

Course materials based on the work of Dr. Hugo Terashima Marín

Animation Controls
Step 0 / 0