Randomized QuickSort

Randomized QuickSort is an efficient sorting algorithm that runs in $\Theta(n\log n)$ expected time on any input.

function RANDOMIZED-QUICKSORT(Array, low, high)
    if low < high then
        partition := RANDOMIZED-PARTITION(Array, low, high)
        RANDOMIZED-QUICKSORT(Array, low, partition - 1)
        RANDOMIZED-QUICKSORT(Array, partition + 1, high)

function RANDOMIZED-PARTITION(Array, low, high)
    // RANDOMIZED-CHOOSEPIVOT returns a uniformly random index in [low, high]
    pivotIndex := RANDOMIZED-CHOOSEPIVOT(Array, low, high)
    pivotValue := Array[pivotIndex]
    swap Array[high] and Array[pivotIndex]
    storeIndex := low
    for i from low to high - 1 do
        if Array[i] < pivotValue then
            swap Array[i] and Array[storeIndex]
            storeIndex := storeIndex + 1
    swap Array[high] and Array[storeIndex]
    return storeIndex

Randomized Select

Randomized Select is an efficient algorithm for finding the $i$th smallest element of Array[low…high]; it runs in linear expected time on any input.

function RANDOMIZED-SELECT(Array, low, high, i)
    if low = high then
        return Array[low]
    partition := RANDOMIZED-PARTITION(Array, low, high)
    rank := partition - low + 1
    if i = rank then
        return Array[partition]
    else if i < rank then
        return RANDOMIZED-SELECT(Array, low, partition - 1, i)
    else
        return RANDOMIZED-SELECT(Array, partition + 1, high, i - rank)
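A self-contained Python sketch of RANDOMIZED-SELECT, with the Lomuto partition inlined so the snippet runs on its own (ranks are 1-indexed, as in the pseudocode):

```python
import random

def randomized_select(arr, i, low=0, high=None):
    """Return the i-th smallest (1-indexed) element of arr[low..high]."""
    if high is None:
        high = len(arr) - 1
    if low == high:
        return arr[low]
    # Lomuto partition around a uniformly random pivot, as in randomized quicksort
    pivot_index = random.randint(low, high)
    pivot_value = arr[pivot_index]
    arr[high], arr[pivot_index] = arr[pivot_index], arr[high]
    store_index = low
    for j in range(low, high):
        if arr[j] < pivot_value:
            arr[j], arr[store_index] = arr[store_index], arr[j]
            store_index += 1
    arr[high], arr[store_index] = arr[store_index], arr[high]
    rank = store_index - low + 1          # the pivot is the rank-th smallest here
    if i == rank:
        return arr[store_index]
    elif i < rank:
        return randomized_select(arr, i, low, store_index - 1)
    else:
        return randomized_select(arr, i - rank, store_index + 1, high)
```

Unlike quicksort, only the side of the partition containing the $i$th element is recursed into, which is why the expected time drops from $\Theta(n\log n)$ to $\Theta(n)$.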

Analysis

Randomized QuickSort

Let $T(n)$ be the random variable for the running time of randomized QuickSort on an input of size $n$, and assume the random choices made in different calls are independent.
For $k = 0, 1, 2, \cdots, n - 1$, define the indicator r.v.
\[
X_k =
\begin{cases}
1 & \text{if RANDOMIZED-PARTITION generates a $k : n - k - 1$ split,}\newline
0 & \text{otherwise}
\end{cases}
\]
Then we can simplify the $T(n)$ equations:
\[
\begin{align*}
T(n) &=
\begin{cases}
T(0) + T(n - 1) + \Theta(n)\text{ if 0:n - 1 split},\newline
T(1) + T(n - 2) + \Theta(n)\text{ if 1:n - 2 split},\newline
\cdots\newline
T(n - 1) + T(0) + \Theta(n)\text{ if n - 1:0 split}\newline
\end{cases}\newline
&= \sum_{k = 0}^{n - 1}X_k\left(T(k) + T(n - k - 1) + \Theta(n)\right)
\end{align*}
\]
Take expectations of both sides:
\[
\begin{align*}
E[T(n)] &= E\left[\sum_{k = 0}^{n - 1}X_k\left(T(k) + T(n - k - 1) + \Theta(n)\right)\right]\newline
&= \sum_{k = 0}^{n - 1}E\left[X_k\left(T(k) + T(n - k - 1) + \Theta(n)\right)\right]
\end{align*}
\]
Since the pivot is uniform, $E[X_k] = \Pr\{X_k = 1\} = \frac{1}{n}$, and $X_k$ is independent of the other random choices, so
\[
\begin{align*}
E[T(n)] &= \sum_{k = 0}^{n - 1}E\left[X_k\right]E\left[T(k) + T(n - k - 1) + \Theta(n)\right]\newline
&= \frac{1}{n}\sum_{k = 0}^{n - 1}E\left[T(k) + T(n - k - 1) + \Theta(n)\right]\newline
&= \frac{1}{n}\sum_{k = 0}^{n - 1}E\left[T(k)\right] + \frac{1}{n}\sum_{k = 0}^{n - 1}E\left[T(n - k - 1)\right] + \frac{1}{n}\sum_{k = 0}^{n - 1}E\left[\Theta(n)\right]\newline
&= \frac{2}{n}\sum_{k = 0}^{n - 1}E\left[T(k)\right] + \Theta(n)\newline
\end{align*}
\]
Absorbing the $k = 0$ term into the $\Theta(n)$, we get
\[E[T(n)] = \frac{2}{n}\sum_{k = 1}^{n - 1}E\left[T(k)\right] + \Theta(n)\]
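This recurrence can be tabulated numerically; a small sketch (taking the $\Theta(n)$ cost to be exactly $n$, an arbitrary normalization) shows $E[T(n)]/(n\ln n)$ climbing toward the constant $2$ that the exact solution predicts:

```python
import math

# E[T(n)] = (2/n) * sum_{k=1}^{n-1} E[T(k)] + n, with the Theta(n) cost taken as n.
N = 5000
t = [0.0] * (N + 1)
t[1] = 1.0
running_sum = t[1]                 # sum of t[1..n-1], maintained incrementally
for n in range(2, N + 1):
    t[n] = 2.0 * running_sum / n + n
    running_sum += t[n]

for n in (100, 1000, 5000):
    print(n, round(t[n] / (n * math.log(n)), 3))
```

The ratio approaches $2$ from below, consistent with $E[T(n)] = \Theta(n\log n)$.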
We prove $E[T(n)] \leq an\log n$ for some constant $a > 0$ by the substitution method.

Substitute the inductive hypothesis:

\[
\begin{align*}
E[T(n)] &\leq \frac{2}{n}\sum_{k = 1}^{n - 1}ak\log k + \Theta(n)\newline
\sum_{k = 1}^{n - 1}ak\log k &< \int_1^n ak\log k\,\mathrm{d}k\newline
&= \left.\left(\frac{1}{2}ak^2\log k - \frac{1}{4}ak^2\right)\right|_1^n\newline
&= \frac{1}{2}an^2\log n - \frac{1}{4}an^2 + \frac{1}{4}a
\end{align*}
\]
So that,
\[
\begin{align*}
E[T(n)] &\leq \frac{2}{n}\left(\frac{1}{2}an^2\log n - \frac{1}{4}an^2 + \frac{1}{4}a\right) + \Theta(n)\newline
&= an\log n - \left(\frac{1}{2}an - \frac{a}{2n} - \Theta(n)\right)\newline
&= an\log n - \left(\frac{1}{2}an - \Theta(n)\right),
\end{align*}
\]
where the $O(1)$ term $\frac{a}{2n}$ is absorbed into the $\Theta(n)$.
If $a$ is chosen sufficiently large that $\frac{1}{2}an$ dominates the $\Theta(n)$ term, then
\[
\begin{align*}
E[T(n)] &\leq an\log n
\end{align*}
\]
This shows $E[T(n)] = O(n\log n)$; since any comparison sort requires $\Omega(n\log n)$ time, randomized QuickSort runs in $\Theta(n\log n)$ expected time.

Randomized Select Analysis

Let $T(n)$ be the random variable for the running time of randomized select on an input of size $n$, and assume the random choices made in different calls are independent.
For $k = 0, 1, 2, \cdots, n - 1$, define the indicator r.v.
\[
X_k =
\begin{cases}
1 & \text{if RANDOMIZED-PARTITION generates a $k : n - k - 1$ split,}\newline
0 & \text{otherwise}
\end{cases}
\]
Then we can simplify the $T(n)$ equations, pessimistically charging each recursive call to the larger side of the split:
\[
\begin{align*}
T(n) &=
\begin{cases}
T(\max\{0, n - 1\}) + \Theta(n)\text{ if 0:n - 1 split},\newline
T(\max\{1, n - 2\}) + \Theta(n)\text{ if 1:n - 2 split},\newline
\cdots\newline
T(\max\{n - 1, 0\}) + \Theta(n)\text{ if n - 1:0 split}\newline
\end{cases}\newline
&= \sum_{k = 0}^{n - 1}X_k\left(T(\max\{k, n - k - 1\}) + \Theta(n)\right)
\end{align*}
\]
Take expectations of both sides:
\[
\begin{align*}
E[T(n)] &= E\left[\sum_{k = 0}^{n - 1}X_k\left(T(\max\{k, n - k - 1\}) + \Theta(n)\right)\right]\newline
&= \sum_{k = 0}^{n - 1}E\left[X_k\left(T(\max\{k, n - k - 1\}) + \Theta(n)\right)\right]
\end{align*}
\]
Since the pivot is uniform, $E[X_k] = \Pr\{X_k = 1\} = \frac{1}{n}$, and $X_k$ is independent of the other random choices, so
\[
\begin{align*}
E[T(n)] &= \sum_{k = 0}^{n - 1}E\left[X_k\right]E\left[T(\max\{k, n - k - 1\}) + \Theta(n)\right]\newline
&= \frac{1}{n}\sum_{k = 0}^{n - 1}E\left[T(\max\{k, n - k - 1\}) + \Theta(n)\right]\newline
&\leq \frac{2}{n}\sum_{k = \lfloor\frac{n}{2}\rfloor}^{n - 1}E\left[T(k)\right] + \Theta(n),\newline
\end{align*}
\]
since each index $k \geq \lfloor n/2 \rfloor$ occurs as $\max\{k, n - k - 1\}$ at most twice.
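This pessimistic recurrence can also be tabulated directly; a small sketch (again taking the $\Theta(n)$ cost to be exactly $n$, an arbitrary normalization) shows the values staying below $4n$, i.e. a bound of the form $cn$:

```python
# E[T(n)] <= n + (1/n) * sum_{k=0}^{n-1} E[T(max(k, n-k-1))], cost taken as n.
N = 2000
t = [0.0] * (N + 1)
for n in range(1, N + 1):
    s = sum(t[max(k, n - k - 1)] for k in range(n))
    t[n] = n + s / n

print(round(t[N] / N, 3))
```

The ratio $T(n)/n$ approaches $4$ from below, matching the fixed point of $c = 1 + \frac{3}{4}c$ implicit in the substitution proof that follows.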
We prove $E[T(n)] \leq cn$ for some constant $c > 0$ by the substitution method.

Substitute the inductive hypothesis:

\[
\begin{align*}
E[T(n)] &\leq \frac{2}{n}\sum_{k = \lfloor\frac{n}{2}\rfloor}^{n - 1}ck + \Theta(n)\newline
&\leq \frac{2}{n}\left(\frac{3}{8}cn^2\right) + \Theta(n)\newline
&= \frac{3}{4}cn + \Theta(n)\newline
&= cn - \left(\frac{1}{4}cn - \Theta(n)\right)
\end{align*}
\]
(the second line uses $\sum_{k = \lfloor n/2 \rfloor}^{n - 1}k \leq \frac{3}{8}n^2$).
If $c$ is chosen sufficiently large that $\frac{1}{4}cn$ dominates the $\Theta(n)$ term, then
\[
\begin{align*}
E[T(n)] &\leq cn
\end{align*}
\]
Thus, randomized select runs in linear expected time.