- Often a full sort isn't required; in most large-scale cases what you really want is the top k results for some relatively small k (see the sketch after this list).
- Real-world data have patterns that general-purpose algorithms like your standard library's quicksort fail to exploit.
- Comparisons and swaps aren't always cheap.
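For the first point, here's a minimal sketch (the function name and heap layout are mine) of streaming top-k selection: keep a bounded min-heap of the k largest values seen so far, for O(n log k) comparisons instead of the O(n log n) of a full sort.

```typescript
// Hypothetical helper: return the k largest values without sorting all of `data`.
function topK(data: number[], k: number): number[] {
  const heap: number[] = []; // min-heap holding the k largest values so far

  const siftUp = (i: number) => {
    while (i > 0) {
      const parent = (i - 1) >> 1;
      if (heap[parent] <= heap[i]) break;
      [heap[parent], heap[i]] = [heap[i], heap[parent]];
      i = parent;
    }
  };

  const siftDown = (i: number) => {
    for (;;) {
      let smallest = i;
      const l = 2 * i + 1;
      const r = 2 * i + 2;
      if (l < heap.length && heap[l] < heap[smallest]) smallest = l;
      if (r < heap.length && heap[r] < heap[smallest]) smallest = r;
      if (smallest === i) break;
      [heap[smallest], heap[i]] = [heap[i], heap[smallest]];
      i = smallest;
    }
  };

  for (const x of data) {
    if (heap.length < k) {
      heap.push(x);
      siftUp(heap.length - 1);
    } else if (x > heap[0]) {
      heap[0] = x; // evict the smallest of the current top k
      siftDown(0);
    }
  }
  return heap.sort((a, b) => b - a); // sorting just k elements is cheap
}
```

For example, `topK([5, 1, 9, 3, 7], 2)` yields `[9, 7]` without ever fully ordering the other three elements.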
As a simple example, suppose you have a billion integers to sort, but you know that each one is between 1 and 100. That restriction alone lets you sort them in O(n) time with a counting sort (sketched after the list below). Other examples:
- The data are k-sorted (i.e., A[i] >= A[i-k]).
- The data aren't quite sorted, but they are unique; furthermore, abs(A[i] - A[i-1]) < k.
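Here is the counting-sort sketch promised above. It assumes every value really does fall in [1, 100]; the function name is mine. Two passes, and no comparisons between elements at all:

```typescript
// Hypothetical helper: O(n) sort for values known to lie in [1, 100].
function countingSort(data: number[]): number[] {
  // Tally how many times each value occurs (index 0 is unused).
  const counts = new Array<number>(101).fill(0);
  for (const x of data) counts[x]++;

  // Emit each value as many times as it was counted.
  const out: number[] = [];
  for (let v = 1; v <= 100; v++) {
    for (let i = 0; i < counts[v]; i++) out.push(v);
  }
  return out;
}
```

(For a billion elements you'd want typed arrays rather than plain `number[]`, but the idea is the same.)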
A while ago, I wrote a DHTML table sorter (for fun, naturally). For reference, here's a standard quicksort implementation. Ignoring the horrendous colour scheme for a moment, why is the table sorter so much faster? Simple: it avoids swapping DOM elements. (There's a reason it's called permutation quicksort!)
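Roughly, the trick looks like this. This is a sketch under my own assumptions rather than the original implementation: it uses the built-in sort on indices for brevity where the original used quicksort, and `rows` and `key` are hypothetical stand-ins for the DOM rows and their sort key.

```typescript
// Sort cheap indices, then move each expensive row exactly once.
function permutationSort<T>(rows: T[], key: (row: T) => number): T[] {
  // Comparisons touch only numbers, never the rows themselves.
  const perm = rows.map((_, i) => i);
  perm.sort((a, b) => key(rows[a]) - key(rows[b]));

  // Apply the permutation in one pass: O(n) moves of the real rows.
  return perm.map(i => rows[i]);
}
```

In the DOM case, that last pass becomes re-appending each row to the table in its new order, so the expensive operation happens exactly n times no matter how the comparison sort behaves.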
Fine. So we can guarantee O(n) swaps in exchange for linear memory overhead. How do you limit comparisons? Remember the second point above: notice patterns.
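One possible answer (my example; not a technique the post names): exploit near-sortedness. Plain insertion sort performs O(n + I) comparisons, where I is the number of inversions, so on data like the k-sorted example above, where every element sits close to its final position, it comfortably beats a general O(n log n) sort.

```typescript
// Hypothetical helper: in-place insertion sort; cheap when input is nearly sorted.
function insertionSort(a: number[]): number[] {
  for (let i = 1; i < a.length; i++) {
    const x = a[i];
    let j = i - 1;
    // On nearly-sorted input this loop exits almost immediately.
    while (j >= 0 && a[j] > x) {
      a[j + 1] = a[j];
      j--;
    }
    a[j + 1] = x;
  }
  return a;
}
```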