Adaptive Sort


When sorting algorithms are discussed, it's almost always very theoretical about Θ(nlogn) efficiency on random data with no duplicate elements, but sorting in practice is often very different. In fact, I would guess that most real life sorting are on data that is either to some extent ordered, or has lots of duplicate values.

In other words, an efficient sorting algorithm has to handle the following cases:

  1. Sorting random data with few or no duplicate values.
  2. Sorting data with lots of duplicate values, for example if you sort all people in USA after the state they live in, of sort people after their sex or marriage status, and so on.
  3. Sorting almost sorted lists, a typical scenario is a user who modifies an already sorted list and then need to resort it.
  4. Sorting already sorted list or reversely sorted lists, which often happens due to different layers of software/API (one layer might not know that its already sorted).
  5. Sorting a list that is the result of two or more sorted lists appended to each other.
  6. A combination of two or more of the above points.

Quicksort and Merge sort handles point 1 well, and to some extent one or two more points, but can not be said to handle all the above cases very well, so while these algorithms are fairly efficient, they are not optimal for all cases (see the performance page for more info).

My algorithm, named ASort, focuses on practical sorting, and the focus I have had while developing this algorithm has been:

To try to make ASort as fast as possible on as many sorting scenarios as possible.

The performance of ASort can be viewed on the performance page, and its very similar in efficiency to Quicksort and Merge sort on random data with few or no duplicate elements. But once the data is either ordered or semi-ordered, or contains lots of duplicate items, or a combination of those, ASort shows it true strength, yielding a performance gain up to almost 1000% in some cases.

Obviously this performance can not be gained by simply inserting a preprocess stage in the sorting algorithm. The adaption to the data takes place through out the algorithm, and have very little impact on performance for random data (unlike for example Sedgewick Quicksort). This means that random data mixed with ordered data also yields good performance.

Finally, unlike Merge sort, ASort has Θ(1) in heap memory efficiency, and Θ(logn) in stack memory efficiency. It does use a fixed size buffer (currently set to 4Mb), but on the other hand it uses less stack data than Quicksort so for large lists it requires less memory in total compared to Quicksort.