Thursday, January 24, 2013

Thoughts on Sorting

I compared the basic in-memory sorting algorithms against the standard sorting functions in C/C++ (qsort/sort) on my x86 machine today. Although the results were preliminary, it did confirm some ideas in sorting techniques:

       Sort Type    Sorted       Nearly Sorted              Random            Reversed           All Equal
       =========    ------       -------------              ------            --------           ---------
  Selection Sort    29.1090             29.2660             33.7420             28.9850             29.0480
  Insertion Sort    0.0000              0.0150             21.3410              0.0000              0.0000
      Merge Sort    0.0310              0.0160              0.0310              0.0310              0.0310
        Qck Sort    57.6120              1.5760              0.0460             57.7990             18.6420
   Rand Qck Sort    0.0150              0.0160              0.0150              0.0310             18.5490
  R.Qck+Ins Sort    0.0000              0.0160              0.0310              0.0000              0.0150
     C lib qsort    0.0000              0.0160              0.0160              0.0000              0.0000
    C++ lib sort    0.0000              0.0000              0.0150              0.0000              0.0160



  • 100K integers are used to perform the test and the CPU time is recorded
  • The 1st column lists the sorting methods.
  • The 1st row is the arrangement of integers in the test array: already sorted, almost sorted (99.99% sorted), unsorted (random order), reversely sorted and all equal.

The comparison shows that:
  1. The qsort/sort in standard library outperforms other user defined sorting functions in many cases. So use them when in doubt;
  2. Insertion sort works fine if the input data is sorted or "mostly" sorted;
  3. Merge sort performs nearly same under all cases, so does selection sort; but the former one is O(n*lgn) in time with O(n) in space;
  4. Not randomized quick sort works OK only if the input data is in random order; it performs badly O(n^2) when data is sorted or nearly sorted (beware of the deep recursion! set the stack size accordingly); 
  5. Quick sort works faster when partitioned from two sides, optimized by randomization and combined with insertion sort (Programming Pearls, Sorting);
  6. Choose the "right" sorting strategy based on the data size, data order, memory/time req., ...

Test on gcc 4.6.2


No comments:

Post a Comment

(Coding && Eating) || Sleeping