
Merge sort

Merge sort is a sort algorithm for rearranging lists (or any other data structure that can only be accessed sequentially, e.g. file streams) into a specified order. It is a particularly good example of the divide and conquer algorithmic paradigm.

Conceptually, merge sort works as follows:

If the list to be sorted is longer than one item:

  1. Divide the unsorted list into two sublists of about half the size
  2. Sort each of the two sublists
  3. Merge the two sorted sublists back into one sorted list.
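
The three steps above can be sketched on a singly linked list, a structure that is only ever accessed sequentially. This is a minimal illustration rather than a definitive implementation: the node type and the function names merge_lists and list_merge_sort are hypothetical and do not come from this article.

```c
#include <stddef.h>

/* Hypothetical node type, introduced only for this illustration. */
struct node {
    float value;
    struct node *next;
};

/* Merge two sorted lists into one sorted list. Taking from `a` on ties
 * keeps equal elements in their original order. */
static struct node *merge_lists(struct node *a, struct node *b)
{
    struct node head;            /* dummy head simplifies appending */
    struct node *tail = &head;

    head.next = NULL;
    while (a != NULL && b != NULL) {
        if (a->value <= b->value) {
            tail->next = a;
            a = a->next;
        } else {
            tail->next = b;
            b = b->next;
        }
        tail = tail->next;
    }
    tail->next = (a != NULL) ? a : b;   /* append whatever is left */
    return head.next;
}

/* Sort a singly linked list and return the new head. */
struct node *list_merge_sort(struct node *list)
{
    struct node *slow, *fast, *second;

    /* zero or one item: already sorted */
    if (list == NULL || list->next == NULL)
        return list;

    /* step 1: divide; slow advances one node for every two of fast */
    slow = list;
    fast = list->next;
    while (fast != NULL && fast->next != NULL) {
        slow = slow->next;
        fast = fast->next->next;
    }
    second = slow->next;
    slow->next = NULL;

    /* steps 2 and 3: sort each half, then merge the results */
    return merge_lists(list_merge_sort(list), list_merge_sort(second));
}
```

Calling list_merge_sort(head) returns the head of the sorted list. No nodes are allocated or copied; only the next pointers are rewired.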

Merge sort has an average and worst-case performance of O(n log(n)). This means that it often needs to make fewer comparisons than quicksort. However, the algorithm's overhead is slightly higher than quicksort's, and, depending on the data structure to be sorted, it may take more memory (though this is becoming less and less of a consideration). It is also much more efficient than quicksort if the data to be sorted can only be efficiently accessed sequentially, and is thus popular in languages such as Lisp where sequentially accessed data structures are very common. Unlike quicksort, merge sort is a stable sort.
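
The O(n log(n)) bound can be motivated with a short recurrence. The following is a sketch under simplifying assumptions: n is a power of two, merging two sublists holding n items in total costs at most c n steps, and sorting a single item costs c_0. The constants c and c_0 are introduced here for illustration only.

```latex
\begin{align*}
T(1) &= c_0, \\
T(n) &= 2\,T(n/2) + c\,n \\
     &= 4\,T(n/4) + 2\,c\,n \\
     &= 2^k\,T(n/2^k) + k\,c\,n \\
     &= c_0\,n + c\,n\log_2 n \qquad (k = \log_2 n) \\
     &= O(n \log n).
\end{align*}
```

The same count applies to every input of a given size, because the splitting does not depend on the data. This is why the average and worst cases coincide.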

Merge sort is so sequential that it's practical to run it on tapes if you have four tape drives. It works as follows:

  1. divide the data to be sorted in half and put half on each of two tapes
  2. merge individual pairs of records from the two tapes; write two-record chunks alternately to each of the two output tapes
  3. merge the two-record chunks from the two output tapes into four-record chunks; write these alternately to the original two input tapes
  4. merge the four-record chunks into eight-record chunks; write these alternately to the original two output tapes
  5. repeat until you have one chunk containing all the data, sorted (that is, lg n passes, where n is the number of records).

On tape drives that can run both backwards and forwards, you can run merge passes in both directions, avoiding any time spent rewinding.
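
The tape procedure can be imitated in memory with a bottom-up merge sort. The sketch below is an analogue under stated assumptions, not the tape algorithm itself: a single scratch buffer stands in for the second pair of tapes, and the names merge_runs and bottom_up_merge_sort are hypothetical. Each pass merges adjacent sorted runs of length width, then swaps the roles of the source and destination buffers, much as the tapes alternate between input and output.

```c
#include <stdlib.h>
#include <string.h>

/* Merge the sorted runs src[lo..mid-1] and src[mid..hi-1] into dst[lo..hi-1]. */
static void merge_runs(const float *src, float *dst,
                       size_t lo, size_t mid, size_t hi)
{
    size_t i = lo, j = mid, k = lo;

    while (i < mid && j < hi)
        dst[k++] = (src[i] <= src[j]) ? src[i++] : src[j++];
    while (i < mid)
        dst[k++] = src[i++];
    while (j < hi)
        dst[k++] = src[j++];
}

/* Bottom-up merge sort of v[0..n-1]. Returns 0 on success, or -1 if the
 * scratch buffer cannot be allocated. */
int bottom_up_merge_sort(float *v, size_t n)
{
    float *scratch, *src, *dst, *tmp;
    size_t width, lo, mid, hi;

    if (n < 2)
        return 0;
    scratch = malloc(n * sizeof *scratch);
    if (scratch == NULL)
        return -1;

    src = v;
    dst = scratch;

    /* each pass doubles the run length: 1, 2, 4, ... */
    for (width = 1; width < n; width *= 2) {
        for (lo = 0; lo < n; lo += 2 * width) {
            mid = (lo + width < n) ? lo + width : n;
            hi = (lo + 2 * width < n) ? lo + 2 * width : n;
            merge_runs(src, dst, lo, mid, hi);
        }
        /* swap roles: this pass's output is the next pass's input */
        tmp = src;
        src = dst;
        dst = tmp;
    }

    /* after the final swap, src points at the sorted data */
    if (src != v)
        memcpy(v, src, n * sizeof *v);
    free(scratch);
    return 0;
}
```

Every pass reads and writes each buffer strictly from front to back, which is the access pattern that makes the tape version workable.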

For the same reason it is also very useful for sorting data on disk that is too big to fit into primary memory.
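
The building block of such an on-disk sort is a merge that reads both inputs strictly from front to back. The sketch below assumes, purely for illustration, that each input is a stream of raw float records already in ascending order. The function name merge_sorted_streams is hypothetical, and write errors are not checked.

```c
#include <stdio.h>

/* Merge two sorted streams of binary floats into `out`, holding only one
 * record from each input in memory at a time. Returns the number of
 * records written. Assumption: both inputs are already sorted. */
long merge_sorted_streams(FILE *a, FILE *b, FILE *out)
{
    float x, y;
    int have_x = (fread(&x, sizeof x, 1, a) == 1);
    int have_y = (fread(&y, sizeof y, 1, b) == 1);
    long written = 0;

    while (have_x || have_y) {
        /* take from a when b is exhausted or a's record is not larger */
        if (have_x && (!have_y || x <= y)) {
            fwrite(&x, sizeof x, 1, out);
            have_x = (fread(&x, sizeof x, 1, a) == 1);
        } else {
            fwrite(&y, sizeof y, 1, out);
            have_y = (fread(&y, sizeof y, 1, b) == 1);
        }
        written++;
    }
    return written;
}
```

Memory use stays constant regardless of file size, since only two records are held at once. A full external sort would apply this merge repeatedly to sorted runs, in the same way as the tape passes.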

This might seem to be of historical interest only, but on modern computers, locality of reference is of paramount importance in software optimization, because we have deep memory hierarchies. This might change if fast memory becomes very cheap again, or if exotic architectures like the Tera MTA become commonplace.

Here is some C code that does merge sort. It assumes that two arrays, v1 and v2, have been allocated, each large enough to hold half of the n elements (rounded up when n is odd); they will be used for the merging operation (from public-domain lecture notes):

    void merge (float [], int, int, int);

    /* sort the (sub)array v from start to end */
    void merge_sort (float v[], int start, int end) {
      int middle;  /* the middle of the subarray */

      /* no elements to sort */
      if (start == end) return;

      /* one element; already sorted! */
      if (start == end - 1) return;

      /* find the middle of the array, splitting it into two subarrays */
      middle = (start + end) / 2;

      /* sort the subarray from start..middle */
      merge_sort (v, start, middle);

      /* sort the subarray from middle..end */
      merge_sort (v, middle, end);

      /* merge the two sorted halves */
      merge (v, start, middle, end);
    }

    /* v1 and v2 are the scratch arrays described above. Declaring them as
     * pointers allocated elsewhere is an assumption; adjust this line to
     * match how they are actually defined. */
    extern float *v1, *v2;

    /* merge the subarrays v[start..middle] and v[middle..end], placing the
     * result back into v.
     */
    void merge (float v[], int start, int middle, int end) {
      int v1_n, v2_n, v1_index, v2_index, i;

      /* number of elements in first subarray */
      v1_n = middle - start;

      /* number of elements in second subarray */
      v2_n = end - middle;

      /* fill v1 and v2 with the elements of the first and second
       * subarrays, respectively */
      for (i=0; i<v1_n; i++) v1[i] = v[start + i];
      for (i=0; i<v2_n; i++) v2[i] = v[middle + i];

      /* v1_index and v2_index will index into v1 and v2, respectively... */
      v1_index = 0;
      v2_index = 0;

      /* ... as we pick elements from one or the other to place back
       * into v */
      for (i=0; (v1_index < v1_n) && (v2_index < v2_n); i++) {

        /* current v1 element less than or equal to current v2 element? */
        if (v1[v1_index] <= v2[v2_index])

          /* if so, this element belongs as next in v */
          v[start + i] = v1[v1_index++];
        else
          /* otherwise, the element from v2 belongs there */
          v[start + i] = v2[v2_index++];
      }

      /* clean up; either v1 or v2 may have stuff left in it */
      for (; v1_index < v1_n; i++) v[start + i] = v1[v1_index++];
      for (; v2_index < v2_n; i++) v[start + i] = v2[v2_index++];
    }


Copyright 2004. All rights reserved.