top of page
Edument

Surviving sorting in JavaScript



A built-in .sort() method, how hard can it be to get it right? Taking JavaScript as our yardstick, apparently really quite hard. Read about two major pitfalls that anyone who wants to sort arrays in JavaScript should be aware of.

JavaScript has a .sort method, but this method comes with a number of sharp corners. This blog post explains how to learn to live with sorting in JavaScript.

The ancient art of arranging elements

Here's the .sort method:

> ["banana", "coconut", "apple"].sort()
["apple", "banana", "coconut"]

Easy-peasy! As you can see above, .sort() returns the sorted result.

But, as a salesman on late-night TV might put it, that's not all! Instead of giving you a new array with the elements sorted, JavaScript's sort method is in-place — it uses the array you passed in to re-arrange the items.

> let fruits = ["banana", "coconut", "apple"]
> fruits.sort()
> // the array `fruits` is now sorted: `["apple", "banana", "coconut"]`

You read that right. JavaScript's .sort() both modifies the array, and returns it.

Because of this, it's slightly bad style to assign the result of the sort back to the original array:

> fruits = fruits.sort();     // gör inte detta!

It's not wrong per se, it's just redundant, akin to cooking your dinner an extra time when it's already cooked. My advice is to never assign in connection with .sort() . (Chaining sorts is fine, since we're dealing with the same array reference all the time. But it's wrong because of the lack of stability; see later in this post.)

It's also important in contexts that demand immutability (such as Redux) to remember that .sort() clobbers the original array; if this is not acceptable, you can clone the array before sorting:

> let sortedFruits = [...fruits].sort();

Math is hard, let's go shopping

Since we had such amazing success with fruits, we immediately turn to numbers:


> [1, 2, 17, 24, 123, 1024].sort()
[1, 1024, 123, 17, 2, 24]


Oh no! JavaScript's answer runs up aginst our deeply-held beliefs about the order numbers appear on the real number line!

What's going on here? The reason the fruits came out right but the numbers didn't is that JavaScript's sort is lexicographical by default. It compares everything as strings — and indeed, "1024" would appear somewhere between "1" and "123" in a (truly huge) dictionary of numbers.

(Cue a legion of upset readers emailing me about the impossibility of putting all the real numbers, in order, into a dictionary. Dear hypothetical irate readers, calm down. We're only talking about putting all the IEEE 754 double-precision floating-point numbers into a dictionary. This is still a very stupid thing to do, but clearly doable.)

Before we look at how to fix numerical sorting, we might ask why the default is lexicographic order. I think there are two related reasons:

  • JavaScript, being loosely typed, doesn't see "array of strings" or "array of numbers"; it just sees "array". In a sense, it has to decide on a default sorting order without knowing anything about the types of the array's elements.

  • If we were to pick a sorting order that works for everything, including heterogenous arrays containing mixed types, then lexicographic order is it. This is based on the deep theory of "everything can be turned into a string, however badly".

Fixing numerical sorting

As the MDN page for Array.prototype.sort points out, the .sort() method accepts an optional compareFunction parameter. This is how we get into the game of specifying the order, for example numerical order.


function compareNumerically(a, b) {    
         if (Number(a) < Number(b)) {        
             return -1;    
         } else if (Number(a) > Number(b)) {        
               return +1;   
         } else {        
               return 0;    
         }
}

A compare function like the one above promises to adhere to a protocol, namely to return -1, 0, or +1 depending on the relative ordering between its two parameters. (It's allowed to return any positive or negative value, but you will rarely need this.) -1 means " a is less than b ", 0 means " a and b are equal", +1 means " a is greater than b ".

But not just that. The .sort() method expects the compare function to be consistent, and return the same output for the same inputs. It also expects the compare function to specify a total order, that is, all the elements can be placed in some order on a line such as the number line. (It is ok for values to compare as equal even if they are not identical. More about that below.)

If you ask me, the -1, 0, +1 values are slightly magical and unexplained. They belong to a bygone era where such protocols were considered cute and knowing about them was part of belonging to the in-group of battle-scarred programmers. We can do better by naming the constants for what they represent:


const LESS = -1, EQUAL = 0, GREATER = +1;
function compareNumerically(a, b) {    
        if (Number(a) < Number(b)) {        
             return LESS;    
        } else if (Number(a) > Number(b)) {        
             return GREATER;    
       } else {        
             return EQUAL;    
       }
}

Now the numeric sort works:


> [1, 2, 17, 24, 123, 1024].sort(compareNumerically)
[1, 2, 17, 24, 123, 1024]

A quick aside: because we're allowed to return any positive or negative number from a compare function, this shorter version would also do the trick:

function compareNumerically(a, b) { return a - b }

But, as we will see below, we will get more benefits in this blog post by sticking to the longer, spelled-out implementation.

Comparing by property

Let's say, hypothetically, that I have so many computer games that I need to keep track of them in a little database and show them in a table on a web page. A game is represented by this class:


class Game {    
       constructor(title, publisher, year) {        
               this.title = title;        
               this.publisher = publisher;        
               this.year = year;    
       }
}

Keep in mind, this is hundreds of games we're talking about. Hypothetically. If we have an array of games, how do we sort them sensibly?

Emboldened by our success with the numbers, we can define a sorting function for each of the properties of a game:



function compareByTitle(a, b) {    
        if (a.title < b.title) {        
            return LESS;    
       } else if (a.title > b.title) {        
            return GREATER;    
       } else {        
            return EQUAL;    
      }
}
function compareByPublisher(a, b) {    
        if (a.publisher < b.publisher) {       
            return LESS;    
        } else if (a.publisher > b.publisher) {        
            return GREATER;    
        } else {        
            return EQUAL;    
        }
}
function compareByYear(a, b) {   
        if (a.year < b.year) {        
            return LESS;    
        } else if (a.year > b.year) {        
            return GREATER;    
        } else {        
               return EQUAL;    
        }
}

Phew! That's a lot of code. Not only that, it's a lot of repetitive code. That's, like, the worst kind.

If we squint at the three functions, we see that they're all doing the same thing, but with different properties. That is, we can factor out the "compare by property" part from the part that specifies the specific property. Here, let's make a function that returns a custom compare function:

function compareByProperty(p) {    
        return (a, b) => {        
                 if (a[p] < b[p]) {            
                      return LESS;        
                 } else if (a[p] > b[p]) {            
                      return GREATER;        
                 } else {            
                      return EQUAL;        
                 }    
        };
}

Now the three compare functions can be defined super-simply, the sweet reward for keeping things DRY:


let compareByTitle = compareByProperty("title");
let compareByPublisher = compareByProperty("publisher");
let compareByYear = compareByProperty("year");

That's nice, but we can go one step further. We can factor out the "compare by" part from the "property" part! Behold:


function compareBy(fn) {    
        return (a, b) => {       
                 if (fn(a) < fn(b)) {            
                      return LESS;        
                 } else if (fn(a) > fn(b)) {            
                      return GREATER;        
                 } else {            
                      return EQUAL;        
                 }    
         };
}
function property(p) {    
         return (e) => e[p];
}

Our property compare functions now look like this:


let compareByTitle = compareBy(property("title"));
let compareByPublisher = compareBy(property("publisher"));
let compareByYear = compareBy(property("year"));

What did we gain from this last factoring? The compareBy function is more general, and can swallow up the compareNumerically function we laboriously defined above:

let compareNumerically = compareBy(Number);

It's nice when things get easier! And it reads nicely, too. "Compare by Number".

Doing the same thing with other wrapper types would also work: compareBy(Boolean) , and compareBy(String) . Again, the latter is the default, so we don't need to specify it unless we want to be explicit.

Taking a step back, this is nice simply because it aligns better with our intuition about sorting. The "compare function" convention with -1, 0, and +1 is flexible, but it's not intuitive. Often the thought we want to express is "compare based on something that we can compute from each element". This is what compareBy does. Java 8 adds a static method Comparator.comparing which also does exactly this.

Very stable genius

I need to show my games in some nice predictable order on the page. Let's say I want to order them primarily by publisher, but for every publisher I want them sorted by title. So the problem is "Sort by publisher, and then (if games have the same publisher) by title".

Can we do this in JavaScript? The answer is of course yes, but JavaScript will be damned if it makes life easy for us.

One moderately clever thing we can attempt is to sort twice:


games.sort(compareByTitle)     
            .sort(compareByPublisher);  // note: DON'T DO THIS -- see explanation below

Note the backwards thinking here: since we want the games to be sorted primarily by publisher, we do that sort last. This reasoning is weird but correct.

In order for this to work, the second sort will need to respect and preserve the sorted-by-title order from the first sort, whenever it compares two games by the same publisher. That is, if compareByPublisher says that the result is EQUAL then .sort() should say "fine, I'll just keep the order as it already is".

A sorting algorithm that does this is called stable. It's not a tautological property — a sorting algorithm can also be unstable, in which case it makes no guarantees about the final order of EQUAL elements (and very likely jumbles them).

So... the question becomes is JavaScript's sort guaranteed to be stable?

And the answer, as you already suspected with that sinking feeling in your stomach, is no. It would be terribly useful for us to be able to assume a stable sort. But with JavaScript's built-in .sort() , you can't assume that. JavaScript is many things, but it's not your best buddy.

We speak often about JavaScript, but in reality there are multiple implementations of JavaScript out there. Currently, there are these major implementations:

  • v8 (Chrome, Node.js) — unstable sort

  • JavaScriptCore (Safari) — stable sort

  • SpiderMonkey (Firefox) — stable sort

  • ChakraCore (Edge) — stable sort

On v8, sorting is unstable. In all the other major browsers, sorting is stable. (In older versions of Firefox and Opera, sorting used to be unstable.) If we want our code to work the same across all browsers, past and present, we cannot assume stability.

(In fact, the situation with v8 is even more detailed than that. If your array happens to have 10 elements or fewer, that sort will be stable, because v8 falls back on insertion sort for performance reasons, and insertion sort is stable! So you will get a stable sort sometimes, for the kind of short lists you are likely to use as test cases.)

Ok, so forget about the implementations. What does the ECMAScript specification have to say about the stability of sorting? Maybe we can force the specification down the implementors' throats?

The elements of this array are sorted. The sort is not necessarily stable [...]

Apparently not. Thanks for nothing, ECMAScript specification!

By the way, JavaScript is in fairly good company here. Some languages (Java, Python) have a built-in library sort function that guarantees stability, whereas other languages (Ruby, C#) don't. It seems to be connected to the fact that traditionally, there has been no good way to make QuickSort stable and maximally performant at the same time. Stability sacrificed for speed.

Fixing stable sort

Thinking about how to make sorting stable, what we need is a "chained compare function" which does what stable sort should have done for us automatically: it compares by publisher first, and if those come out equal, it falls back to comparing by title:


function compareByPublisherThenTitle(a, b) {    
        if (a.publisher < b.publisher) {        
             return LESS;    
        } else if (a.publisher > b.publisher) {        
             return GREATER;   
        } else {        
               if (a.title < b.title) {            
                    return LESS;        
               } else if (a.title > b.title) {            
                    return GREATER;        
               } else {            
                    return EQUAL;       
               }    
        }
}

Visually, this code looks like one compare function shoved into another compare function. And yes, this works:


> games.sort(compareByPublisherThenTitle);

But clearly we can do better, in terms of factoring out common solutions. How about if we have a "chained compare", that can coordinate between two compare functions, falling back to the second if the first returns EQUAL ?

function chained(fn1, fn2) {    
        return (a, b) => {        
                 let v = fn1(a, b);        
                 return v === EQUAL            
                        ? fn2(a, b)            
                        : v;   
       };
}

Now we can use this chaining mechanism instead of manually writing a huge combined compare function:


let compareByPublisherThenTitle = chained(compareByPublisher, compareByTitle);

Or, if we want:


let compareByPublisherThenTitle = chained(compareBy(property("publisher")), compareBy(property("title"))

Stable sort on an existing array

The above is fine for the cases where we don't care about the existing order of the elements. But we can also imagine scenarios where we do.

Consider for example that the games are listed in a table on a page, and the table has clickable headings "title", "publisher", and "year". Clicking on a heading allows you to sort the games based on that property.

Of course we want this sort to be stable too; that is, any order that already exists among the table rows should be preserved when possible. In this case JavaScript's unstable sort is working maximally against us, and we have to be creative.

In order to preserve the existing order, we would like to arrive at a situation where we can write compareBy(property("index")) , where the index property contains the array index of the element. Unfortunately, there's no such index property in general... so what the below code does is to temporarily add such a property before sorting, and then peel it away afterwards.


let indexedGames = games.map((value, index) => ({ value, index }));
let compareByPropertyStable = (p) => chained(compareBy((e) => e.value[p]), compareBy(property("index")));
indexedGames.sort(compareByPropertyStable("year"));     // for example
games = indexedGames.map((o) => o.value);

Et voilà, stable sorting in our table. Again, this would have been a whole lot easier if JavaScript just had stable sorting built in — but given that it doesn't, this is the easiest we get away with.

We can also write this as a single chain of map - sort - map :


games = games    
          .map((value, index) => ({ value, index }))    
          .sort(compareByPropertyStable("year"))    
          .map((o) => o.value);

(The pattern used above is known in the Perl world as a Schwartzian transform.)

In this case the assignment is necessary, because map generates fresh arrays.

Summary

And we're done! Summarizing a bit:

  • JavaScript expects you to provide compare functions when you want anything other than lexicographic order.

  • There's a very specific way you have to write the compare functions.

  • Instead of writing them out by hand every time, it's shorter, cleaner, safer, and more enjoyable to define (or import) a small number of utility functions ( compareBy , property , chained ) so that we can functionally compose the compare function we want.

  • JavaScript's specification and implementations don't guarantee stable sort in general.

  • Instead, if you want stable sort (as one often does), you have to write your compare functions to take that into account.

  • Even this can be made practical and pleasant by composing functions.


Afterword: a cute DSL

As I was doing research for this post, it struck me that someone might have addressed the same pain points already out in the npm ecosystem. And sure enough, I found the comparators module, which is based on Java 8's API and allows us to sort our games like this:


games.sort(Comparators.comparing("publisher").thenComparing("title"));

In this case, we're sending strings into the comparing methods, but sending an arbitrary selector function works also according to the API. The secret sauce that makes the above work — very JavaScript-y — is that the compare function returned from Comparators.comparing has been adorned with a .thenComparing method.

Afterword: ES2019

Starting with the 2019 edition of the EcmaScript standard, sorting is guaranteed to be stable. Yay!

Be aware though that this does not change the fact that for many more years, there will be older implementations out there on people's computers and devices that do not use a stable sort. So for backwards compatibility for the forseeable future, the advice in this article still very much holds.


By Carl Mäsak

0 comments

Comments


bottom of page