Sunday, December 4, 2011

Making your Java objects sortable

Making your Java objects sortable: the Comparable interface

In the last post, we looked at how to sort 'naturally' sortable objects in Java such as Strings and Integers. But supposing we've created our own class to represent some arbitrary object or data and want to sort instances of that class? Well, Java can still sort a list of our objects, providing that we tell Java how to order any two instances. For this, we need implement the Comparable interface.

Implementing Comparable

Let's consider an example of a class that represents a playing card in a normal deck of cards. For simplicity, the card will have a suit and a number, both represented by an integer. (In more sophisticated implementations, you might consider using Enums instead of an integers.) When we order our cards, we firstly want them ordered according to suit. We'll define a particular order, namely alphabetical order according to suit name, which in English would be clubs, diamonds, hearts, spades1. Then within a suit, we want the cards ordered numerically.


public class PlayingCard {
  public static final int SUIT_CLUBS    = 1;
  public static final int SUIT_DIAMONDS = 2;
  public static final int SUIT_HEARTS   = 3;
  public static final int SUIT_SPADES   = 4;
  ...

  private int suit;
  private int number;
}


We start by making our class implement the Comparable interface. How exactly this looks depends on whether you're using Java 5 or later (in other words, whether your version of Java supports generics). The generic version looks like this:


public class PlayingCard implements Comparable {
  public int compareTo(PlayingCard o) {
    ...
  }
}


whereas the non-generic (pre Java 5) verion looks like this:


public class PlayingCard implements Comparable {
  public int compareTo(Object o) {
    ...
  }
}


In the non-generic version, we'll have some extra casting to do, but the logic will be essentially the same. We need to make the compareTo() method return an integer that determines the order of two playing cards (this playing card with respect to the one passed in). Now, we look in more detail at implementing the compareTo() method.

Making your Java objects sortable: the compareTo method

We just saw that to make our objects sortable, we need to make them implement the Comparable interface. This means providing a compareTo() method that will define the order of two instances of a given object. Specifically, our compareTo() method needs to return an integer as follows:

a negative number if our object comes before the one passed in;
a positive number if our object comes after the one passed in;
otherwise, zero (meaning they're equal in terms of ordering).
Note that the magnitude of the number doesn't matter. The aim isn't to say "how different" the two objects are, just in which direction. So often, we may as well use -1 and 1 to mean "before" and "after" respectively.

In our example of playing cards, we want to order first by suit and then by number. So we first consider what number we would return just by comparing the suit. If this number is not zero, then we can just return that number: if the suit's different, we don't need to bother comparing the number. So the first part of our comparison looks as follows:


public class PlayingCard implements Comparable {
  public int compareTo(PlayingCard o) {
    if (this.suit < o.suit) {
      return -1;
    } else if (this.suit > o.suit) {
      return 1;
    } else {
      // compare number
    }
  }
}


Now we're left with the cases of cards with an identical suit, in which case we need to compare the number in a similar manner:


public class PlayingCard implements Comparable {
  public int compareTo(PlayingCard o) {
    if (this.suit < o.suit) {
      return -1;
    } else if (this.suit > o.suit) {
      return 1;
    } else {
      // suit is identical: compare number
      if (this.number < o.number) {
        return -1;
      } else if (this.number > o.number) {
        return 1;
      } else {
        return 0;
      }
    }
  }
}


Optimising the compareTo() method

The above implementation of compareTo() will work fine. It has the advantage of being easy to follow. If you write the method this way, you're unlikely to run into trouble.

The method as it stands has the slight disadvantage that it's not very succinct and involves a lot of comparisons. In certain circumstances we can reduce the number of comparisons. On the next page, we'll look at possible optimisations to the compareTo() method. We'll also see that care needs to be taken when applying such optimisations.

Optimising the compareTo method...

We saw an example implementation of the compareTo() method of the Comparable interface. Our implementation was methodical and easy to follow, but a bit "long-winded".

...so long as we're careful!

In some cases we can make the comparison method more succinct, but we need to proceed with caution as we'll see in a second.

Using a subtraction instead of a comparison

The first optimisation that can be made in certain cases is to use a subtraction instead of a comparison of numeric values. In principle, this works thanks to some extremely elementary mathematics:

if a < b, then a - b will be negative; if a > b, then a - b will be positive.
Oh, and of course subtracting a number from itself gives zero. So our card comparison routine can now look as follows:


public class PlayingCard implements Comparable {
  public int compareTo(PlayingCard o) {
    int cardComp = this.suit - o.suit;
    if (cardComp == 0) {
      cardComp = this.number - o.number;
    }
    return cardComp;
  }
}



Using subtraction is a very common idiom for comparing numbers in sorting routines, and is almost certainly the reason why compareTo() is defined to return an integer in the first place. Indeed, you may be wondering why we bothered with the previous version of doing comparisons and returning specific values. Well, it turns out that the subtraction method is incorrect for the general case, although it works here.

Problem with this method: comparing large numbers

The above method works because we know that the numbers we'll be comparing will be small. But ints can only store numbers up to a certain magnitude. So if we know that the difference between the two numbers can exceed about 2 billion (232) we should avoid the above method. Converting to longs gives us extra breathing space (the magnitude can now read 263), but the essential problem remains.

This means, for example, that if you're comparing database keys, you should check very carefully what your database's key allocation policy is if you intend to use this method. (Just because you only have 100,000 keys in your table may not mean that the keys are allocated from that range of numbers.)

Is it worth it?

In tests If you run(on Intel architecture), you will see that:

when the compareTo() involves a single comparison, using a subtraction instead of comparisons makes no appreciable difference;
when two comparisons are involved, using subtraction as above (so that our method has just a single comparison with zero) makes the overall sort about 10% faster.
In other words, in the single-variable case, the "optimisation" appears non-existent and in the two-variable case, the performance benefits are minimal. Possibly the main benefit is therefore a readability issue, at least on modern architecture.

Next...

Next, you may wish to look at how to set an arbitrary sort order using the Java Comparator interface.

No comments:

Post a Comment