Friday, December 2, 2011

Maps (Java Collections)

Using collections in Java: Maps

In programming terms, a map is a set of associations between pairs of objects. It crops up extremely frequently in programming in all sorts of cases where we want to deal with cases of "for X, what is the Y"? For example:

"for a given language code, what is the message to display meaning 'Please log in'?"
"for a given word in the document I'm processing, how many times did it occur do far?"
"for a given row ID number, what is the cached copy of the row from the database"?
"for a given user on my web server, what is their current session status"?
Notice the third item here: a common use for a map is as a simple cache.

Creating and using a map

The syntax for creating a map is largely similar to lists and sets. But this time we need to specify the type of object we want to map from and the type to map to. Let's say we want to store an association of country codes with country names, in other words, String to String:


Map countryNames = new HashMap(200);


In this case, we use the common map implementation HashMap. As with the set, we supply an estimate of the number of mappings we expect to add, to help the map operate more efficiently1. To add an association to the map, we actually use a method called put():


countryNames.put("GB", "Great Britain");
countryNames.put("FR", "France");
countryNames.put("IT", "Italy");
countryNames.put("FW", "Far Far Away");


Then, to retrieve the country name for a particular code, we call:


String name = countryNames.get("IT");


Other things to note about Maps:

The items that we associate from are called keys; the items they are associated to (the country names here) are called values.
If for a particular key there is no associated value, then get() returns null.
A given key can be associated with only one value. If you put("X", "Y") then subsequently put("X", "Z"), then get("X") will return Z, the last association made with that key.
The put() method actually returns the previous association with the given key (or null if there was none).

More on maps

For more advanced uses, it is useful to understand how maps work, in particular about the technique called hashing from which the HashMap and HashSet get their name.

Note: Every time a collection becomes "full", it generally has to re-organise itself to accommodate the extra capacity needed. For example, an ArrayList needs to create a new, larger array. In the case of HashSet or HashMap, it turns out that this re-organisation is quite an "expensive" operation. So if we can anticipate roughly how many items will be added in the first place, we cut down on this expensive re-organisation.

How Maps work

The map is a fundamental type of structure that allows associations to be set up between one set of objects and another. By association, we typically mean situations where we want to say "for a given X, what is the Y?". For example:

for a given IP address, what is the host name?
for a given request to our web server, what is the last result we returned?
for a given parameter, what is the corresponding value?
for a given session ID, what is the user name?
In other words, maps have a wide variety of uses. Various routines would be inefficient and fiddly to write without them. Generally we can see that a map, or an instance of the Java Map interface, is a set of associations. So for example, we could have a map of string parameters to integer values:


Map params = ...


Then, we can set and get parameters on this map:


params.set("maxConnections", 20);
params.set("maxThreads", 10);
...
int maxConnections = params.get("maxConnections");
int maxThreads = params.get("maxThreads");


The items that we associate from are referred to as keys. The items that we associate to (the integers in this case) are, in Java at least, usually referred to simply as values. Conceptually, you could imagine (and in principle implement) a Map as an array containing the keys and another array containing the values, with code such as the following to get an item out of the map:


public Integer get(String key) {
  for (int i = 0; i < keys.length; i++) {
    if (key.equals(keys[i])) {
      return values[i];
    }
  }
  return null;
}


Note that instances of Map always map objects to objects. So in fact, a Map of strings to integers would store Integer objects as the values. In our above example, we can (so long as we're careful about nulls) read and write straight to int primitives thanks to the 'autoboxing' feature of Java 5.

Introducing the hash map

Now, implementing a map with two arrays like this is conceptually simple, but would actually be quite inefficient for maps of any significant size. For example, imagine if the map had 10,000 entries in it (and in practical uses, it's not uncommon to want to store tens of thousands of items if not more in a map). In that case, on average we'd have to compare 5,000 items with the one passed in every time we called get()! One improvement that may be appropriate in some cases is to store the list sorted (and actually, Java provides some map implementations based on keeping the map data sorted, albeit in a more sophisticated way than a simple list). But that in turn can make insertions and deletions from the map expensive, since on every operation we have to maintain the map in its sorted state.

An alternative to keeping the map data sorted is to use hashing.

No comments:

Post a Comment