Monday, January 6, 2014

Lookup - .NET's hidden gem

.NET Framework has a lot to offer when comes to generic data structures, especially collections. We have several lists, arrays, sets, dictionaries - each of them has its performance and functional characteristics and scenarios where it fits much better than the others. As I'm observing in the project I work in, one of the most often forgotten yet powerful type of collection is Lookup. I think it needs more attention.

ILookup was introduced to the framework as a part of LINQ and is exclusively used in LINQ context, as a result of ToLookup calls. Logically, it is a key-value collection like IDictionary, but it is designed to hold multiple values in a single key. Also, unlike Dictionary<K, V>, it is immutable, what means that after creating the lookup, we can only read it - no modifications are possible.

Generally one may say that ILookup<K, V> is no better than IDictionary<K, IEnumerable<V>> and as such is redundant. But I'd strongly disagree. First of all, let's think how are dictionaries of collections used. From my experience, it's most often used to keep predefined static "dictionaries" that are loaded once and live long to provide easy access to some rarely changing values like what counties are there in each country etc. Well, these are even often called "lookup tables". Note that in most cases we don't ever change the stored values - it is immutable. We replace it as a whole periodically or when something changes. Doesn't it sound exactly like ILookup<K, V> key features discussed above? We should always be using the tools that are best suited for the job to avoid reinventing the wheel and avoid the problems others already solved.

Moreover, ILookup's definition is clearly more readable than nested generics in its dictionary-based equivalent. When I look at the code never seen before and I see ILookup<K, V>, I instantly know that it is the lookup table and I feel its read-only, one-way nature. When I see IDictionary<K, IEnumerable<V>> I need to consider the possibility that it is used as a read-write store for some values and I, as a user of this code, can (or should?) modify that collection. Using Lookup for lookups just makes sense.

IDictionary-based lookups, as mutable data structures may kick us badly. I've once experienced quite an ugly error caused by passing around the plain List<T> read from what was intended to be static lookup shared between users and was based on Dictionary<K, List<V>>. The intention was to pass around the immutable lookup data, but the receiving code actually got mutable list being exactly the same reference that was sitting in the global lookup. The value passed was eventually customized for each user and there were miracles happening to the global lookup. All that would be just impossible with lookups.

With dictionaries that are open for modification and not thread-safe by default, we are also subject to multiple kinds of failures in multi-threaded scenarios. All of it is gone with immutable data structure like ILookup.

And last but not least - null checks. Who likes to litter his code with all that boring null checks? When using IDictionary as a lookup table, every piece of code that consumes its data need to check for key existence first or have a conditional statement with TryGetValue method, because calling dictionary[nonExistingKey] would throw dreadful KeyNotFoundException. Moreover, even if the key exists, the value may still legally be null, so to read the collection contained under the key we need to have two checks:

if (dictionary.ContainsKey(theKey))
     var collectionOrPossiblyNull = dictionary[theKey];
     if (collectionOrPossiblyNull != null)

or, with more compactness but same complexity:

T collection;
if (dictionary.TryGetValue(theKey, out collection) && collection != null)

ILookup<K, V> was designed totally differently around key existence and null values. It just returns an empty collection for non-existent keys. And thanks to its strictly controlled way of creating (by LINQ's ToLookup method only), we can also be sure that no null collection is possible. And the code above just looks as it should like:


Isn't it beautiful?

Next time I'll try to find some pitfalls or inconveniences of using lookups.


  1. Immutability is a plus :)
    Shame there's no immutable BCL collection for 1-1 key-value mappings
    There's pros and cons of the non-nullable collection

    Instead of
    if (x.ContainsKey(y)) { ... }
    you have to do
    if (x[y].Count() > 0) { ... }

    which is further from the semantic of what the code's doing.


  2. "Isn't it beautiful?"