NOtherDev: collections

Sunday, January 12, 2014

The downsides of .NET's Lookups

In the previous post I was admiring the wonderful features of .NET's Lookup - its immutability, null-safeness and clean and readable presence. I've also promised to look for its downsides or pitfalls one may encounter when using it.

Are there any? Well, to be honest, I can't see any serious problems with using Lookups. If you need a collection to be used as a lookup table by some identifier, with no modifications support, with multiple values per key - you won't find anything better suited than ILookup.

There are just two things I find to be a slight impediments in ideal Lookup's world.

The first is that the only way to create an ILookup instance is via LINQ's ToLookup method. Although the framework provides the default public ILookup implementation - a Lookup class - it has no public constructor available. It means that in order to create our lookup, we need to prepopulate some other collection just to convert it later using ToLookup().

The second thing is that the beautiful clean API provided by ILookup<TKey, TValue> goes away immediately if we need to do anything other than just read our lookup. If we need to have a lookup created from existing lookup but with one more value, or have our lookup filtered, or joined with another, or whatever, it breaks into IEnumerable<IGrouping<TKey, TValue>> - and the readability hurts a lot. IGrouping represents a single lookup item which is a values collection with its associated key. Still nice to access the data, but not nice to operate on. There's no public implementation of IGrouping available in the framework - the implementation is internal to LINQ. There's no way to modify the grouping, what is understandable as it is part of immutable ILookup, but there's also no way to create new grouping based on existing one - the workaround is to convert our whole lookup back to some mutable collection like Dictionary<TKey, TValue>, mutate it and then convert to new lookup using the convoluted LINQ projections first, something like this:

var temporaryDictionary = lookup.ToDictionary(x => x.Key, x => x.ToArray());
temporaryDictionary.Add("new key", new[] { "new value" });
var newLookup = temporaryDictionary
    .SelectMany(kv => kv.Value, (kv, v) => new { kv.Key, Value = v })
    .ToLookup(x => x.Key, x => x.Value);

Or alternatively, if we don't mind losing existing key-values mappings and we are able to recreate it once again, we can flatten our lookup to List<TValue>, modify the list and then use the simplest ToLookup() overload.

var temporaryList = lookup.SelectMany(x => x).ToList();
temporaryList.Add("new value");
var newLookup = temporaryList.ToLookup(x => SomethingThatRecreatesKey(x));

No matter which way we choose, it's quite a lot of work for such a simple thing.

Next time I'll try to propose a solution for both the issues. Stay tuned!

Monday, January 6, 2014

Lookup - .NET's hidden gem

.NET Framework has a lot to offer when comes to generic data structures, especially collections. We have several lists, arrays, sets, dictionaries - each of them has its performance and functional characteristics and scenarios where it fits much better than the others. As I'm observing in the project I work in, one of the most often forgotten yet powerful type of collection is Lookup. I think it needs more attention.

ILookup was introduced to the framework as a part of LINQ and is exclusively used in LINQ context, as a result of ToLookup calls. Logically, it is a key-value collection like IDictionary, but it is designed to hold multiple values in a single key. Also, unlike Dictionary<K, V>, it is immutable, what means that after creating the lookup, we can only read it - no modifications are possible.

Generally one may say that ILookup<K, V> is no better than IDictionary<K, IEnumerable<V>> and as such is redundant. But I'd strongly disagree. First of all, let's think how are dictionaries of collections used. From my experience, it's most often used to keep predefined static "dictionaries" that are loaded once and live long to provide easy access to some rarely changing values like what counties are there in each country etc. Well, these are even often called "lookup tables". Note that in most cases we don't ever change the stored values - it is immutable. We replace it as a whole periodically or when something changes. Doesn't it sound exactly like ILookup<K, V> key features discussed above? We should always be using the tools that are best suited for the job to avoid reinventing the wheel and avoid the problems others already solved.

Moreover, ILookup's definition is clearly more readable than nested generics in its dictionary-based equivalent. When I look at the code never seen before and I see ILookup<K, V>, I instantly know that it is the lookup table and I feel its read-only, one-way nature. When I see IDictionary<K, IEnumerable<V>> I need to consider the possibility that it is used as a read-write store for some values and I, as a user of this code, can (or should?) modify that collection. Using Lookup for lookups just makes sense.

IDictionary-based lookups, as mutable data structures may kick us badly. I've once experienced quite an ugly error caused by passing around the plain List<T> read from what was intended to be static lookup shared between users and was based on Dictionary<K, List<V>>. The intention was to pass around the immutable lookup data, but the receiving code actually got mutable list being exactly the same reference that was sitting in the global lookup. The value passed was eventually customized for each user and there were miracles happening to the global lookup. All that would be just impossible with lookups.

With dictionaries that are open for modification and not thread-safe by default, we are also subject to multiple kinds of failures in multi-threaded scenarios. All of it is gone with immutable data structure like ILookup.

And last but not least - null checks. Who likes to litter his code with all that boring null checks? When using IDictionary as a lookup table, every piece of code that consumes its data need to check for key existence first or have a conditional statement with TryGetValue method, because calling dictionary[nonExistingKey] would throw dreadful KeyNotFoundException. Moreover, even if the key exists, the value may still legally be null, so to read the collection contained under the key we need to have two checks:

if (dictionary.ContainsKey(theKey))
{
     var collectionOrPossiblyNull = dictionary[theKey];
     if (collectionOrPossiblyNull != null)
     {
          DoSomethingWithValues(collectionOrPossiblyNull);
     }
}

or, with more compactness but same complexity:

T collection;
if (dictionary.TryGetValue(theKey, out collection) && collection != null)
     DoSomethingWithValues(collection);

ILookup<K, V> was designed totally differently around key existence and null values. It just returns an empty collection for non-existent keys. And thanks to its strictly controlled way of creating (by LINQ's ToLookup method only), we can also be sure that no null collection is possible. And the code above just looks as it should like:

DoSomethingWithValues(lookup[theKey]);

Isn't it beautiful?

Next time I'll try to find some pitfalls or inconveniences of using lookups.