Sunday, January 12, 2014

The downsides of .NET's Lookups

In the previous post I was admiring the wonderful features of .NET's Lookup - its immutability, null-safeness and clean and readable presence. I've also promised to look for its downsides or pitfalls one may encounter when using it.

Are there any? Well, to be honest, I can't see any serious problems with using Lookups. If you need a collection to be used as a lookup table by some identifier, with no modifications support, with multiple values per key - you won't find anything better suited than ILookup.

There are just two things I find to be a slight impediments in ideal Lookup's world.

The first is that the only way to create an ILookup instance is via LINQ's ToLookup method. Although the framework provides the default public ILookup implementation - a Lookup class - it has no public constructor available. It means that in order to create our lookup, we need to prepopulate some other collection just to convert it later using ToLookup().

The second thing is that the beautiful clean API provided by ILookup<TKey, TValue> goes away immediately if we need to do anything other than just read our lookup. If we need to have a lookup created from existing lookup but with one more value, or have our lookup filtered, or joined with another, or whatever, it breaks into IEnumerable<IGrouping<TKey, TValue>> - and the readability hurts a lot. IGrouping represents a single lookup item which is a values collection with its associated key. Still nice to access the data, but not nice to operate on. There's no public implementation of IGrouping available in the framework - the implementation is internal to LINQ. There's no way to modify the grouping, what is understandable as it is part of immutable ILookup, but there's also no way to create new grouping based on existing one - the workaround is to convert our whole lookup back to some mutable collection like Dictionary<TKey, TValue>, mutate it and then convert to new lookup using the convoluted LINQ projections first, something like this:

var temporaryDictionary = lookup.ToDictionary(x => x.Key, x => x.ToArray());
temporaryDictionary.Add("new key", new[] { "new value" });
var newLookup = temporaryDictionary
    .SelectMany(kv => kv.Value, (kv, v) => new { kv.Key, Value = v })
    .ToLookup(x => x.Key, x => x.Value);

Or alternatively, if we don't mind losing existing key-values mappings and we are able to recreate it once again, we can flatten our lookup to List<TValue>, modify the list and then use the simplest ToLookup() overload.

var temporaryList = lookup.SelectMany(x => x).ToList();
temporaryList.Add("new value");
var newLookup = temporaryList.ToLookup(x => SomethingThatRecreatesKey(x));

No matter which way we choose, it's quite a lot of work for such a simple thing.

Next time I'll try to propose a solution for both the issues. Stay tuned!

2 comments:

  1. Great write-up!
    What would the memory usage look like on a Lookup on 100,000 objects in a list, with 10 properties populated ink each instance. Does the Framework make a lot of copies of the original list, keeps pointers, or something else altogether?

    Thanks.

    ReplyDelete
  2. There's no reason you couldn't write your own implementation of ILookup that is backed by a Dictionary>! That might make it easier to work with in some cases (such as deserialization).

    ReplyDelete