Monday, February 18, 2013

[Obsolete] should be obsolete

[Obsolete] attribute is a pretty useful concept in the .NET Framework, designed to mark parts of our code that is planned to be phased out and its usage should be avoided. It makes a lot of sense to mark methods or types as obsolete, e.g. when we drop support for some parts of our API and the code is planned to be removed when all the customers are migrated, or when we've restructurized our code so that there's a newer approach for particular problem and we just had no time or possibility to upgrade all the usages, etc.

Recently, in the project I work in I've stumbled upon a pretty useful class, being part of our project-wide infrastructure and heavily used through the project, marked with an [Obsolete] attribute. Unfortunately, the parameterless overload, without a message, was used. I didn't know that part of code well, so I decided to consult with the author what does it mean for this class to be obsolete. Quick research through the revision history traced back to a developer who is no longer working in the project. Also, noone in the team could point me to an alternative approach I should use, most probably because there was none yet.

I've lost some time trying to figure out what's the reason of that spooky [Obsolete] attribute and I've only came to the conclusion that the developer who put it there must have been planning some redesign or was just unhappy with the class as it is. And even if I'm actually wrong and there is a valid reason for not using the class in question, I had no way to know that, just because someone was too lazy to leave a simple message in the attribute. Without the message, we're just having a pointless compiler warning whenever we're using that class.

To avoid such a confusing situations and a loss of time for our successors, teammates, or even for ourselves, I think we should never define the [Obsolete] attribute using its parameterless constructor. In fact, in my opinion, it should be marked as obsolete itself.

We have such an easy alternative. Wouldn't it be nicer if I had encountered the attribute with a message provided? Like [Obsolete("Use class Foo instead.")] or [Obsolete("Support for LoremIpsum dropped as of v. 3.2")]?

Thursday, February 14, 2013

NHibernate's test suite is a mess

I'm not doing it on purpose, I haven't converted into fighting NHibernate's foe, but it looks it will be another rather not encouraging post about NHibernate. But when I saw it's so called unit tests suite, I couldn't resist on sharing my thoughts.

NHibernate is using CodeBetter's TeamCity. We can see more than 200 tests ignored there. This doesn't bode well itself.

I've looked at NHibernate 3.2 source code I already had on the disk, but I've also compared it a bit with the current GitHub version and the situation haven't changed drastically since then.

Looking at the scope of most of the tests, they are more integration than unit tests - they run a complete top-down calls. This means that it requires a deep knowledge of the system to tell what went wrong when test fails and virtually no chance for anyone not being deep inside the codebase to know how to fix it.

It's nothing wrong with integration testing, of course, but not if the recurring pattern is that they are just a smoke tests - verifying only if there was an exception thrown or not. I've run the tool to find tests without assertions (that I'm going to publish some day) - and it turned out that there are 328 assertionless tests! More than 7% of the whole suite doesn't actually verify any outcome. And it is for sure possible to remove a whole lot of the code, even theoretically test-covered code, and cause no test to fail!

But way more interesting are the tests that do not increase the code coverage even incidentally! I like this one:

[TestFixture]
public class Fixture : BugTestCase
{
    [Test]
    public void Bug()
    {
        // Do nothing, we only want to check the configuration
    }
}

Note also the incredibly pure naming schema. There are 36 tests named Bug and 35 named just Test in the codebase and at least 80 test classes named Fixture. There is some kind of convention for namespaces, though, but with a lot of exceptions.

What else can we see in the NHibernate's "unit" test suite? Well, everything! We have a real file system access, real database access, time measurement, dependency on driver configuration etc. There is a lot of code which should be unit-tested itself.

Those tests, no matter if we name it unit-, integration- or whatever-tests, are poorly factored and contain multiple asserts. Look at the MasterDetail test here - 180 lines of code and 40 asserts in single test! The longest class test I've encountered was FooBarTest - it has 5729 lines (and has pretty nice name, hasn't it?).

And for the end, this is the line I liked the most:

catch(Exception e)
{
    Assert.IsNotNull(e); //getting ride of 'e' is never used compile warning
}

Wow, we have an assertion! But, as the comment suggests, it was added only to shut the compiler up, not to verify anything valuable. Not to mention that there are better ways to cope with compiler warnings...

Anyway, if you'd ever be looking for a good unit test suite to learn something from, at least you know where definitively not to look!

Monday, February 11, 2013

Extension for ordering an Enumerable with custom ranking

There are a lot of scenarios when we want to have the data sequence ordered according to some multi-level ordering rules that solve ties, but in the same time we do want to have the ties visible through the ordinal, ranking position. Let's take a sports league table as an example.

RankTeamPointsGoals lost
1Linq United62
2F.C. Async42
Enumerable Rangers44
4Generics Pitfalls07

In this example the teams are ordered by the highest number of points and in case of ties, from the lowest lost goals count. But the ranking cares only about the first criterion and in case of tie it places all the tied teams on the same position but later restarts numbering as if there were no ties - the ranking doesn't need to grow monotonically.

Recently I was to implement such a ranking and sorting feature with far more complicated, dynamic rules and I found the task far more complex than I initially thought, especially when taking performance into consideration. I wanted to avoid iterating the collection many times, but in the same time I wanted to keep things simple and rely on LINQ's ordering implementation which is very convenient to use even in multi-level sorting (OrderBy > ThenBy > ThenBy).

Before I mention anything about how LINQ sorting works - let me point an important resource in the topic - it's Jon Skeet's Edulinq series in which he reimplements LINQ on his own, with incredibly great insight. Read it all if you haven't already - it's definitely one of the most useful .NET blog posts series ever.

So, LINQ's OrderBy is backed by a pretty standard sort mechanics that uses our criteria to create the single comparer object used through the whole sort algorithm's work. However, my ranking rules are defined as a subset of ordering rules, so the comparer I'd like to use for ranking is most likely different than the sorting one. Given that I'm not going go reimplement sorting on my own, I'll need to iterate the data twice (but the second run will be on the sorted sequence - no issues with non-deterministic sequences - the original sequence will be run once only).

The implementation goes as follows: In the first iteration, just run the standard sorting. In the second run then, prepare the comparer according to the ranking rules, assign the rank 1 to the first element and then compare each pair of adjacent elements from the sequence, deciding whether the second one should get the same rank as the first one or just the next ordinal number (loop counter).

Let's define generic structures that will be used as input and output of our extension method on IEnumerable<T>. SortDefinition<T> is a class that holds the sorting field selector (the same as passed to the "normal" OrderBy/OrderByDescending methods) and a flag that decides about the sort order (I needed to have the ability to define complex and dynamic sorting criteria, it's much easier done with a flag than using LINQ's approach with different methods for ascending and descending sort). There is also Ranked<T> wrapper for our sequence elements type that carries the item together with it's ranking information:

public class Ranked<T>
{
    public Ranked(T item, int rank)
    {
        Item = item;
        Rank = rank;
    }

    public T Item { get; private set; }
    public int Rank { get; private set; }
}

public class SortDefinition<T>
{
    public SortDefinition(Func<T, object> selector, bool isDescending = false)
    {
        Selector = selector;
        IsDescending = isDescending;
    }

    public Func<T, object> Selector { get; private set; }
    public bool IsDescending { get; private set; }
}

Next, here is the public extension method and private method that encapsulates sorting itself. It's using only plain LINQ operators, so we have deferred execution here for free.

public static IEnumerable<Ranked>T>> OrderAndRankBy<T>(
    this IEnumerable<T> source,
    IEnumerable<SortDefinition<T>> sortAndRank,
    IEnumerable<SortDefinition<T>> sortOnly)
{
    if (source == null)
        throw new ArgumentNullException("source");

    var sortAndRankArray = sortAndRank.ToArray();
    var allSorters = sortAndRankArray.Concat(sortOnly).ToArray();
    if (allSorters.Length == 0)
        return CreateRanking(source, EqualityComparer<T>.Default);

    var ordered = SortSequence(source, allSorters);
    return CreateRanking(ordered, new SortersEqualityComparer<T>(sortAndRankArray));
}

private static IOrderedEnumerable<T> SortSequence<T>(IEnumerable<T> source, SortDefinition<T>[] sorters)
{
    var ordered = sorters[0].IsDescending
                    ? source.OrderByDescending(sorters[0].Selector)
                    : source.OrderBy(sorters[0].Selector);

    for (var i = 1; i < sorters.Length; ++i)
    {
        var sorter = sorters[i];
        ordered = sorters[i].IsDescending
            ? ordered.ThenByDescending(sorter.Selector)
            : ordered.ThenBy(sorter.Selector);
    }

    return ordered;
}

And the last part are SortersEqualityComparer and CreateRanking method, both of it are private part of the implementation. CreateRanking needs to enumerate the sorted collection, but thanks to yield keyword we still can have deferred execution, so that we're calculating the rank only when we actually want to read the data and only for as many elements as we requested. Here's the code:

private static IEnumerable<Ranked<T>> CreateRanking<T>(
    IEnumerable<T> ordered, IEqualityComparer<T> comparer)
{
    using (var it = ordered.GetEnumerator())
    {
        if (!it.MoveNext())
            yield break;

        // return first value with rank 1
        var current = it.Current;
        var ranked = new Ranked<T>(current, 1);
        yield return ranked;

        var previous = ranked;
        var nextInOrder = 2;

        while (it.MoveNext())
        {
            // and all the other values with either previous value for equal items or next rank in order
            current = it.Current;
            ranked = new Ranked<T>(current, comparer.Equals(previous.Item, current)
                                                          ? previous.Rank 
                                                          : nextInOrder);
            yield return ranked;

            previous = ranked;
            ++nextInOrder;
        }
    }
}

private class SortersEqualityComparer<T> : EqualityComparer<T>
{
    private readonly SortDefinition<T>[] _sorters;

    public SortersEqualityComparer(IEnumerable<SortDefinition<T>> sorters)
    {
        _sorters = sorters.ToArray();
    }

    public override bool Equals(T x, T y)
    {
        return _sorters.All(s => s.Selector(x).Equals(s.Selector(y)));
    }

    public override int GetHashCode(T obj)
    {
        return obj == null ? 0 : obj.GetHashCode();
    }
}

Here is the usage example:

var results = new[]
{
    new TeamResult() { Name = "Enumerable Rangers", Points = 4, GoalsLost = 4 },
    new TeamResult() { Name = "F.C. Async", Points = 4, GoalsLost = 2 },
    new TeamResult() { Name = "Generic Pitfalls", Points = 0, GoalsLost = 7 },
    new TeamResult() { Name = "Linq United", Points = 6, GoalsLost = 2 }
};

var rankAndSort = new[] { new SortDefinition<TeamResult>(x => x.Points, isDescending: true) };
var sortOnly = new[] { new SortDefinition<TeamResult>(x => x.GoalsLost) };

var orderedTeams = results.OrderAndRankBy(rankAndSort, sortOnly);

foreach (var team in orderedTeams)
{
    Console.WriteLine("{0}. {1} - {2} points, {3} goals lost", 
        team.Rank, team.Item.Name, team.Item.Points, team.Item.GoalsLost);
}

And on the console we can see:

1. Linq United - 6 points, 2 goals lost
2. F.C. Async - 4 points, 2 goals lost
2. Enumerable Rangers - 4 points, 4 goals lost
4. Generic Pitfalls - 0 points, 7 goals lost

Full code is available as a Gist - feel free to use it!