There are a lot of scenarios when we want to have the data sequence ordered according to some multi-level ordering rules that solve ties, but in the same time we do want to have the ties visible through the ordinal, ranking position. Let's take a sports league table as an example.
Rank | Team | Points | Goals lost |
---|---|---|---|
1 | Linq United | 6 | 2 |
2 | F.C. Async | 4 | 2 |
Enumerable Rangers | 4 | 4 | |
4 | Generics Pitfalls | 0 | 7 |
In this example the teams are ordered by the highest number of points and in case of ties, from the lowest lost goals count. But the ranking cares only about the first criterion and in case of tie it places all the tied teams on the same position but later restarts numbering as if there were no ties - the ranking doesn't need to grow monotonically.
Recently I was to implement such a ranking and sorting feature with far more complicated, dynamic rules and I found the task far more complex than I initially thought, especially when taking performance into consideration. I wanted to avoid iterating the collection many times, but in the same time I wanted to keep things simple and rely on LINQ's ordering implementation which is very convenient to use even in multi-level sorting (OrderBy > ThenBy > ThenBy).
Before I mention anything about how LINQ sorting works - let me point an important resource in the topic - it's Jon Skeet's Edulinq series in which he reimplements LINQ on his own, with incredibly great insight. Read it all if you haven't already - it's definitely one of the most useful .NET blog posts series ever.
So, LINQ's OrderBy is backed by a pretty standard sort mechanics that uses our criteria to create the single comparer object used through the whole sort algorithm's work. However, my ranking rules are defined as a subset of ordering rules, so the comparer I'd like to use for ranking is most likely different than the sorting one. Given that I'm not going go reimplement sorting on my own, I'll need to iterate the data twice (but the second run will be on the sorted sequence - no issues with non-deterministic sequences - the original sequence will be run once only).
The implementation goes as follows: In the first iteration, just run the standard sorting. In the second run then, prepare the comparer according to the ranking rules, assign the rank 1 to the first element and then compare each pair of adjacent elements from the sequence, deciding whether the second one should get the same rank as the first one or just the next ordinal number (loop counter).
Let's define generic structures that will be used as input and output of our extension method on IEnumerable<T>. SortDefinition<T> is a class that holds the sorting field selector (the same as passed to the "normal" OrderBy/OrderByDescending methods) and a flag that decides about the sort order (I needed to have the ability to define complex and dynamic sorting criteria, it's much easier done with a flag than using LINQ's approach with different methods for ascending and descending sort). There is also Ranked<T> wrapper for our sequence elements type that carries the item together with it's ranking information:
public class Ranked<T> { public Ranked(T item, int rank) { Item = item; Rank = rank; } public T Item { get; private set; } public int Rank { get; private set; } } public class SortDefinition<T> { public SortDefinition(Func<T, object> selector, bool isDescending = false) { Selector = selector; IsDescending = isDescending; } public Func<T, object> Selector { get; private set; } public bool IsDescending { get; private set; } }
Next, here is the public extension method and private method that encapsulates sorting itself. It's using only plain LINQ operators, so we have deferred execution here for free.
public static IEnumerable<Ranked>T>> OrderAndRankBy<T>( this IEnumerable<T> source, IEnumerable<SortDefinition<T>> sortAndRank, IEnumerable<SortDefinition<T>> sortOnly) { if (source == null) throw new ArgumentNullException("source"); var sortAndRankArray = sortAndRank.ToArray(); var allSorters = sortAndRankArray.Concat(sortOnly).ToArray(); if (allSorters.Length == 0) return CreateRanking(source, EqualityComparer<T>.Default); var ordered = SortSequence(source, allSorters); return CreateRanking(ordered, new SortersEqualityComparer<T>(sortAndRankArray)); } private static IOrderedEnumerable<T> SortSequence<T>(IEnumerable<T> source, SortDefinition<T>[] sorters) { var ordered = sorters[0].IsDescending ? source.OrderByDescending(sorters[0].Selector) : source.OrderBy(sorters[0].Selector); for (var i = 1; i < sorters.Length; ++i) { var sorter = sorters[i]; ordered = sorters[i].IsDescending ? ordered.ThenByDescending(sorter.Selector) : ordered.ThenBy(sorter.Selector); } return ordered; }
And the last part are SortersEqualityComparer and CreateRanking method, both of it are private part of the implementation. CreateRanking needs to enumerate the sorted collection, but thanks to yield keyword we still can have deferred execution, so that we're calculating the rank only when we actually want to read the data and only for as many elements as we requested. Here's the code:
private static IEnumerable<Ranked<T>> CreateRanking<T>( IEnumerable<T> ordered, IEqualityComparer<T> comparer) { using (var it = ordered.GetEnumerator()) { if (!it.MoveNext()) yield break; // return first value with rank 1 var current = it.Current; var ranked = new Ranked<T>(current, 1); yield return ranked; var previous = ranked; var nextInOrder = 2; while (it.MoveNext()) { // and all the other values with either previous value for equal items or next rank in order current = it.Current; ranked = new Ranked<T>(current, comparer.Equals(previous.Item, current) ? previous.Rank : nextInOrder); yield return ranked; previous = ranked; ++nextInOrder; } } } private class SortersEqualityComparer<T> : EqualityComparer<T> { private readonly SortDefinition<T>[] _sorters; public SortersEqualityComparer(IEnumerable<SortDefinition<T>> sorters) { _sorters = sorters.ToArray(); } public override bool Equals(T x, T y) { return _sorters.All(s => s.Selector(x).Equals(s.Selector(y))); } public override int GetHashCode(T obj) { return obj == null ? 0 : obj.GetHashCode(); } }
Here is the usage example:
var results = new[] { new TeamResult() { Name = "Enumerable Rangers", Points = 4, GoalsLost = 4 }, new TeamResult() { Name = "F.C. Async", Points = 4, GoalsLost = 2 }, new TeamResult() { Name = "Generic Pitfalls", Points = 0, GoalsLost = 7 }, new TeamResult() { Name = "Linq United", Points = 6, GoalsLost = 2 } }; var rankAndSort = new[] { new SortDefinition<TeamResult>(x => x.Points, isDescending: true) }; var sortOnly = new[] { new SortDefinition<TeamResult>(x => x.GoalsLost) }; var orderedTeams = results.OrderAndRankBy(rankAndSort, sortOnly); foreach (var team in orderedTeams) { Console.WriteLine("{0}. {1} - {2} points, {3} goals lost", team.Rank, team.Item.Name, team.Item.Points, team.Item.GoalsLost); }
And on the console we can see:
1. Linq United - 6 points, 2 goals lost 2. F.C. Async - 4 points, 2 goals lost 2. Enumerable Rangers - 4 points, 4 goals lost 4. Generic Pitfalls - 0 points, 7 goals lost
Full code is available as a Gist - feel free to use it!
No comments:
Post a Comment
Note: Only a member of this blog may post a comment.