Monday, April 15, 2013

NHibernate's LINQ GroupBy capabilities

Recently in the project that is using NHibernate 3.2, I needed to use some aggregations in my database queries. The use case was pretty typical - aggregate some pre-filtered set of invoices by the product sold, count how many sales were there for each product, order the data by total sales value and take top 10 results. It is pretty easy to accomplish in SQL:

SELECT TOP 10 Product, COUNT(*) AS SaleCount, SUM(Value) AS TotalValue
     FROM Invoices
     WHERE Cancelled = 0
     GROUP BY Product
     ORDER BY TotalValue DESC

It is also pretty easy to express in LINQ syntax:

Invoices.Where(i => i.Cancelled == false)
     .GroupBy(i => i.Product)
     .Select(g => new TopSellingProduct 
                  {
                      Product = g.Key, 
                      SaleCount = g.Count(), 
                      TotalValue = g.Sum(i => i.Value) 
                  })
     .OrderByDescending(g => g.TotalValue)
     .Take(10);

I knew that NHibernate's LINQ provider offers limited support for GroupBy operator. Taking into consideration that all the lambda expressions in the query are in fact expression trees that need to be parsed and expressed in SQL, what I expected to be the most problematic, was the Select clause that creates new TopSellingProduct instances (which is not a NHibernate-managed entity) and sets its properties, in case of Sum even using nested lambdas. Actually, this was not a problem at all, even when using anonymous types inside - impressive! NHibernate somehow gets the list of fields and aggregation functions that needs to be fetched and turns it into SELECT clause correctly.

But the query above couldn't be translated into SQL anyway. It turned out that the operators that seemed easier to implement - OrderBy, Take and Skip - were not supported. So with NHibernate 3.2, I could only create an aggregation and fetch all the aggregated values at once, without ordering or paging. In my case, it could mean fetching 50k rows just to show top 10. Not an option.

Fortunately, quick search through the NHibernate's JIRA dashboard gave me the hope that things look better with the newer NHibernate version - 3.3.1. I've upgraded seamlessly using NuGet, and here is the summary of my observations:

SQL featureLINQ syntax exampleNHibernate 3.2NHibernate 3.3.1
SELECT of simple aggregated value; COUNT() function.GroupBy(x => ...).Select(g => g.Count())OKOK
SELECT of anonymous class.GroupBy(x => ...)
.Select(g => new { g.Key, Count = g.Count() })
OKOK
SELECT of named class
.GroupBy(x => ...)
  .Select(g => new MyType 
               {
                 Key = g.Key, 
                 Count = g.Count()
               })
OKOK
SUM(), MIN(), MAX() functions
.GroupBy(x => ...)
  .Select(g => new 
               { 
                 Sum = g.Sum(x => ...), 
                 Min = g.Min(x => ...), 
                 Max = g.Max(x => ...)
               })
OKOK
AVG() function.GroupBy(x => ...).Select(g => g.Avg(x => ...))buggy, truncates value to int (NH-2429)OK
WHERE (condition applied before aggregation).Where(x => ...).GroupBy(x => ...)OKOK
HAVING (condition applied after aggregation).GroupBy(x => ...).Where(g => ...)silent failure, produces subquery instead of HAVING clause and returns wrong results (NH-2883)OK
ORDER BY (sorting).GroupBy(x => ...).OrderBy(g => ...)MismatchTreeNodeException (NH-2781)OK
TOP / LIMIT (number of results).GroupBy(x => ...).Take(10)NotImplementedExceptionOK
OFFSET (paging support).GroupBy(x => ...).Skip(10)NotImplementedExceptionOK

Things look MUCH better now - everything what I need (and a bit more) is correctly supported with the newest LINQ provider.

Monday, April 8, 2013

NHibernate Equals implementation with proxies vs. ReSharper - or yet another couple of hours lost

I'm already quite sensitive to missing Equals and GetHashCode implementations for NHibernate entities or value types that caused issues like re-inserting or duplicating items within collections hundred times.

A good rule for Equals and GetHashCode (that is clearly stated in the documentation) is to make it do comparisons based on the set of fields that create "business" (real-life) identification of an object whenever possible - and not on database-level identifiers. It works well also when comparing objects from different sessions (detached) or not yet persisted (transient) - without any "unproxying" magic.

Today, my personal counter of hours devoted into cursing and fighting NHibernate-related corner cases or strange issues increased once again. I have an interesting (two hours later - frustrating) case of entities being mixed up in spite of correct SQL queries issued. And I was quite confident that my ReSharper-generated Equals and GetHashCode methods were correct and the root cause was somewhere else. Just look how simple was the code of my entity:

public class City
{
    public virtual int Id { get; set; }
    public virtual string Name { get; set; }

    public virtual bool Equals(City other)
    {
        if (ReferenceEquals(null , other))
            return false ;
        if (ReferenceEquals(this , other))
            return true ;
        return Equals(other.Name, Name);
    }

    public override bool Equals(object obj)
    {
        if (ReferenceEquals(null , obj))
            return false ;
        if (ReferenceEquals(this , obj))
            return true ;
        if (obj.GetType() != typeof (City))
            return false ;
        return Equals((City ) obj);
    }

    public override int GetHashCode()
    {
        return (Name != null ? Name.GetHashCode() : 0);
    }
}

I am comparing City instances using a natural key - its name. In the faulty code I was querying the database for different entities that reference City and later the referenced cities were compared for equality. Here is the simplified sketch of the test case (written in Machine.Specifications) with initialization that creates the object graph and two fetching scenarios as a separate tests.

public class EqualsTest : DatabaseTests
{
    Establish context = () => sut.WithinSessionAndTransaction(sess =>
    {
        var city = new City() { Name = "Llanfairpwllgwyngyll" };
        sess.Persist(city);
        sess.Persist( new Address () { City = city });
        sess.Persist( new District () { City = city });
    });

    It should_have_equal_cities = () => sut.WithinSessionAndTransaction(sess =>
    {
        var address = sess.Query<Address>().Single();
        var district = sess.Query<District >().Single();

        district.City.ShouldEqual(address.City);
    });

    It should_correctly_use_city_in_lookup = () => sut.WithinSessionAndTransaction(sess =>
    {
        var address = sess.Query<Address >().Single();
        var districts = sess.Query<District >().ToLookup(x => x.City);

        districts[address.City].ShouldNotBeEmpty();
    });
}

In the first test I'm doing the direct comparison of cities, in the second one I'm creating a lookup table with City instance as a key. In both cases lazy loading takes place so I'm working with NHibernate-generated City proxies. But it should not be a problem, as NHibernate guarantees object identity within a single session, right?

Well, uhm, the first test passes as expected, but the second test fails! It turned out that the City instance used as the key in districts (proxy instance) does not maintain identity with the proxy instance fetched for Address instance, even if they are both pointing at the same (single) city and are used within single session!

I'm not sure why this happens and I'm pretty confident it shouldn't, but fortunately the workaround is quite easy. As the proxies instances used in Address and District instances are now different references, they are compared using the Equals method we've provided. When Equals (or, more precisely, GetHashCode) is called on one of the objects to compare it with the second one, lazy fetch from the database is performed and it becomes the "real", unproxied object. But the second one doesn't - it is still CityProxy instance. And Equals offered by ReSharper, when checking objects types, unfortunately expects the type to exactly match:

        if (obj.GetType() != typeof (City))
            return false ;

But obj.GetType() is a CityProxy and we're exiting the comparison with the negative result here. The workaround is just to replace that exact check with more semantic one, checking only whether obj can be treated as a City instance:

        if (!(obj is City))
            return false ;

In this case, nor City neither CityProxy are eliminated, NHibernate can continue to compare city names and see that the proxy points to the same objects. This simple change done - and voilĂ  - we have two tests passed!