Monday, April 15, 2013

NHibernate's LINQ GroupBy capabilities

Recently in the project that is using NHibernate 3.2, I needed to use some aggregations in my database queries. The use case was pretty typical - aggregate some pre-filtered set of invoices by the product sold, count how many sales were there for each product, order the data by total sales value and take top 10 results. It is pretty easy to accomplish in SQL:

SELECT TOP 10 Product, COUNT(*) AS SaleCount, SUM(Value) AS TotalValue
     FROM Invoices
     WHERE Cancelled = 0
     GROUP BY Product
     ORDER BY TotalValue DESC

It is also pretty easy to express in LINQ syntax:

Invoices.Where(i => i.Cancelled == false)
     .GroupBy(i => i.Product)
     .Select(g => new TopSellingProduct 
                  {
                      Product = g.Key, 
                      SaleCount = g.Count(), 
                      TotalValue = g.Sum(i => i.Value) 
                  })
     .OrderByDescending(g => g.TotalValue)
     .Take(10);

I knew that NHibernate's LINQ provider offers limited support for GroupBy operator. Taking into consideration that all the lambda expressions in the query are in fact expression trees that need to be parsed and expressed in SQL, what I expected to be the most problematic, was the Select clause that creates new TopSellingProduct instances (which is not a NHibernate-managed entity) and sets its properties, in case of Sum even using nested lambdas. Actually, this was not a problem at all, even when using anonymous types inside - impressive! NHibernate somehow gets the list of fields and aggregation functions that needs to be fetched and turns it into SELECT clause correctly.

But the query above couldn't be translated into SQL anyway. It turned out that the operators that seemed easier to implement - OrderBy, Take and Skip - were not supported. So with NHibernate 3.2, I could only create an aggregation and fetch all the aggregated values at once, without ordering or paging. In my case, it could mean fetching 50k rows just to show top 10. Not an option.

Fortunately, quick search through the NHibernate's JIRA dashboard gave me the hope that things look better with the newer NHibernate version - 3.3.1. I've upgraded seamlessly using NuGet, and here is the summary of my observations:

SQL featureLINQ syntax exampleNHibernate 3.2NHibernate 3.3.1
SELECT of simple aggregated value; COUNT() function.GroupBy(x => ...).Select(g => g.Count())OKOK
SELECT of anonymous class.GroupBy(x => ...)
.Select(g => new { g.Key, Count = g.Count() })
OKOK
SELECT of named class
.GroupBy(x => ...)
  .Select(g => new MyType 
               {
                 Key = g.Key, 
                 Count = g.Count()
               })
OKOK
SUM(), MIN(), MAX() functions
.GroupBy(x => ...)
  .Select(g => new 
               { 
                 Sum = g.Sum(x => ...), 
                 Min = g.Min(x => ...), 
                 Max = g.Max(x => ...)
               })
OKOK
AVG() function.GroupBy(x => ...).Select(g => g.Avg(x => ...))buggy, truncates value to int (NH-2429)OK
WHERE (condition applied before aggregation).Where(x => ...).GroupBy(x => ...)OKOK
HAVING (condition applied after aggregation).GroupBy(x => ...).Where(g => ...)silent failure, produces subquery instead of HAVING clause and returns wrong results (NH-2883)OK
ORDER BY (sorting).GroupBy(x => ...).OrderBy(g => ...)MismatchTreeNodeException (NH-2781)OK
TOP / LIMIT (number of results).GroupBy(x => ...).Take(10)NotImplementedExceptionOK
OFFSET (paging support).GroupBy(x => ...).Skip(10)NotImplementedExceptionOK

Things look MUCH better now - everything what I need (and a bit more) is correctly supported with the newest LINQ provider.

4 comments:

  1. Does grouping by the whole entity like the way you have in your sample query ".GroupBy(i => i.Product)" work properly?

    I have the problem mentioned here (https://groups.google.com/forum/?hl=en#!searchin/nhusers/nhibernate$20linq$20group$20by$20entity/nhusers/jbQP9E6_Sb8/yIg3-vIuE_QJ).

    I was just wondering how you got it work. When I try to group by an entity, the translated sql does not include all the columns of the entity (only includes the identity column) and basically generates a wrong sql.



    ReplyDelete
    Replies
    1. Well, I don't think my Product was an entity, probably string or something like that. Doesn't GroupBy(i => i.Product.Id) work?

      Delete
  2. I see now. Yours is not an entity. GroupBy(i => i.Product.Id) would work, but that's not what I want. I want to group by all the properties of my entity to be able to select my entity in the select clause.

    ReplyDelete
  3. I would not expect NHibernste to work like that. Two queries is the only solution I can see.

    ReplyDelete