NOtherDev: NHibernate queries

Showing posts with label NHibernate queries. Show all posts

Friday, December 9, 2011

Fetching whole table with NHibernate

I've done a quick benchmark for different ways of retrieving whole tables from the database. I've generated 10,000 random rows in my test table with countries first and then run simple SELECT * FROM Country queries for 7 different ways using more or less of NHibernate.

Here are the methods and its results:

2263 ms for LINQ query:

session.Query<Country>().ToList();

1468 ms for ICriteria:

session.CreateCriteria<Country>().List<Country>();

1289 ms for QueryOver:

session.QueryOver<Country>().List();

1431 ms for HQL:

session.CreateQuery("From Country").List<Country>();

1428 ms for SQL query with AddEntity:

session.CreateSQLQuery("select Id, Name, Continent from Country")
    .AddEntity(typeof(Country))
    .List<Country>();

593 ms for SQL query with AliasToBeanTransformer:

session.CreateSQLQuery("select Id, Name, Continent from Country")
    .AddScalar("Id", NHibernateUtil.Int32)
    .AddScalar("Name", NHibernateUtil.String)
    .AddScalar("Continent", NHibernateUtil.String)
    .SetResultTransformer(Transformers.AliasToBean<Country>())
    .List<Country>();

489 ms for SQL query with manual objects creation:

session.CreateSQLQuery("select Id, Name, Continent from Country")
    .AddScalar("Id", NHibernateUtil.Int32)
    .AddScalar("Name", NHibernateUtil.String)
    .AddScalar("Continent", NHibernateUtil.String)
    .List<object[]>()
    .Select(row => new Country() 
    { 
        Id = (int) row[0], 
        Name = (string) row[1], 
        Continent = (string) row[2] 
    }).ToList();

The results are pretty interesting.

Contrary to LINQ provider, there's no significant difference between ICriteria, QueryOver and HQL. Fetching data with native SQL query gives no performance gain when used with AddEntity - NHibernate still needs to initialize its infrastructure for managing entities - result is comparable to previous methods.

But there is a big difference when using plain SQL queries and transforming rows to objects on our own - it is about 3 times faster than letting NHibernate to do it. Why is that? When using AliasToBeanTransformer or when creating objects manually, we in fact don't work with persistent entities. Country objects created in two last queries are just POCOs, not associated with NHibernate session at all - its changes are not tracked and persisted automatically. This is not the usual way we are supposed to work with NHibernate, but this can be useful in some scenarios.

Monday, November 14, 2011

LINQ Pitfalls: Using external variables in projections

Another issue with NHibernate.Linq provider that surprised me a lot is about variables (or constants) that appear in Select expressions. It seems that you can use it, but once only! See this simple code:

var result1 = session.Query<River>()
    .Select(r => new { r.Name, Value = 1 })
    .First();
Console.WriteLine("a) {0} = 1?", result1.Value);

var result2 = session.Query<River>()
    .Select(r => new { r.Name, Value = 2 })
    .First();
Console.WriteLine("b) {0} = 2?", result2.Value);

Note that the queries differ in constant value only. But the difference is pretty obvious. Well, not for NHibernate. This is the actual output:

a) 1 = 1?
b) 1 = 2?

It seems that NHibernate parses expressions only once and caches it internally, ignoring the actual member values. NHibernate will always use cached value 1. This is a silent fail, much worse situation than failing with NotSupportedException or something else.

What is the workaround? Select only data that comes from database and add all other external values in code, outside NHibernate.Linq:

var result = session.Query<River>()
    .Select(r => r.Name)
    .First()
    .Select(r => new { Name = r, Value = 1 });

This is already known bug (see this JIRA entry), marked as critical, but still unresolved. It influences Select only. Values in other expressions, i.e. in Where calls, are not cached and work as you may expect.

Well, I'll maybe try to make a bit more insight into what is supported in NHibernate.Linq and what not...

Sunday, November 6, 2011

LINQ Pitfalls: Order of operators (cont.)

In the previous post I've shown that operators in NHibernate.Linq query have to keep some logical order due to SQL language design. But the rules for NHibernate queries are not as strict as order of clauses in SQL. In fact, there is quite a good compromise between LINQ's flexibility and SQL constraints.

NHibernate generally allows to move operators around as long as the result doesn't depend on the order of operators. For example, there's no logical difference between these two queries:

session.Query<River>()
    .OrderBy(x => x.Length)
    .Where(x => x.Length > 100)
    .ToList();

session.Query<River>()
    .Where(x => x.Length > 100)
    .OrderBy(x => x.Length)
    .ToList();

Both these queries produce correct SQL, even if the first one defines operations in non-SQL-compliant order (WHERE is before ORDER BY in SQL). NHibernate knows that applying Where restriction can be done at any time and it doesn't change the result. That's why it allows to put it almost anywhere in LINQ query. The only rule is to place it before paging operators (Skip and Take), as it does matter if we are paging before or after applying Where conditions.

Similarily with projections - Select operator can be placed in different query parts and it doesn't change the result. Theoretically it could be placed even after paging, but for some reason NHibernate is more SQL-compliant in this case and doesn't allow Take or Skip to precede Select.

So, besides Skip and Take, are there any other operators that need to be placed in specific query part? NHibernate doesn't like OrderBy placed before GroupBy, but this is rather bizzare combination and we may ignore it. All the other supported operators seems to be pretty portable - at least I haven't found any other problems when putting operators in non-SQL-compliant order. Well, NHibernate's Linq provider has to be extremally complex, it is not a surprise that it took 3 man-years to implement it.

How about non-standard, NHibernate-specific operators, such as Cacheable, CacheMode, ToFuture, Fetch, FetchMany? In NHibernate 3.2 it can move around, too. However, in NH 3.0 Fetch and FetchMany had to be the last operators, just before ToList or other method that triggers the database call. Otherwise, NotSupportedException was thrown.

Saturday, November 5, 2011

LINQ Pitfalls: Order of operators

When writing SQL queries you need to put all the clauses in particular order that is enforced by the SQL language itself and somehow relates with how the RDBMS is going to process the query. In general, the order in which the clauses are processed in databases is:

FROM clause
WHERE clause
GROUP BY clause
HAVING clause
SELECT clause
ORDER BY clause
LIMIT/OFFSET clauses

You can't (easily and without subqueries) i.e. apply WHERE conditions after sorting or apply ordering after applying LIMIT (TOP n) clause. This is pretty explicit from SQL language syntax and makes sense for performance reasons.

When writing NHibernate's LINQ queries, the rules generally still apply. We need to always remember that eventually NHibernate have to produce plain old SQL query that is syntactically correct and makes sense. This means that even if it is perfectly legal in LINQ to Objects to write a query with an arbitrary, not SQL-compliant order of operations, NHibernate.Linq doesn't allow it (with NotSupportedException) as it is hard to translate such query into SQL.

It may seem that NHibernate could optimize and reorder operations in the resulting query itself, as calling NHibernate.Linq operators is only composing the query, not running it step by step. Well, technically it could, but let's see what would this mean.

Let's assume that we have the following rivers in the database:

Oder - 854 km
Vistula - 1047 km
Elbe - 1165 km
Danube - 2845 km

Relying on NHibernate's hypothetical ability to apply the operators in SQL-compliant way, we could try to get two longest rivers from the database using any of these two queries:

Session.Query<River>().Take(2).OrderByDescending(x => x.Length);
Session.Query<River>().OrderByDescending(x => x.Length).Take(2);

But let's see the results of these queries when run against in-memory objects collection using plain LINQ to Objects:

rivers.Take(2).OrderByDescending(x => x.Length);

Vistula - 1047 km
Oder - 854 km

rivers.OrderByDescending(x => x.Length).Take(2);

Danube - 2845 km
Elbe - 1165 km

So order does matter and it has to matter in every LINQ implementation. That's why it makes perfect sense for NHibernate to throw NotSupportedException when the ordering is not SQL-compliant.

If you really intend to apply ordering after two first objects are fetched, you should fetch it unordered from the database and order it in-memory, falling back to plain LINQ to Objects. LINQ's AsEnumerable() is designed especially for cases like this - it switches the context from database to plain objects.

Session.Query<River>()
     .Take(2)
     .AsEnumerable()
     .OrderByDescending(x => x.Length);

In the next post, we'll see are there any exceptions from the rule above and what are the order rules for non-standard NHibernate.Linq operators.

Sunday, October 30, 2011

LINQ Pitfalls: Casting

One thing that needs to be always remembered when writing NHibernate.Linq queries is that it is going to be translated into SQL eventually. What this means is that we can't do everything in our Select or Where conditions - we are restricted by the capabilities of underlying database and SQL language itself.

Let's say our rivers have delta type stored in the database as plain string and in our domain model we introduced an enum type for convenience:

public class River
{
    // ...
    public virtual string DeltaType { get; set; }
}

public enum DeltaType
{
    None,
    WaveDominated,
    TideDominated,
    Gilbert,
    Estuary
}

Now we're going to issue a query that should return all the rivers that has one of the delta types given:

repository.FetchByDeltaTypes(new[] { DeltaType.Estuary, DeltaType.Gilbert });

The first try can look like this:

public IEnumerable<River> FetchByDeltaTypes(IEnumerable<DeltaType> types)
{
    IEnumerable<River> data;
    using (var tx = _session.BeginTransaction())
    {
        data = _session.Query<River>()
            .Where(r => types.Contains(
                (DeltaType)Enum.Parse(typeof(DeltaType), r.DeltaType)
            )).ToList();

        tx.Commit();
    }

    return data;
}

Here I'm trying to convert string from entity into an enum value for comparison. When tried to run it, I've got NotSupportedException with quite reticent, but pretty clear message "System.Object Parse(System.Type, System.String)". NHibernate complains here that it can't translate Parse method call into corresponding SQL statement. Well, I can't think of that translation either - SQL doesn't know our enum types etc.

We have to think of another solution - we have to provide the database with strings that it can understand. What we can do here is to change all the enum values into strings first and then pass it to the query:

public IEnumerable<River> FetchByDeltaTypes(IEnumerable<DeltaType> types)
{
    IEnumerable<River> data;
    using (var tx = _session.BeginTransaction())
    {
        var typesStrings = types.Select(x => x.ToString()).ToList();

        data = _session.Query<River>()
            .Where(r => typesStrings.Contains(r.DeltaType))
            .ToList();

        tx.Commit();
    }

    return data;
}

No big difference for our code, big difference for NHibernate. It is now trivial to translate this Where clause into SQL. Conclusion - it is always important to think how the query is going to be translated into SQL.

Tuesday, October 25, 2011

LINQ Pitfalls: Nested queries

NHibernate's LINQ provider is very easy to start with and sufficient for majority of use cases, but when used in more complex scenarios it can mislead and surprise. Here is one of the pitfalls (or perhaps bugs?) I've recently run into. Note that I'm using NHibernate 3.0 and the issue is already fixed, but general conclusions possibly still apply.

I have simple model with two-step relationship: River has many Countries, Country has one Continent. I want to query for all rivers located within given continent. For code simplicity, let's try to use two separate LINQ queries:

var countries = session.Query<Country>()
    .Where(c => c.Continent.Id == continentId);

var data = session.Query<River>()
    .Where(r => (r.Countries.Where(c => countries.Contains(c))).Any())
    .ToList();

The Where condition in second query is satisfied if at least one of river's countries exists in the country list for given continent. I don't call ToList() on first query, so I assume it is not run separately but merged (nested) into second one (or at least NotSupportedException is thrown or something). Here is the query that actually runs:

What do we have here? It gets all rivers, filter it for rivers that are within countries (country2_ alias), that are equal to ANY country (country3_ alias) - no continent condition at all! The query is merged, it runs without errors and silently produces unexpected, wrong results.

How can we enforce LINQ provider to filter the results correctly? The easiest way is to materialize the list of countries in separate query to avoid buggy query merging. So let's add ToList() to the first query:

var countries = session.Query<Country>()
    .Where(c => c.Continent.Id == continentId)
    .ToList();

var data = session.Query<River>()
    .Where(r => (r.Countries.Where(c => countries.Contains(c))).Any())
    .ToList();

This is how these queries look like in SQL:

This time we are fetching the list of countries first and the second query is correctly filtered using IN clause. But this is not the best solution, for sure. There are two queries instead of one and we don't really need country values, especially if there may be thousands of rows in real-life model. In this case, we should try to rewrite the queries to express our filtering logic in single query. In River-Country-Continent example, it will look like this:

data = session.Query<River>()
    .Where(r => r.Countries.Any(c => c.Continent.Id == continentId))
    .ToList();

This time the SQL statement contains continent filter and results are as expected:

Conclusion: Use the latest stable versions of your software. Be careful when merging LINQ queries in NHibernate. Ensure all ToList's are in place or find different way to express your filtering. And always check what SQL is your query producing!