Sunday, November 27, 2011

Many-to-many mapping: collection types

In the previous post we've seen how to instruct NHibernate to generate the proper number of INSERT statements for the intermediate many-to-many table using the Inverse attribute. Time for something a bit more complicated.

In the previous example we've created fresh instances of Users and a Group and used them in the same session, so NHibernate knew the collections were initially empty. Let's see what happens when we want to work with collections that are fetched from the database with lazy loading, so that NHibernate doesn't know their contents yet.

In the example below we're just trying to remove the second User from the Group - a task that could be accomplished with a single DELETE statement.

using (var sess = factory.OpenSession())
using (var tx = sess.BeginTransaction())
{
    var group = sess.Get<Group>(1);
    var user2 = sess.Load<User>(2);

    // remove the relationship from both sides to keep the model consistent
    group.Users.Remove(user2);
    user2.Groups.Remove(group);

    tx.Commit();
}


And here are the queries run in this session:
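Based on the behaviour described below, the sequence of statements looks roughly like this (a sketch only - the actual table and column names depend on your mappings):

```sql
-- load the Group and its Users collection
SELECT ... FROM Groups WHERE Id = 1;
SELECT ... FROM GroupsToUsers WHERE Group_id = 1;

-- purge the whole collection
DELETE FROM GroupsToUsers WHERE Group_id = 1;

-- re-insert every remaining user, one row at a time
INSERT INTO GroupsToUsers (Group_id, User_id) VALUES (1, 1);
INSERT INTO GroupsToUsers (Group_id, User_id) VALUES (1, 3);
-- ... and so on for each user left in the collection
```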

Well, we just wanted to delete a single row from GroupsToUsers, but NHibernate decided to load the collection of users assigned to our Group with ID=1, purge the whole collection, remove the entry in memory and re-insert the remaining rows one by one (which may be thousands of rows!). Seems a bit redundant, doesn't it?

To understand what happened here, we first need to know the difference between the collection types supported by NHibernate, and which one Fluent NHibernate uses by default if we don't specify it explicitly.

There are two collection types that are important for many-to-many mappings - bags and sets:

  • bag is the simplest container - it just holds items, without any uniqueness checking
  • set is unique - it can't contain two items with exactly the same values
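The distinction maps naturally onto plain .NET collections - a bag behaves like a List<T>, a set like a HashSet<T> (an illustration only; NHibernate uses its own persistent collection types under the hood):

```csharp
var bag = new List<int> { 1, 2 };
bag.Add(2);               // bags happily accept duplicates
// bag now holds: 1, 2, 2

var set = new HashSet<int> { 1, 2 };
set.Add(2);               // returns false - the duplicate is rejected
// set still holds: 1, 2
```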

In my object model, I've defined the collection properties using the ICollection<T> interface. It seems to be a good choice, as it's the base interface for all collections. But ICollection<T> doesn't give Fluent NHibernate any hint about which type of collection to use, and Fluent NHibernate chooses the simplest one: bag.

I think Fluent NHibernate is wrong to hide the choice of collection type somewhere in its conventions. This choice changes a lot in how NHibernate behaves and should be explicit at the mapping level, so that we're forced to think about it instead of letting Fluent pick something that could be highly inappropriate.

OK, so now we know that our Users collection in Group is implicitly mapped as a bag. Why does it recreate the whole collection just to remove one row? That's just how a bag works. A bag doesn't have a primary key, so NHibernate can't construct a SQL query that addresses the single row to delete - WHERE User_id = 2 AND Group_id = 1 is ambiguous when the row is duplicated (and it can be, as we saw in the previous post). NHibernate's strategy for solving that problem, which we saw above, is obviously the simplest one possible, but at least it makes us think and look for another solution.

In the case of many-to-many, we in fact need set semantics in almost all cases. Duplicate rows in the intermediate table have no meaning and should be forbidden at the database level with a primary key constraint.

So let's map our Users collection in the Group class map as a set:

// in UserMap
HasManyToMany(x => x.Groups).Inverse();

// in GroupMap
HasManyToMany(x => x.Users).AsSet();

Note that we can leave the collection on the User side with the default bag mapping, because it is marked as Inverse: no database write is ever triggered from there, and read efficiency is the same for bag and set (it's just a SELECT fetching all rows by foreign key).

Let's look at the queries from our session this time:

Much better. There is no DELETE statement for the whole collection and no re-inserting. There's just one DELETE, and it affects one row only (this is guaranteed by the set semantics).
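With the set mapping, the session comes down to something like this (again, a sketch - exact SQL depends on the mappings):

```sql
-- #1: load the Group
SELECT ... FROM Groups WHERE Id = 1;

-- #2: load the Users collection
SELECT ... FROM GroupsToUsers WHERE Group_id = 1;

-- #3: delete exactly the one row we asked for
DELETE FROM GroupsToUsers WHERE Group_id = 1 AND User_id = 2;
```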

Why do we need to load the collection anyway (statement #2)? I'll explain it in the next post, together with summary and conclusions for many-to-many mappings.

Saturday, November 26, 2011

Many-to-many mapping: Inverse

In this and the next few posts I'll try to go over the rules and guidelines and give some insight into how to properly map a many-to-many relationship in NHibernate.

In general, it is good for relationships in the object model to follow the relationships at the database level, so that querying for objects can be naturally translated into SQL queries without additional overhead.

This is quite easy and natural for a parent-child relationship (aka many-to-one). A child object can have a reference to its parent object, just as a child row has a foreign key to its parent row.

Many-to-many relationships (e.g. Users and Groups) are different. In fact, they can exist only at the object model level - at the database level it is only a concept, implemented using two many-to-one relations with an intermediate table.

We could implement it the same way in our NHibernate-based object model, but the intermediate entity would probably have no additional properties, leading to a polluted and ugly model. We don't want User and Group to have collections of some UserInGroup objects that carry no real object-oriented meaning. We do want a User to have many Groups and a Group to have many Users - as simple as it can be.

Fortunately, this is quite a common scenario and NHibernate supports it very well, with respect for database-level constraints and good practices - but only when mapped and used with care.

Let's begin with the simplest possible mapping using FluentNHibernate:

// in UserMap
HasManyToMany(x => x.Groups);

// in GroupMap
HasManyToMany(x => x.Users);

Now let's add two users to a group. To ensure our objects' state is correct and the collections on both the User and Group side are complete, we need to update the collections on both sides:

using (var sess = sessionFactory.OpenSession())
using (var tx = sess.BeginTransaction())
{
    var user1 = new User() { Name = "u1" };
    var user2 = new User() { Name = "u2" };
    var group1 = new Group() { Name = "g" };

    user1.Groups.Add(group1);
    group1.Users.Add(user1);

    user2.Groups.Add(group1);
    group1.Users.Add(user2);

    sess.Save(user1);
    sess.Save(user2);
    sess.Save(group1);

    tx.Commit();
}

But NHibernate is not meant to guess that when we add a relationship on the User side (user1.Groups.Add(group1)) and then add another one on the Group side in the next line, we are in fact specifying the same single relationship. So what is the result of running this code?

Each Add call in the code above triggers a separate INSERT statement, duplicating each relationship. At the database level it means that each user is a group member twice. Moreover, when the GroupsToUsers table has a primary key defined on both columns (and it probably should have), the database will complain about a primary key violation and we'll end up with an exception.
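Sketched as SQL, each pair gets written twice (table and column names are whatever the conventions generated):

```sql
-- user1 added to group1 from both sides:
INSERT INTO GroupsToUsers (User_id, Group_id) VALUES (1, 1);
INSERT INTO GroupsToUsers (User_id, Group_id) VALUES (1, 1);  -- duplicate!
-- user2 added to group1 from both sides:
INSERT INTO GroupsToUsers (User_id, Group_id) VALUES (2, 1);
INSERT INTO GroupsToUsers (User_id, Group_id) VALUES (2, 1);  -- duplicate!
```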

What we need to do here is to inform NHibernate that we're defining the same relation on both ends and that it should be represented as one row in the database. We need to choose which side is responsible for the database insert and mark the other with the inverse attribute. For example, let's choose the User side as inverse and make group.Users.Add(user) trigger the database call.

// in UserMap
HasManyToMany(x => x.Groups).Inverse();

// in GroupMap
HasManyToMany(x => x.Users);

(We'll cover how to choose the inverse side properly later.)

What is the result of running the same code now?
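Only the non-inverse Group side triggers writes now, so each relationship becomes exactly one row (sketch):

```sql
INSERT INTO GroupsToUsers (User_id, Group_id) VALUES (1, 1);
INSERT INTO GroupsToUsers (User_id, Group_id) VALUES (2, 1);
```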

So far, so good. But it's far from correct yet. So far it works without overhead only for entities created just beforehand, so that NHibernate knows the current collection state. Things look much worse when the entities are loaded from the database. We'll look closer at that problem and its solution in the next post.

Tuesday, November 22, 2011

ReSharper and external ASP.NET MVC views

The project I currently work on is based on ASP.NET MVC and, besides the main Web project, it can be extended using additional plugin-style libraries that we import from specified directories and register in the main application. These plugin libraries can contain additional view files (.aspx or .ascx) - to support that, we store them as embedded resources and have a custom VirtualPathProvider that resolves them in the main Web application.

The problem we struggled with for quite a long time was that ReSharper was unable to resolve those views correctly, getting lost on the Control definition in the first line of the view. This caused the whole file to be misunderstood, lots of errors and fix suggestions showed up, and with several views like this the performance was dramatically low.

Recently I've spent some time investigating it more carefully, and it turned out that the reason is fairly simple. It has nothing to do with the embedded resource setting or non-standard file locations. The only thing to do was to place the standard Web.config file for views in each external project that contains ASP.NET MVC views.

Here are the lines that ensure ReSharper knows what to do with .aspx/.ascx files (for ASP.NET MVC 2):

<pages
    pageParserFilterType="System.Web.Mvc.ViewTypeParserFilter, System.Web.Mvc, Version=2.0.0.0, Culture=neutral, PublicKeyToken=31BF3856AD364E35"
    pageBaseType="System.Web.Mvc.ViewPage, System.Web.Mvc, Version=2.0.0.0, Culture=neutral, PublicKeyToken=31BF3856AD364E35"
    userControlBaseType="System.Web.Mvc.ViewUserControl, System.Web.Mvc, Version=2.0.0.0, Culture=neutral, PublicKeyToken=31BF3856AD364E35">
  <controls>
    <add assembly="System.Web.Mvc, Version=2.0.0.0, Culture=neutral, PublicKeyToken=31BF3856AD364E35"
         namespace="System.Web.Mvc" tagPrefix="mvc" />
  </controls>
</pages>

Thursday, November 17, 2011

Use tools only for what they are made for

I've been using NHibernate.Linq quite intensively for more than a year now, and while I know it better and better, I use it in fewer scenarios than before. Why is that? It's easy - there are scenarios where using it makes no sense. NHibernate's provider is just a tool, with its design, its features and its limitations. NHibernate.Linq is good at querying for business objects, as this is what it's made for. But we may get tired trying to write analytical or reporting queries, or when processing data in batch mode. That's of course doable with Linq, but certainly not practical. And that's what plain old SQL is made for.

We developers often have some kind of unsaid belief that once we've decided to use one specific tool, we shouldn't use another, similar one. That makes sense when both tools offer exactly the same functionality, or when using the other tool causes a significant performance penalty. But this is not the case with NHibernate.Linq and other database access methods. NHibernate is already bundled with both QueryOver and Linq, so we can use both without any penalty. We can also fall back on the ICriteria API, HQL or plain SQL queries if needed. We can even use the ADO.NET connection directly, skipping NHibernate's layers entirely. If this makes our task easier, why not?
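As an illustration, here's the same simple lookup expressed through three of the APIs mentioned above (the River entity and its Length property are borrowed from other posts on this blog; the Rivers table name is an assumption):

```csharp
// LINQ - good for typed business queries
var linq = session.Query<River>()
                  .Where(r => r.Length > 1000)
                  .ToList();

// HQL - string-based, but concise and flexible
var hql = session.CreateQuery("from River r where r.Length > :len")
                 .SetParameter("len", 1000)
                 .List<River>();

// plain SQL - full control, ideal for reporting or batch work
var sql = session.CreateSQLQuery("select * from Rivers where Length > :len")
                 .AddEntity(typeof(River))
                 .SetParameter("len", 1000)
                 .List<River>();
```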

A good cook always has a lot of knives in his kitchen, and he knows that one knife is for fish and another for vegetables. There's no such thing as a universal knife that cuts everything. The same goes for us - it's important to be aware of what each tool is best at, to know which one to use in a particular situation, and not to look for a universal tool "to rule them all".

Monday, November 14, 2011

LINQ Pitfalls: Using external variables in projections

Another issue with the NHibernate.Linq provider that surprised me a lot concerns variables (or constants) that appear in Select expressions. It seems you can use them, but only once! See this simple code:

var result1 = session.Query<River>()
    .Select(r => new { r.Name, Value = 1 })
    .First();
Console.WriteLine("a) {0} = 1?", result1.Value);

var result2 = session.Query<River>()
    .Select(r => new { r.Name, Value = 2 })
    .First();
Console.WriteLine("b) {0} = 2?", result2.Value);

Note that the queries differ in constant value only. But the difference is pretty obvious. Well, not for NHibernate. This is the actual output:

a) 1 = 1?
b) 1 = 2?

It seems that NHibernate parses each expression only once and caches it internally, ignoring the actual member values on subsequent runs. NHibernate will always use the cached value 1. This is a silent failure - a much worse situation than failing with a NotSupportedException or anything else.

What is the workaround? Select only the data that comes from the database, and add all the other external values in code, outside NHibernate.Linq:

var result = session.Query<River>()
    .Select(r => r.Name)
    .AsEnumerable() // switch to LINQ to Objects - the second Select runs in memory
    .Select(r => new { Name = r, Value = 1 });

This is an already known bug (see this JIRA entry), marked as critical but still unresolved. It affects Select only. Values in other expressions, e.g. in Where calls, are not cached and work as you'd expect.

Well, maybe I'll try to give a bit more insight into what is supported in NHibernate.Linq and what is not...

Sunday, November 6, 2011

LINQ Pitfalls: Order of operators (cont.)

In the previous post I've shown that operators in an NHibernate.Linq query have to keep some logical order due to the design of the SQL language. But the rules for NHibernate queries are not as strict as the order of clauses in SQL. In fact, there is quite a good compromise between LINQ's flexibility and SQL's constraints.

NHibernate generally allows moving operators around as long as the result doesn't depend on their order. For example, there's no logical difference between these two queries:

session.Query<River>()
    .OrderBy(x => x.Length)
    .Where(x => x.Length > 100);

session.Query<River>()
    .Where(x => x.Length > 100)
    .OrderBy(x => x.Length);

Both of these queries produce correct SQL, even though the first one defines the operations in a non-SQL-compliant order (WHERE comes before ORDER BY in SQL). NHibernate knows that the Where restriction can be applied at any point without changing the result, so it allows us to put it almost anywhere in the LINQ query. The only rule is to place it before the paging operators (Skip and Take), as it does matter whether we page before or after applying the Where conditions.
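For example (a sketch using the same River entity), the first query below is fine, while the second one - paging before filtering - changes the meaning and should be expected to be rejected:

```csharp
// OK: Where anywhere before the paging operators
session.Query<River>()
    .OrderBy(x => x.Length)
    .Where(x => x.Length > 100)
    .Skip(10)
    .Take(10)
    .ToList();

// not OK: Where after Skip/Take would filter an already-cut page
session.Query<River>()
    .Skip(10)
    .Take(10)
    .Where(x => x.Length > 100)
    .ToList();
```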

Similarly with projections - the Select operator can be placed in different parts of the query without changing the result. Theoretically it could be placed even after paging, but for some reason NHibernate is more SQL-compliant in this case and doesn't allow Take or Skip to precede Select.

So, besides Skip and Take, are there any other operators that need to be placed in a specific part of the query? NHibernate doesn't like OrderBy placed before GroupBy, but this is a rather bizarre combination and we may ignore it. All the other supported operators seem to be pretty portable - at least I haven't found any other problems when putting operators in a non-SQL-compliant order. Well, NHibernate's Linq provider is extremely complex; it's no surprise that it took 3 man-years to implement.

How about non-standard, NHibernate-specific operators, such as Cacheable, CacheMode, ToFuture, Fetch and FetchMany? In NHibernate 3.2 they can move around, too. However, in NH 3.0, Fetch and FetchMany had to be the last operators, placed just before ToList or another method that triggers the database call. Otherwise, a NotSupportedException was thrown.

Saturday, November 5, 2011

LINQ Pitfalls: Order of operators

When writing SQL queries you need to put all the clauses in a particular order that is enforced by the SQL language itself and somehow relates to how the RDBMS is going to process the query. In general, the order in which the clauses are processed by databases is:

  1. FROM clause
  2. WHERE clause
  3. GROUP BY clause
  4. HAVING clause
  5. SELECT clause
  6. ORDER BY clause
  7. LIMIT/OFFSET clauses

You can't (easily, without subqueries) e.g. apply WHERE conditions after sorting, or apply ordering after a LIMIT (TOP n) clause. This is explicit in the SQL syntax and makes sense for performance reasons.
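For instance, sorting the result of a TOP n query requires wrapping it in a subquery (a T-SQL sketch; the Rivers table matches the example below):

```sql
-- take the first two rows, then sort just that result descending
SELECT * FROM (
    SELECT TOP 2 * FROM Rivers
) AS two_rivers
ORDER BY Length DESC;
```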

When writing NHibernate LINQ queries, these rules generally still apply. We always need to remember that NHibernate eventually has to produce a plain old SQL query that is syntactically correct and makes sense. This means that even if it is perfectly legal in LINQ to Objects to write a query with an arbitrary, non-SQL-compliant order of operations, NHibernate.Linq doesn't allow it (throwing NotSupportedException), as such a query would be hard to translate into SQL.

It may seem that NHibernate could optimize and reorder the operations in the resulting query itself, since calling NHibernate.Linq operators only composes the query rather than running it step by step. Well, technically it could, but let's see what that would mean.

Let's assume that we have the following rivers in the database:

  • Oder - 854 km
  • Vistula - 1047 km
  • Elbe - 1165 km
  • Danube - 2845 km

Relying on NHibernate's hypothetical ability to apply the operators in a SQL-compliant way, we could try to get the two longest rivers from the database using either of these two queries:

Session.Query<River>().Take(2).OrderByDescending(x => x.Length);
Session.Query<River>().OrderByDescending(x => x.Length).Take(2);

But let's see the results of these queries when run against in-memory objects collection using plain LINQ to Objects:

rivers.Take(2).OrderByDescending(x => x.Length);
  1. Vistula - 1047 km
  2. Oder - 854 km
rivers.OrderByDescending(x => x.Length).Take(2);
  1. Danube - 2845 km
  2. Elbe - 1165 km

So order does matter, and it has to matter in every LINQ implementation. That's why it makes perfect sense for NHibernate to throw NotSupportedException when the operator order is not SQL-compliant.

If you really intend to apply ordering after the first two objects are fetched, you should fetch them unordered from the database and order them in memory, falling back to plain LINQ to Objects. LINQ's AsEnumerable() is designed especially for cases like this - it switches the context from the database to plain objects:

Session.Query<River>()
    .Take(2)
    .AsEnumerable() // from here on, plain LINQ to Objects
    .OrderByDescending(x => x.Length);

In the next post, we'll see whether there are any exceptions to the rule above, and what the ordering rules are for non-standard NHibernate.Linq operators.