Sunday, November 27, 2011

Many-to-many mapping: collection types

In the previous post we've seen how to instruct NHibernate to generate proper number of INSERT statements to intermediate many-to-many table using Inverse attribute. Time for something bit more complicated.

In the previous example we've created fresh instances of Users and Group and used it in the same session, so NHibernate knew the collections were initially empty. Let's see what happens when we want to work with the collections that are fetched from the database with lazy loading, so that NHibernate doesn't know the collections.

In the example below we're just trying to remove second User from the Group - a task that could be accomplished with single DELETE statement.

using (var sess = factory.OpenSession())
using (var tx = sess.BeginTransaction())
var group = sess.Get<Group>(1);
var user2 = sess.Load<User>(2);


And here are the queries run in this session:

Well, we just wanted to delete single row from GroupsToUsers, but NHibernate decided to load the collection of users assigned to our Group with ID=1, purge the whole collection, remove the entry in memory and re-add rows that left one by one (which may be thousands of rows!). Seems a bit redundant, doesn't it?

To understand what happened here, first we need to know the difference between the collection types supported by NHibernate and which are used by default by Fluent NHibernate if we didn't specify it explicitly.

There are two collection types that are important for many-to-many mappings - bags and sets:

  • bag is the simplest container - it just holds any items without uniqueness checking
  • set is unique - it means that it can't have two items with exactly the same values

In my object model, I've defined the collections properties using ICollection<T> interface. It seems to be the good choice as it's the base interface for all collections. But ICollection doesn't give any hint to Fluent NHibernate which type of collection to use and Fluent NHibernate chooses the simplest one, which is bag.

I think Fluent NHibernate is wrong with hiding the decision which collection type to use somewhere in conventions. This changes a lot in how NHibernate behaves and the choice should be explicit at mapping level, so that we'll need to think about it instead of letting Fluent to choose something what could be highly inappropriate.

OK, so now we know that our Users collection in Group is implicitly defined as bag. Why it recreates the whole collection just to remove one row? This is just how bag works. Bag doesn't have a primary key and NHibernate can't construct SQL query that addresses the single row to delete - WHERE User_id = 2 AND Group_id = 1 is ambiguous when the row is duplicated (and it can be, as we saw in the previous post). NHibernate's strategy to solve that problem, which we saw above, is obviously the simplest one, but at least it makes us think and look for another solution.

In case of many-to-many, in almost all cases, we in fact need set semantics. Duplicate rows in intermediate table have no meaning and should be forbidden at database level using primary key constraint.

So let's map our Users collection in Group classmap as set:

// in UserMap
HasManyToMany(x => x.Groups).Inverse();

// in GroupMap
HasManyToMany(x => x.Users).AsSet();

Note that we can leave the collection at User side with default bag mapping because it is marked as Inverse, so there is no database write triggered from there and read efficiency is the same with bag and set (it's just a SELECT to fetch all data by foreign key).

Let's look at the queries from our session this time:

Much better. There is no DELETE statement for whole collection and there is no re-inserting. There's just one DELETE that will affect one row only (this is guaranteed by set semantics).

Why do we need to load the collection anyway (statement #2)? I'll explain it in the next post, together with summary and conclusions for many-to-many mappings.

No comments:

Post a Comment