NOtherDev: many-to-many


Sunday, April 15, 2012

NHibernate's inverse - what does it really mean?

NHibernate's concept of 'inverse' in relationships is probably the most often discussed and misunderstood mapping feature. When I was learning NHibernate, it took me some time to move from "I know where I should put 'inverse' and what happens then" to "I know why I need 'inverse' here and there at all". Even now, whenever I try to explain inverse to somebody, I find it pretty hard.

There are a lot of explanations on the net, but I'd like to have my own. I don't think the others are wrong; it'll just help me arrange my own understanding, and if anyone else benefits from it, that's great.

Where do we use inverse?

First, some widely-known facts; next we'll elaborate on a few of them.

Inverse is a boolean attribute that can be put on the collection mappings, regardless of collection's role (i.e. within one-to-many, many-to-many etc.), and on join mapping.
We can't put inverse on other relation types, like many-to-one or one-to-one.
By default, inverse is set to false.
Inverse makes little sense for unidirectional relationships, it is to be used only for bidirectional ones.
General recommendation is to use inverse="true" on exactly one side of each bidirectional relationship.
When we don't set inverse, NHProf will complain about superfluous updates.

What does it mean for a collection to be 'inverse'?

The main problem in understanding 'inverse' is its negating nature. We're not used to setting something up in order to NOT take an action. Inverse set to true means "I do NOT maintain this relationship". Hence, inverse set to false means "I DO maintain this relationship".

It'd be much more understandable if we could go to the opposite side of the relationship and be positive there: "This side maintains the relationship" and NHibernate would automatically know that the other side doesn't (*). But it is implemented as it is - we have to live with inverse's negative character.

Each relationship is represented in the database as the identifier of a related table row, stored in a foreign key column on the 'many' side. Why on the 'many' side? Because that's how we do relationships in relational databases. The column "holding" the association is always on the 'many' side. It's not possible to keep the association on the 'one' side because we'd have to somehow insert many values into one database field.

So what does it mean for a collection in NHibernate to maintain the relationship (inverse="false")? It means to ensure that the relation is correctly represented in the database. If the Comments collection in the Post object is responsible for maintaining the relationship, it has to make sure all its elements (comments) have foreign keys set to post's id. In order to do that, it issues a SQL UPDATE statement for each Comment, updating its Post reference. It works, the relationship is persisted correctly, but these updates often do not change anything and can be skipped (for performance reasons).

Inverse="true" on a collection means that it should not care whether the foreign keys in the database are properly set. It just assumes that some other party will take care of it. What do we gain? We have no superfluous UPDATE statements. What can we lose? We have to be sure that the other side actually takes over the responsibility of maintaining the association. If it doesn't, nobody will, and we'll be surprised that our relationship is not persisted at all (NHibernate will not throw an error; it can't guess that this is not what we expected).
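To make the two roles concrete, here is a minimal mapping-by-code sketch of a Post/Comment pair in which the collection is inverse and the many-to-one side does the work. It assumes the NHibernate package is referenced, and the column name Post_id and the native id generator are illustrative choices, not something the text above prescribes.

```csharp
using System.Collections.Generic;
using NHibernate.Mapping.ByCode;
using NHibernate.Mapping.ByCode.Conformist;

public class Post
{
    public virtual int Id { get; set; }
    public virtual ICollection<Comment> Comments { get; set; } = new List<Comment>();
}

public class Comment
{
    public virtual int Id { get; set; }
    public virtual Post Post { get; set; }
    public virtual string Text { get; set; }
}

public class PostMap : ClassMapping<Post>
{
    public PostMap()
    {
        Id(x => x.Id, m => m.Generator(Generators.Native));

        Bag(x => x.Comments,
            c =>
            {
                c.Key(k => k.Column("Post_id")); // assumed foreign key column name
                c.Inverse(true);                 // this collection does NOT maintain the foreign key
            },
            r => r.OneToMany());
    }
}

public class CommentMap : ClassMapping<Comment>
{
    public CommentMap()
    {
        Id(x => x.Id, m => m.Generator(Generators.Native));
        Property(x => x.Text);

        // The many-to-one side cannot be inverse: it is the one writing Post_id.
        ManyToOne(x => x.Post, m => m.Column("Post_id"));
    }
}
```

With this mapping, only the value of Comment.Post decides what ends up in Post_id; adding a comment to Post.Comments changes the object model but triggers no SQL on its own.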

When should we set `inverse="true"`?

Let's consider one-to-many first. Our relationship must be bidirectional and have entities (not value types) at both sides for inverse to make sense. Other side ('many' side) is always active, we can't set inverse on many-to-one. This means that we should put inverse="true" on the collection, provided that:

our collection is not explicitly ordered (like <list>) - it is e.g. a <bag> or a <set>; ordered lists have to be active in order to maintain the ordering correctly, as the 'many' side doesn't know anything about the ordering of the collection on the 'one' side
we actually set the relationship at 'many' side correctly

Consider the example:

public class Post
{
    public virtual int Id { get; set; }
    public virtual ICollection<Comment> Comments { get; set; }
}

public class Comment
{
    public virtual int Id { get; set; }
    public virtual Post Post { get; set; }
    public virtual string Text { get; set; }
}

// ...

var comment = new Comment() { Text = "the comment" };
session.Persist(comment);
post.Comments.Add(comment);

We are not setting the Post property in the Comment class, as we might expect NHibernate to handle that when we append our comment to the collection of comments in a particular Post object (**). If the post.Comments collection is not inverse, this will actually happen, but quite inefficiently.

The comment is first inserted with a null reference (exactly as it was in our code) and then, as the collection is responsible for maintaining the relationship (inverse="false"), the relationship is corrected by a separate UPDATE statement. Moreover, if we have a NOT NULL constraint on Comment.Post_id (which is actually a good idea), we'll end up with an exception saying that we can't insert a null foreign key value.

Let's see what happens with inverse="true": the comment is inserted with a null Post_id and no UPDATE follows.

There's no error, but the comment is actually not connected to the post, even though we've added it to the proper collection. By using inverse, we've explicitly turned off maintaining the relationship by that collection. And as we don't set the relationship on the Comment side, no one does.

The solution of course is to explicitly set comment's Post property. It is good from object model perspective, too, as it reduces the amount of magic in our code - what we've set is set, what we haven't set is not set magically.

var comment = new Comment() { Text = "the comment", Post = post };
session.Persist(comment);
post.Comments.Add(comment);

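Much better now: the comment is inserted with the proper foreign key value in a single INSERT, with no follow-up UPDATE.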
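Since forgetting one of the two assignments is easy, both can be wrapped in one method. The sketch below is plain C# with no NHibernate dependency; AddComment is a hypothetical helper name, not part of the original model.

```csharp
using System.Collections.Generic;

public class Post
{
    public virtual int Id { get; set; }
    public virtual ICollection<Comment> Comments { get; set; } = new List<Comment>();

    // Hypothetical helper: sets both ends of the association in one call.
    public virtual void AddComment(Comment comment)
    {
        comment.Post = this;   // active (many-to-one) side: this is what gets persisted
        Comments.Add(comment); // inverse side: keeps the in-memory model consistent
    }
}

public class Comment
{
    public virtual int Id { get; set; }
    public virtual Post Post { get; set; }
    public virtual string Text { get; set; }
}
```

Calling post.AddComment(comment) before session.Persist(comment) then has the same effect as setting Post in the object initializer and adding to the collection by hand.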

Time for many-to-many. Again, inverse makes sense only when we've mapped both sides. We have to choose one side to be active and mark the other one with inverse="true". Without that, when both collections are active, both try to insert a row into the intermediate table that many-to-many needs. Having duplicated rows makes no sense in most cases. For some suggestions on how to choose which side should be active, see my post from December.

To sum up

Left side	Right side	Inverse?
one-to-many	not mapped	makes no sense - left side must be active
one-to-many	many-to-one	right side should be active (left with `inverse="true"`), to save on UPDATEs (unless left side is explicitly ordered)
many-to-many	not mapped	makes no sense - left side must be active
many-to-many	many-to-many	one side should be active (`inverse="false"`), the other should not (`inverse="true"`)

______

(*) There are of course reasons why NHibernate doesn't make assumptions like that about the other sides of relationships. The first one is to maintain independence between mappings - it would be cumbersome if a change in mapping A modified the behaviour of B. The second one is ordered collections, like List. The ordering can be automatically kept by NHibernate only when the collection side is active (inverse="false"). If the notion of being active were managed on the other side only, changing the collection type from non-ordered to ordered would require changes in both mappings.

(**) Note that inverse is completely independent from cascading. We can have cascade save on collection and it does not affect which side is responsible for managing the relationship. Cascade save means only that when persisting Post object, we're also persisting all Comments that were added to the collection. They are inserted with null Post value and UPDATEd later or inserted with proper value in single INSERT, depending on object state and inverse setting, as described above.

Tuesday, January 24, 2012

Mapping-by-Code - OneToMany and other collection-based relation types

This post is going to be a continuation for the previous one, about Set and Bag mappings. Previously I've described collection and key column mappings. This time I'll cover mapping part that defines the relation type the collection takes part in.

There are five relation types supported by collections. I'll list them with their HBM names:

one-to-many - when the collection elements are entities
many-to-many - same, but storing the relation in separate table to allow m:n relations
many-to-any - heterogeneous association with entities of different types
element - when the collection elements are single-column value types
composite-element - when the collection elements are multiple-column value types (components)

In mapping-by-code, the relation type is defined in the third parameter of the Set/Bag mapping. It is optional and defaults to one-to-many. There's a method for every relation type. Let's go through those methods one by one - I'll show only the lambda from the third Set/Bag method parameter.
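For orientation, this is roughly where that third parameter sits in a complete mapping class. It is a sketch only: the Post and Comment classes, the PostId column name and the native generator are assumptions made for illustration, and the ISet<T> property type assumes an NHibernate version that accepts System.Collections.Generic.ISet<T>.

```csharp
using System.Collections.Generic;
using NHibernate.Mapping.ByCode;
using NHibernate.Mapping.ByCode.Conformist;

public class Comment
{
    public virtual int Id { get; set; }
}

public class Post
{
    public virtual int Id { get; set; }
    public virtual ISet<Comment> Comments { get; set; } = new HashSet<Comment>();
}

public class PostMap : ClassMapping<Post>
{
    public PostMap()
    {
        Id(x => x.Id, m => m.Generator(Generators.Native));

        Set(x => x.Comments,                      // 1st: the collection property
            c => c.Key(k => k.Column("PostId")),  // 2nd: collection and key column options
            r => r.OneToMany());                  // 3rd: the relation type, covered below
    }
}
```

Comment would need its own mapping class as well; it is left out to keep the focus on the three parameters.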

The first one is OneToMany, for one-to-many entity mapping, obviously.

r => r.OneToMany(m =>
{
    m.NotFound(NotFoundMode.Exception); // or NotFoundMode.Ignore
    m.Class(typeof(CustomType));
    m.EntityName("entityName");
})

It has an optional parameter with configuration. NotFound defines NHibernate's behavior when the referenced entity is missing in the database. Class and EntityName allow us to set up the relation for non-standard mappings of the other side.

Next is ManyToMany. The main difference is how the relation is stored in the database. A ManyToMany relation needs an intermediate table with foreign keys to allow m:n relations. There are several options available that affect what the additional table looks like.

r => r.ManyToMany(m =>
{
    m.Column("otherKeyColumnName");
    // or
    m.Column(c =>
    {
        c.Name("otherKeyColumnName");
        // etc...
    });

    m.ForeignKey("otherKey_fk");
    m.Formula("arbitrary SQL expression");
    m.Lazy(LazyRelation.Proxy); // or LazyRelation.NoProxy or LazyRelation.None
    m.NotFound(NotFoundMode.Exception); // or NotFoundMode.Ignore

    m.Class(typeof(CustomType));
    m.EntityName("entityName");
})

The configuration parameter of ManyToMany is optional - it may be skipped if we leave all options at their default values and set the naming through conventions. Among the options there is the standard Column method that lets us define the name and other DDL-level properties of the key column referencing the other side's entity (note that we've defined the key column for our own entity in the bag/set mapping options) - it is useful if we don't map the other side but still want to generate the tables properly. We can also set up laziness through the Lazy method, the behaviour for missing rows through NotFound, or even define the relation using an arbitrary SQL expression instead of a foreign key column value through the Formula method.

The third one is ManyToAny. This is quite an exotic feature of NHibernate, but there are some cases where it's really useful. See Ayende's post for a detailed description. Generally, this is for the case when we have a many-to-many relation with entities of different types at the other side. NHibernate needs to be told how to distinguish the type of each entity so that it can query the proper tables for different objects. Let's stick to Ayende's example:

r => r.ManyToAny<long>(m =>
{
    m.Columns(id =>
    {
        id.Name("PaymentId");
        id.NotNullable(true);
        // etc...
    }, classRef =>
    {
        classRef.Name("PaymentType");
        classRef.NotNullable(true);
        // etc...
    });
   
    m.IdType<long>(); // redundant, needs to be specified in ManyToAny parameter
    m.MetaType<string>();
   
    m.MetaValue("CreditCard", typeof(CreditCardPayment));
    m.MetaValue("Wire", typeof(WirePayment));
})

The generic type in the ManyToAny method defines the common type for identifiers of all entities at the other side. Inside the configuration (which is required in this case), we need to define properties for two columns this time - one to keep the other entity's identifier, the second to keep its discriminating value. We do it using the parameters of the Columns method. Later we have to specify the type of the discriminator using the MetaType method and its generic argument - string is good here. We can also specify the common type of identifiers using the IdType method, but we've already done it in the ManyToAny generic parameter (I think that this method is useless here). The last thing we need to do is to define the list of entity types that are allowed at the other side of the relation and their corresponding discriminator values. In order to do this, we call the MetaValue method - its first parameter is the discriminator value, the second is the type.

The next collection-based relation type available is Element. This is designed for collections of simple value-typed objects, e.g. a list of strings.

r => r.Element(m =>
{
    m.Column("valueColumnName");
    // or
    m.Column(c =>
    {
        c.Name("valueColumnName");
        // etc...
    });

    m.Formula("arbitrary SQL expression");
    m.Length(100);
    m.NotNullable(true);
    m.Precision(10);
    m.Scale(10);
    m.Type<CustomType>(parameters);
    m.Unique(true);
})

The options available are quite standard - there are DDL options for value column available within Column method and different Property-like options describing the value itself. Note that foreign key column options or table options are defined in collection options.
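As a sketch of how the pieces fit together for a list of strings, the mapping below stores a post's tags in a separate table. The Post class, the PostTags table and the PostId and Tag column names are all made up for this example.

```csharp
using System.Collections.Generic;
using NHibernate.Mapping.ByCode;
using NHibernate.Mapping.ByCode.Conformist;

public class Post
{
    public virtual int Id { get; set; }
    public virtual IList<string> Tags { get; set; } = new List<string>();
}

public class PostMap : ClassMapping<Post>
{
    public PostMap()
    {
        Id(x => x.Id, m => m.Generator(Generators.Native));

        Bag(x => x.Tags,
            c =>
            {
                c.Table("PostTags");            // table options belong to the collection
                c.Key(k => k.Column("PostId")); // so does the foreign key column
            },
            r => r.Element(m =>
            {
                m.Column("Tag");                // the column holding the value itself
                m.Length(50);
                m.NotNullable(true);
            }));
    }
}
```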

The last possible relation type for collection mapping is Component, known in XML as composite-element. Mapping-by-Code merged these two terms into component, because there is no real difference besides the fact that components were parts of single objects and composite elements were used in collections only.

r => r.Component(m =>
{
    m.Property(x => x.Name);
    // etc...
})

The mapping itself is like already described component mapping, so I'll skip it here.

Fluent NHibernate's equivalents

As I've already described in the previous post, Fluent NHibernate is not separating the collection mapping from the relation mapping, mixing it together in one method chain. Many-to-any relation is not supported by FNH, and the remaining four types of relations are mapped differently.

Let's go through the mappings - I'll skip the options regarding collection mapping and reflect only these options, that are part of relation mappings in mapping-by-code to keep the comparison consistent.

The first one is HasMany for one-to-many relation:

HasMany(x => x.Users)
    .NotFound.Ignore() // or .Exception()
    .EntityName("entityName");

HasManyToMany is for many-to-many:

HasManyToMany(x => x.Users)
    .ChildKeyColumn("otherKeyColumnName")
    .ForeignKeyConstraintNames("parentForeignKeyName", "childForeignKeyName")
    .NotFound.Ignore() // or .Exception()
    .EntityName("entityName");

Formula mapping is missing. Foreign key name configuration is joined for both sides. There are a few more options regarding the "child" (other entity) key column, all with names starting with Child.

Many-to-any relation is not supported in Fluent NHibernate.

The next one is element relation, merged into HasMany method, available in the chain through Element method:

HasMany(x => x.Users)
    .Element("valueColumnName", m =>
    {
        m.Formula("arbitrary SQL expression")
            .Length(100)
            .Type<CustomType>();
    });

Other element relation options are not supported.

And finally, there is composite element (component) mapping, named Component here, too. It is also merged into HasMany chain.

HasMany(x => x.Users)
    .Component(m =>
    {
        m.Map(x => x.Name); // plain property, matching Property(x => x.Name) above
        // etc...
    });

Thursday, December 1, 2011

Many-to-many mapping: guidance

In the previous posts I've described how to teach NHibernate about our bidirectional many-to-many relationships using Inverse attribute and I've gone through collection types used in many-to-many mapping to see how they differ in terms of performance. Time to sum up the topic of many-to-many relationship mappings with a bit of guidance.

1. Map one side of many-to-many relationship with Inverse attribute

NHibernate should trigger database writes from one side only, otherwise you'll end up with duplicated values in intermediate table or primary key violation errors. From database perspective, Inverse side becomes read-only, it is only modified at objects level.

2. Add relationships at both ends

For bidirectional relationships, create a single method responsible for adding the relationship between two entities. The method should modify the collections at both sides of the relationship at once, to ensure the objects state is always correct. When you've added Group to User, always add User to Group, too. NHibernate doesn't reload the entities within single session, so it is not able to figure out the changes at the second side of the relationship automatically.

3. Avoid mapping Inverse side of many-to-many

In some cases you don't really need to map both sides of the relationship. If your application is not going to query the database for data from the Inverse side, you'd better not map this side at all. Less mappings, less bugs. And there is no risk of forgetting to update collections at both sides as there is one side only.

4. Map Inverse side using bag

Inverse side of the relationship from the database perspective is read-only - it doesn't trigger any writes. Read-only collections can be mapped as bag as we don't need to care about write penalties. And accessing the collection will always load all its values, regardless of the collection type, so it's best to use the simplest one here.

5. Map active side using set

Set ensures uniqueness. It's good to have uniqueness in many-to-many relationships as there are almost no use cases for non-unique many-to-many relationship. Moreover, set is much better for updates - it can add/update/remove single rows, contrary to bag, which is always deleting and re-creating all relations.

6. Map smaller side as active

ISet Add/Remove methods return a boolean indicating whether modifying the set succeeded (it can fail due to the set's uniqueness constraint). To determine the proper return value, NHibernate needs to load the collection when modifying it. To ensure the best performance, we should think about which side of the relationship is expected to have fewer values and then map this side as the active set and the other one as the Inverse bag. It's always good to load as few values as possible, and it makes no difference which side is Inverse from the object-oriented perspective, as on the object level both sides behave identically.
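The return values in question are easy to see with a plain in-memory set. The snippet below uses the standard .NET HashSet<T> only as an analogy (top-level statements, so C# 9 or later is assumed); NHibernate's persistent set has to give the same answers, which is why it must know the current contents first.

```csharp
using System;
using System.Collections.Generic;

ISet<int> groupIds = new HashSet<int>();

bool addedFirst = groupIds.Add(1);  // true: the value was not in the set yet
bool addedAgain = groupIds.Add(1);  // false: the set already contains it
bool removed = groupIds.Remove(2);  // false: there was nothing to remove

Console.WriteLine($"{addedFirst} {addedAgain} {removed}"); // prints "True False False"
```

None of these answers can be given without knowing what is already in the set, and for a lazily loaded collection that knowledge lives in the database.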

Below is the correct mapping for our Users/Groups example. I've mapped user's groups collection as active set, as I expect one user to have only a few groups. I've mapped group's user list as bag and marked it as Inverse, as single group can possibly have thousands of members.

// in UserMap
HasManyToMany(x => x.Groups).AsSet();

// in GroupMap
HasManyToMany(x => x.Users).AsBag().Inverse();

And HBM version:

<!-- in User.hbm.xml -->
<set name="Groups" table="UsersInGroups">
    <key column="UserId" />
    <many-to-many column="GroupId" class="Group" />
</set>

<!-- in Group.hbm.xml -->
<bag name="Users" table="UsersInGroups" inverse="true">
    <key column="GroupId" />
    <many-to-many column="UserId" class="User" />
</bag>
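For completeness, a mapping-by-code version of the same pair might look like the sketch below. It reuses the table and column names from the HBM above; the entity classes, the native id generator and the use of System.Collections.Generic.ISet<T> (which assumes a reasonably recent NHibernate version) are my assumptions.

```csharp
using System.Collections.Generic;
using NHibernate.Mapping.ByCode;
using NHibernate.Mapping.ByCode.Conformist;

public class User
{
    public virtual int Id { get; set; }
    public virtual ISet<Group> Groups { get; set; } = new HashSet<Group>();
}

public class Group
{
    public virtual int Id { get; set; }
    public virtual ICollection<User> Users { get; set; } = new List<User>();
}

public class UserMap : ClassMapping<User>
{
    public UserMap()
    {
        Id(x => x.Id, m => m.Generator(Generators.Native));

        // Active side: a set, expected to stay small.
        Set(x => x.Groups,
            c => { c.Table("UsersInGroups"); c.Key(k => k.Column("UserId")); },
            r => r.ManyToMany(m => m.Column("GroupId")));
    }
}

public class GroupMap : ClassMapping<Group>
{
    public GroupMap()
    {
        Id(x => x.Id, m => m.Generator(Generators.Native));

        // Inverse side: a bag, read-only from the database perspective.
        Bag(x => x.Users,
            c => { c.Table("UsersInGroups"); c.Key(k => k.Column("GroupId")); c.Inverse(true); },
            r => r.ManyToMany(m => m.Column("UserId")));
    }
}
```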

Below is the method to add the relationship. This is a single method that touches both sides of the relationship at once, but only the User side triggers the database call, as the Group side is read-only at database level. I've decided to put the method within the Group class, as it fits there logically.

// in Group
public virtual void AddMember(User user)
{
    this.Users.Add(user);
    user.Groups.Add(this);
}
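Removal deserves the same treatment. The sketch below is plain C# without NHibernate; RemoveMember is a hypothetical counterpart to AddMember, and the minimal User and Group classes are there only to make the example self-contained.

```csharp
using System.Collections.Generic;

public class User
{
    public virtual string Name { get; set; }
    public virtual ISet<Group> Groups { get; set; } = new HashSet<Group>();
}

public class Group
{
    public virtual string Name { get; set; }
    public virtual ICollection<User> Users { get; set; } = new List<User>();

    public virtual void AddMember(User user)
    {
        Users.Add(user);       // inverse side: in-memory only
        user.Groups.Add(this); // active side: triggers the INSERT when mapped as above
    }

    // Hypothetical counterpart: removes the relationship from both ends.
    public virtual void RemoveMember(User user)
    {
        Users.Remove(user);
        user.Groups.Remove(this);
    }
}
```

As with adding, only the change to user.Groups would reach the database; the change to Users merely keeps the object model consistent within the session.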

Sunday, November 27, 2011

Many-to-many mapping: collection types

In the previous post we've seen how to instruct NHibernate to generate the proper number of INSERT statements to the intermediate many-to-many table using the Inverse attribute. Time for something a bit more complicated.

In the previous example we've created fresh instances of Users and Group and used them in the same session, so NHibernate knew the collections were initially empty. Let's see what happens when we want to work with collections that are fetched from the database with lazy loading, so that NHibernate doesn't know their contents.

In the example below we're just trying to remove second User from the Group - a task that could be accomplished with single DELETE statement.

using (var sess = factory.OpenSession())
using (var tx = sess.BeginTransaction())
{
    var group = sess.Get<Group>(1);
    var user2 = sess.Load<User>(2);

    group.Users.Remove(user2);
    tx.Commit();
}

The queries run in this session are surprising. We just wanted to delete a single row from GroupsToUsers, but NHibernate decided to load the collection of users assigned to our Group with ID=1, purge the whole collection, remove the entry in memory and re-add the remaining rows one by one (which may be thousands of rows!). Seems a bit redundant, doesn't it?

To understand what happened here, first we need to know the difference between the collection types supported by NHibernate and which are used by default by Fluent NHibernate if we didn't specify it explicitly.

There are two collection types that are important for many-to-many mappings - bags and sets:

bag is the simplest container - it just holds any items without uniqueness checking
set is unique - it means that it can't have two items with exactly the same values

In my object model, I've defined the collection properties using the ICollection<T> interface. It seems to be a good choice as it's the base interface for all collections. But ICollection doesn't give Fluent NHibernate any hint about which type of collection to use, so Fluent NHibernate chooses the simplest one, which is bag.

I think Fluent NHibernate is wrong to hide the decision about which collection type to use somewhere in conventions. It changes a lot in how NHibernate behaves, and the choice should be explicit at the mapping level, so that we need to think about it instead of letting Fluent choose something that could be highly inappropriate.

OK, so now we know that our Users collection in Group is implicitly defined as a bag. Why does it recreate the whole collection just to remove one row? This is just how bag works. A bag doesn't have a primary key, so NHibernate can't construct a SQL query that addresses the single row to delete - WHERE User_id = 2 AND Group_id = 1 is ambiguous when the row is duplicated (and it can be, as we saw in the previous post). NHibernate's strategy to solve that problem, which we saw above, is obviously the simplest one, but at least it makes us think and look for another solution.

In case of many-to-many, in almost all cases, we in fact need set semantics. Duplicate rows in intermediate table have no meaning and should be forbidden at database level using primary key constraint.

So let's map our Users collection in Group classmap as set:

// in UserMap
HasManyToMany(x => x.Groups).Inverse();

// in GroupMap
HasManyToMany(x => x.Users).AsSet();

Note that we can leave the collection at User side with default bag mapping because it is marked as Inverse, so there is no database write triggered from there and read efficiency is the same with bag and set (it's just a SELECT to fetch all data by foreign key).

Looking at the queries from our session this time, things are much better. There is no DELETE statement for the whole collection and there is no re-inserting. There's just one DELETE that affects one row only (this is guaranteed by set semantics).

Why do we need to load the collection anyway (statement #2)? I'll explain it in the next post, together with summary and conclusions for many-to-many mappings.

Saturday, November 26, 2011

Many-to-many mapping: Inverse

In this and the next few posts I'll try to go over the rules and guidelines and give some insights into how to map a many-to-many relationship in NHibernate properly.

In general, it is good for relationships in object model to follow the relationships at database level, so that querying for objects can be naturally translated into SQL queries without any additional overhead.

This is quite easy and natural for parent-child relationship (aka many-to-one). Child object can have a reference to its parent object as well as child row can have a foreign key to its parent row.

Many-to-many relationships (e.g. Users and Groups) are different. In fact, they can exist only at the object model level - at database level it is only a concept implemented using two many-to-one relations with an intermediate table.

We can implement it the same way in our NHibernate-based object model, but the intermediate entity will probably have no additional properties and this will lead to quite polluted and ugly model. We don't want User and Group to have collections of some UserInGroup objects that have no real object-oriented meaning. We do want User to have many Groups and Group to have many Users - as simple as it can be.

Fortunately, this is quite common scenario and NHibernate can support it very well with respect to database-level constraints and good practices, but only when mapped and used with care.

Let's begin with the simplest possible mapping using FluentNHibernate:

// in UserMap
HasManyToMany(x => x.Groups);

// in GroupMap
HasManyToMany(x => x.Users);

Now let's add two users to a group. To ensure our objects state is correct and collections both at User and Group side are complete, we need to update the collections on both sides:

using (var sess = sessionFactory.OpenSession())
using (var tx = sess.BeginTransaction())
{
    var user1 = new User() { Name = "u1" };
    sess.Save(user1);

    var user2 = new User() { Name = "u2" };
    sess.Save(user2);

    var group1 = new Group() { Name = "g" };
    sess.Save(group1);

    user1.Groups.Add(group1);
    group1.Users.Add(user1);

    user2.Groups.Add(group1);
    group1.Users.Add(user2);

    tx.Commit();
}

But NHibernate is not meant to guess that when we add a relationship on the User side (the user1.Groups.Add(group1) and user2.Groups.Add(group1) calls) and then another one on the Group side in the line that follows each of them, we are in fact specifying the same single relationship. What is the result of these four Add calls?

Each of the Add calls triggers a separate INSERT statement, duplicating each relationship. At database level it means that each user is a group member twice. Moreover, when the GroupsToUsers table has a primary key defined on both columns (and it probably should have), the database will complain about a primary key violation and we'll end up with an exception.

What we need to do here is to inform NHibernate that we're defining the same relation on both ends and it should be represented as one row in the database. We need to choose which side is responsible to do the database insert and mark the other with inverse attribute. For example, let's choose User as inverse and make group.Users.Add(user) trigger the database call.

// in UserMap
HasManyToMany(x => x.Groups).Inverse();

// in GroupMap
HasManyToMany(x => x.Users);

(We'll cover how to choose the inverse side properly later.)

What is the result of the same four Add calls now? Just one INSERT per relationship, triggered from the Group side.

So far, so good. But it's still far from correct. For now, it works without overhead only for entities created just before, so that NHibernate knows the current state of the collections. Things look much worse when the entities are loaded from the database. We'll look closer at the problem and its solution in the next post.