Friday, December 30, 2011

Collections of components - an alternative

In a project I work in we have a lot of places in our data model where an entity has some properties collections used just for displaying it to the customer, like list of company locations or source URLs for the document. We are not filtering/querying our data by these properties at all and we're pretty sure we wouldn't do that in the nearest projects lifetime.

It may seem right to map these collections in NHibernate as collections of components. Component in NHibernate is an object without its own identity, always being a part of its parent entity - so called value type. The classical example is User entity and Address component. There can be no Address that doesn't belong to any User, and deleting the User will delete the Address, too. Component mapping is designed for representing single database row as multiple objects (User and its corresponding Address are mapped from single row in Users table).

Our URLs or company locations sounds similiar in terms of lifetime and ownership, but totally different in terms of relationship - these are one-to-many relationships instead of simple one-to-one in User-Address example. Because of that, different approach is needed to store the data. The road we took at the beginning was the most obvious one for someone who learned about database normalization rules - we've created separate tables for each of the collections - i.e. CompanyLocations table with CompanyId foreign key and string-typed column for the location. When mapping these tables in NHibernate, we still didn't want to give rows from these tables its own identity - logically they were still owned by our main entities, i.e. Company. So we mapped it as collections of components:

mapping.HasMany(x => x.Locations).AsBag()
.Component(m => m.Map(x => x.Location));

This implies two things:

  • when we need to use the values from the collections, we need to issue another database query (or make some unnatural joins)
  • tables created for storing collection of components don't have proper primary keys defined - as rows stored there are identified by parent identity only

In complex scenarios, when there are many of collections like that and/or objects are used in long-running transactions, this becomes serious issue. We've decided to try another solution - it's so simple that it can't go wrong.

We've just replaced separate tables for collections with single column in parent's entity table that keeps whole collection in serialized form - like JSON or just comma-separated values. As I mentioned at the beginning, the values are never used for querying and are needed just to be displayed, so there's nothing wrong with breaking the database normalization here. From the database perspective, this column contains just some kind of plain data.

For convenience of using the serialized collection at object level, we've created custom user type definition responsible for deserializing and serializing back the values, so that our client code was not affected by these changes at all. Here is the code for our string collection wrapper and IUserType implementation for comma-separated values:

[Serializable]
public class DelimitedString : HashSet<string>
{
private const string Delimiter = ";";

public DelimitedString()
{
}

public DelimitedString(IEnumerable<string> collection)
: base(collection ?? new string[0])
{
}

private DelimitedString(string delimited)
: base(Split(delimited))
{
}

private static IEnumerable<string> Split(string delimited)
{
if (delimited == null)
return new string[0];

return delimited
.Split(new[] { Delimiter }, StringSplitOptions.RemoveEmptyEntries)
.Select(x => x.Trim());
}

public bool Equals(DelimitedString other)
{
return !ReferenceEquals(null, other)
&& this.SequenceEqual(other);
}

public override string ToString()
{
return String.Join(Delimiter, this.ToArray());
}

public override bool Equals(object obj)
{
if (ReferenceEquals(null, obj))
return false;
if (ReferenceEquals(this, obj))
return true;
if (!(obj is DelimitedString))
return false;
return Equals((DelimitedString)obj);
}

public class DelimitedStringUserType : IUserType
{
public new bool Equals(object x, object y)
{
return Object.Equals(x, y);
}

public int GetHashCode(object x)
{
return x.GetHashCode();
}

public object NullSafeGet(IDataReader rs, string[] names, object owner)
{
var delimited = rs[names[0]];
if (delimited == DBNull.Value)
return new DelimitedString();

return new DelimitedString((string)delimited);
}

public void NullSafeSet(IDbCommand cmd, object value, int index)
{
((IDataParameter)cmd.Parameters[index]).Value = ((value != null) ? value.ToString() : (object)DBNull.Value);
}

public object DeepCopy(object value)
{
return new DelimitedString((DelimitedString)value);
}

public object Replace(object original, object target, object owner)
{
return original;
}

public object Assemble(object cached, object owner)
{
return cached;
}

public object Disassemble(object value)
{
return value;
}

public SqlType[] SqlTypes
{
get { return new[] { NHibernateUtil.String.SqlType }; }
}

public Type ReturnedType
{
get { return typeof(DelimitedString); }
}

public bool IsMutable
{
get { return false; }
}
}
}

Note that this wrapper class is a HashSet<T>, what ensures uniqueness of string values contained. If this is not needed, HashSet can be replaced i.e. with simple List<T>.

Here is our entity and the Fluent mapping that uses our string collection through an user type:

// entity class
public class Company
{
// ...
public virtual DelimitedString Locations { get; set; }
}

// mapping
mapping.Map(x => x.Locations).CustomType<DelimitedString>();

Saturday, December 10, 2011

Lookup table for constants with NHibernate

Databases are pretty commonly used to keep lists of constant values, so called dictionaries or lookup tables. Applications are just loading all the values at startup and keeping them in memory for the whole run for performance reasons.

This doesn't sound like a real use case for such a powerful tool like the database, designed to handle relations, transactions, aggregations and a lot of other complicated stuff. But it's often most convenient to create just one more table for our dictionary - we know how to manage the database and it's already there. Creating another integration point for our application just to keep a list of countries probably makes no sense.

How should we handle dictionaries in NHibernate? Well, in fact, we shouldn't rely on NHibernate too much in this case. NHibernate is designed to manipulate entities and constant values are not entities. These are just some plain objects, that accidentally came from the database.

Let's say we want to have list of countries in our database to show in dropdown within user address form. When fetching Country objects as entities, NHibernate does a lot of unnecessary stuff. Each Country object is tracked for changes that will never occur. Entities are also attached to the ISession that fetched it. We don't want to keep this session forever for sure, but this means that our entities will span across multiple sessions. That's bad, too - entities are supposed to live in scope of single Unit of Work.

For fetching the constant data from the database, we should choose any method that doesn't produce objects as persistent entities. We can either use plain ADO.NET connection and construct the objects ourselves, or use CreateSqlQuery with AliasToBeanTransformer, as I showed in the previous post. In both methods we fall back to plain SQL, but it is very straightforward and portable, and the timings are very good.

Suppose we're using Country objects as entities properties in some parts of our application. In this case we shouldn't use the same objects for querying and data manipulation than for filling in the dropdown. Objects in dictionaries are constants, for lookup purposes only. Country object used as a part of entity are managed by NHibernate and it should not be the same instance as Country constant in dictionary, even if they are both instances of Country class that represents the same country. In fact, it'll be better to have two separate classes for countries, so that objects are explicit about their roles and they don't abuse Single Responsibility Principle.

In some cases - i.e. when we're listing users for given country - it may be convenient and logically correct to use the constant values for querying. In scenarios like that, our entities need to have a relation to constant value. This can be done using custom IUserType that translates id (or any other key our Country has) at database level to the Country object from our lookup.

public class CountryType : IUserType
{
public SqlType[] SqlTypes
{
get
{
return new[] { NHibernateUtil.Int32.SqlType };
}
}

public Type ReturnedType
{
get
{
return typeof(Country);
}
}

public new bool Equals(object x, object y)
{
return Object.Equals(x, y);
}

public int GetHashCode(object x)
{
return x.GetHashCode();
}

public object NullSafeGet(IDataReader rs, string[] names, object owner)
{
var id = NHibernateUtil.Int32.NullSafeGet(rs, names[0]);

if (id == null)
return null;

// get constant object from our lookup table
return CountryLookup.ById((int) id);
}

public void NullSafeSet(IDbCommand cmd, object value, int index)
{
if (value == null)
NHibernateUtil.Int32.NullSafeSet(cmd, null, index);
else
{
// persist only key of our constant object
NHibernateUtil.Int32.NullSafeSet(cmd, ((Country)value).Id, index);
}
}

public object DeepCopy(object value)
{
if (value == null)
return null;
var country = (Country)value;
return new Country() { Id = country.Id, Name = country.Name };
}

public bool IsMutable
{
get { return false; }
}

public object Replace(object original, object target, object owner)
{
return original;
}

public object Assemble(object cached, object owner)
{
return cached;
}

public object Disassemble(object value)
{
return value;
}
}

This way, the fact that Country is not really an entity is transparent to our code, we can query by Country, have Countries in relations or whatever we need, without breaking the rule that constants are not entities.

Friday, December 9, 2011

Fetching whole table with NHibernate

I've done a quick benchmark for different ways of retrieving whole tables from the database. I've generated 10,000 random rows in my test table with countries first and then run simple SELECT * FROM Country queries for 7 different ways using more or less of NHibernate.

Here are the methods and its results:

2263 ms for LINQ query:

session.Query<Country>().ToList();

1468 ms for ICriteria:

session.CreateCriteria<Country>().List<Country>();

1289 ms for QueryOver:

session.QueryOver<Country>().List();

1431 ms for HQL:

session.CreateQuery("From Country").List<Country>();

1428 ms for SQL query with AddEntity:

session.CreateSQLQuery("select Id, Name, Continent from Country")
.AddEntity(typeof(Country))
.List<Country>();

593 ms for SQL query with AliasToBeanTransformer:

session.CreateSQLQuery("select Id, Name, Continent from Country")
.AddScalar("Id", NHibernateUtil.Int32)
.AddScalar("Name", NHibernateUtil.String)
.AddScalar("Continent", NHibernateUtil.String)
.SetResultTransformer(Transformers.AliasToBean<Country>())
.List<Country>();

489 ms for SQL query with manual objects creation:

session.CreateSQLQuery("select Id, Name, Continent from Country")
.AddScalar("Id", NHibernateUtil.Int32)
.AddScalar("Name", NHibernateUtil.String)
.AddScalar("Continent", NHibernateUtil.String)
.List<object[]>()
.Select(row => new Country()
{
Id = (int) row[0],
Name = (string) row[1],
Continent = (string) row[2]
}).ToList();

The results are pretty interesting.

Contrary to LINQ provider, there's no significant difference between ICriteria, QueryOver and HQL. Fetching data with native SQL query gives no performance gain when used with AddEntity - NHibernate still needs to initialize its infrastructure for managing entities - result is comparable to previous methods.

But there is a big difference when using plain SQL queries and transforming rows to objects on our own - it is about 3 times faster than letting NHibernate to do it. Why is that? When using AliasToBeanTransformer or when creating objects manually, we in fact don't work with persistent entities. Country objects created in two last queries are just POCOs, not associated with NHibernate session at all - its changes are not tracked and persisted automatically. This is not the usual way we are supposed to work with NHibernate, but this can be useful in some scenarios.

Thursday, December 1, 2011

Many-to-many mapping: guidance

In the previous posts I've described how to teach NHibernate about our bidirectional many-to-many relationships using Inverse attribute and I've gone through collection types used in many-to-many mapping to see how they differ in terms of performance. Time to sum up the topic of many-to-many relationship mappings with a bit of guidance.

1. Map one side of many-to-many relationship with Inverse attribute

NHibernate should trigger database writes from one side only, otherwise you'll end up with duplicated values in intermediate table or primary key violation errors. From database perspective, Inverse side becomes read-only, it is only modified at objects level.

2. Add relationships at both ends

For bidirectional relationships, create a single method responsible for adding the relationship between two entities. The method should modify the collections at both sides of the relationship at once, to ensure the objects state is always correct. When you've added Group to User, always add User to Group, too. NHibernate doesn't reload the entities within single session, so it is not able to figure out the changes at the second side of the relationship automatically.

3. Avoid mapping Inverse side of many-to-many

In some cases you don't really need to map both sides of the relationship. If your application is not going to query the database for data from the Inverse side, you'd better not map this side at all. Less mappings, less bugs. And there is no risk of forgetting to update collections at both sides as there is one side only.

4. Map Inverse side using bag

Inverse side of the relationship from the database perspective is read-only - it doesn't trigger any writes. Read-only collections can be mapped as bag as we don't need to care about write penalties. And accessing the collection will always load all its values, regardless of the collection type, so it's best to use the simplest one here.

5. Map active side using set

Set ensures uniqueness. It's good to have uniqueness in many-to-many relationships as there are almost no use cases for non-unique many-to-many relationship. Moreover, set is much better for updates - it can add/update/remove single rows, contrary to bag, which is always deleting and re-creating all relations.

6. Map smaller side as active

ISet Add/Remove methods return boolean indicating whether modifying the set succeeded (it can fail due to set's uniqueness constraint). To determine the proper return value, NHibernate needs to load the collection when modifying it. To ensure the best performance, we should think which side of the relationship is expected to have less values and then map this side as active set and the second one as Inverse bag. It's always good to load as few values as possible and there's no difference which side is Inverse from object-oriented perspective, as on object level both sides behaves identically.


Below is the correct mapping for our Users/Groups example. I've mapped user's groups collection as active set, as I expect one user to have only a few groups. I've mapped group's user list as bag and marked it as Inverse, as single group can possibly have thousands of members.

// in UserMap
HasManyToMany(x => x.Groups).AsSet();

// in GroupMap
HasManyToMany(x => x.Users).AsBag().Inverse();

And HBM version:

<!-- in User.hbm.xml -->
<set name="Groups" table="UsersInGroups">
<key column="UserId" />
<many-to-many column="GroupId" class="Group" />
</set>

<!-- in Group.hbm.xml -->
<bag name="Users" table="UsersInGroups" inverse="true">
<key column="GroupId" />
<many-to-many column="UserId" class="User" />
</bag>

Below is the metod to add the relationship. This is single method that touches both sides of the relationship at once, but only User side triggers the database call, as the Group side is read-only at database level. I've decided to put the method within Group class, as it fits there logically.

// in Group
public virtual void AddMember(User user)
{
this.Users.Add(user);
user.Groups.Add(this);
}