Friday, December 30, 2011

Collections of components - an alternative

In a project I work in we have a lot of places in our data model where an entity has some properties collections used just for displaying it to the customer, like list of company locations or source URLs for the document. We are not filtering/querying our data by these properties at all and we're pretty sure we wouldn't do that in the nearest projects lifetime.

It may seem right to map these collections in NHibernate as collections of components. Component in NHibernate is an object without its own identity, always being a part of its parent entity - so called value type. The classical example is User entity and Address component. There can be no Address that doesn't belong to any User, and deleting the User will delete the Address, too. Component mapping is designed for representing single database row as multiple objects (User and its corresponding Address are mapped from single row in Users table).

Our URLs or company locations sounds similiar in terms of lifetime and ownership, but totally different in terms of relationship - these are one-to-many relationships instead of simple one-to-one in User-Address example. Because of that, different approach is needed to store the data. The road we took at the beginning was the most obvious one for someone who learned about database normalization rules - we've created separate tables for each of the collections - i.e. CompanyLocations table with CompanyId foreign key and string-typed column for the location. When mapping these tables in NHibernate, we still didn't want to give rows from these tables its own identity - logically they were still owned by our main entities, i.e. Company. So we mapped it as collections of components:

mapping.HasMany(x => x.Locations).AsBag()
.Component(m => m.Map(x => x.Location));

This implies two things:

  • when we need to use the values from the collections, we need to issue another database query (or make some unnatural joins)
  • tables created for storing collection of components don't have proper primary keys defined - as rows stored there are identified by parent identity only

In complex scenarios, when there are many of collections like that and/or objects are used in long-running transactions, this becomes serious issue. We've decided to try another solution - it's so simple that it can't go wrong.

We've just replaced separate tables for collections with single column in parent's entity table that keeps whole collection in serialized form - like JSON or just comma-separated values. As I mentioned at the beginning, the values are never used for querying and are needed just to be displayed, so there's nothing wrong with breaking the database normalization here. From the database perspective, this column contains just some kind of plain data.

For convenience of using the serialized collection at object level, we've created custom user type definition responsible for deserializing and serializing back the values, so that our client code was not affected by these changes at all. Here is the code for our string collection wrapper and IUserType implementation for comma-separated values:

[Serializable]
public class DelimitedString : HashSet<string>
{
private const string Delimiter = ";";

public DelimitedString()
{
}

public DelimitedString(IEnumerable<string> collection)
: base(collection ?? new string[0])
{
}

private DelimitedString(string delimited)
: base(Split(delimited))
{
}

private static IEnumerable<string> Split(string delimited)
{
if (delimited == null)
return new string[0];

return delimited
.Split(new[] { Delimiter }, StringSplitOptions.RemoveEmptyEntries)
.Select(x => x.Trim());
}

public bool Equals(DelimitedString other)
{
return !ReferenceEquals(null, other)
&& this.SequenceEqual(other);
}

public override string ToString()
{
return String.Join(Delimiter, this.ToArray());
}

public override bool Equals(object obj)
{
if (ReferenceEquals(null, obj))
return false;
if (ReferenceEquals(this, obj))
return true;
if (!(obj is DelimitedString))
return false;
return Equals((DelimitedString)obj);
}

public class DelimitedStringUserType : IUserType
{
public new bool Equals(object x, object y)
{
return Object.Equals(x, y);
}

public int GetHashCode(object x)
{
return x.GetHashCode();
}

public object NullSafeGet(IDataReader rs, string[] names, object owner)
{
var delimited = rs[names[0]];
if (delimited == DBNull.Value)
return new DelimitedString();

return new DelimitedString((string)delimited);
}

public void NullSafeSet(IDbCommand cmd, object value, int index)
{
((IDataParameter)cmd.Parameters[index]).Value = ((value != null) ? value.ToString() : (object)DBNull.Value);
}

public object DeepCopy(object value)
{
return new DelimitedString((DelimitedString)value);
}

public object Replace(object original, object target, object owner)
{
return original;
}

public object Assemble(object cached, object owner)
{
return cached;
}

public object Disassemble(object value)
{
return value;
}

public SqlType[] SqlTypes
{
get { return new[] { NHibernateUtil.String.SqlType }; }
}

public Type ReturnedType
{
get { return typeof(DelimitedString); }
}

public bool IsMutable
{
get { return false; }
}
}
}

Note that this wrapper class is a HashSet<T>, what ensures uniqueness of string values contained. If this is not needed, HashSet can be replaced i.e. with simple List<T>.

Here is our entity and the Fluent mapping that uses our string collection through an user type:

// entity class
public class Company
{
// ...
public virtual DelimitedString Locations { get; set; }
}

// mapping
mapping.Map(x => x.Locations).CustomType<DelimitedString>();

Saturday, December 10, 2011

Lookup table for constants with NHibernate

Databases are pretty commonly used to keep lists of constant values, so called dictionaries or lookup tables. Applications are just loading all the values at startup and keeping them in memory for the whole run for performance reasons.

This doesn't sound like a real use case for such a powerful tool like the database, designed to handle relations, transactions, aggregations and a lot of other complicated stuff. But it's often most convenient to create just one more table for our dictionary - we know how to manage the database and it's already there. Creating another integration point for our application just to keep a list of countries probably makes no sense.

How should we handle dictionaries in NHibernate? Well, in fact, we shouldn't rely on NHibernate too much in this case. NHibernate is designed to manipulate entities and constant values are not entities. These are just some plain objects, that accidentally came from the database.

Let's say we want to have list of countries in our database to show in dropdown within user address form. When fetching Country objects as entities, NHibernate does a lot of unnecessary stuff. Each Country object is tracked for changes that will never occur. Entities are also attached to the ISession that fetched it. We don't want to keep this session forever for sure, but this means that our entities will span across multiple sessions. That's bad, too - entities are supposed to live in scope of single Unit of Work.

For fetching the constant data from the database, we should choose any method that doesn't produce objects as persistent entities. We can either use plain ADO.NET connection and construct the objects ourselves, or use CreateSqlQuery with AliasToBeanTransformer, as I showed in the previous post. In both methods we fall back to plain SQL, but it is very straightforward and portable, and the timings are very good.

Suppose we're using Country objects as entities properties in some parts of our application. In this case we shouldn't use the same objects for querying and data manipulation than for filling in the dropdown. Objects in dictionaries are constants, for lookup purposes only. Country object used as a part of entity are managed by NHibernate and it should not be the same instance as Country constant in dictionary, even if they are both instances of Country class that represents the same country. In fact, it'll be better to have two separate classes for countries, so that objects are explicit about their roles and they don't abuse Single Responsibility Principle.

In some cases - i.e. when we're listing users for given country - it may be convenient and logically correct to use the constant values for querying. In scenarios like that, our entities need to have a relation to constant value. This can be done using custom IUserType that translates id (or any other key our Country has) at database level to the Country object from our lookup.

public class CountryType : IUserType
{
public SqlType[] SqlTypes
{
get
{
return new[] { NHibernateUtil.Int32.SqlType };
}
}

public Type ReturnedType
{
get
{
return typeof(Country);
}
}

public new bool Equals(object x, object y)
{
return Object.Equals(x, y);
}

public int GetHashCode(object x)
{
return x.GetHashCode();
}

public object NullSafeGet(IDataReader rs, string[] names, object owner)
{
var id = NHibernateUtil.Int32.NullSafeGet(rs, names[0]);

if (id == null)
return null;

// get constant object from our lookup table
return CountryLookup.ById((int) id);
}

public void NullSafeSet(IDbCommand cmd, object value, int index)
{
if (value == null)
NHibernateUtil.Int32.NullSafeSet(cmd, null, index);
else
{
// persist only key of our constant object
NHibernateUtil.Int32.NullSafeSet(cmd, ((Country)value).Id, index);
}
}

public object DeepCopy(object value)
{
if (value == null)
return null;
var country = (Country)value;
return new Country() { Id = country.Id, Name = country.Name };
}

public bool IsMutable
{
get { return false; }
}

public object Replace(object original, object target, object owner)
{
return original;
}

public object Assemble(object cached, object owner)
{
return cached;
}

public object Disassemble(object value)
{
return value;
}
}

This way, the fact that Country is not really an entity is transparent to our code, we can query by Country, have Countries in relations or whatever we need, without breaking the rule that constants are not entities.

Friday, December 9, 2011

Fetching whole table with NHibernate

I've done a quick benchmark for different ways of retrieving whole tables from the database. I've generated 10,000 random rows in my test table with countries first and then run simple SELECT * FROM Country queries for 7 different ways using more or less of NHibernate.

Here are the methods and its results:

2263 ms for LINQ query:

session.Query<Country>().ToList();

1468 ms for ICriteria:

session.CreateCriteria<Country>().List<Country>();

1289 ms for QueryOver:

session.QueryOver<Country>().List();

1431 ms for HQL:

session.CreateQuery("From Country").List<Country>();

1428 ms for SQL query with AddEntity:

session.CreateSQLQuery("select Id, Name, Continent from Country")
.AddEntity(typeof(Country))
.List<Country>();

593 ms for SQL query with AliasToBeanTransformer:

session.CreateSQLQuery("select Id, Name, Continent from Country")
.AddScalar("Id", NHibernateUtil.Int32)
.AddScalar("Name", NHibernateUtil.String)
.AddScalar("Continent", NHibernateUtil.String)
.SetResultTransformer(Transformers.AliasToBean<Country>())
.List<Country>();

489 ms for SQL query with manual objects creation:

session.CreateSQLQuery("select Id, Name, Continent from Country")
.AddScalar("Id", NHibernateUtil.Int32)
.AddScalar("Name", NHibernateUtil.String)
.AddScalar("Continent", NHibernateUtil.String)
.List<object[]>()
.Select(row => new Country()
{
Id = (int) row[0],
Name = (string) row[1],
Continent = (string) row[2]
}).ToList();

The results are pretty interesting.

Contrary to LINQ provider, there's no significant difference between ICriteria, QueryOver and HQL. Fetching data with native SQL query gives no performance gain when used with AddEntity - NHibernate still needs to initialize its infrastructure for managing entities - result is comparable to previous methods.

But there is a big difference when using plain SQL queries and transforming rows to objects on our own - it is about 3 times faster than letting NHibernate to do it. Why is that? When using AliasToBeanTransformer or when creating objects manually, we in fact don't work with persistent entities. Country objects created in two last queries are just POCOs, not associated with NHibernate session at all - its changes are not tracked and persisted automatically. This is not the usual way we are supposed to work with NHibernate, but this can be useful in some scenarios.

Thursday, December 1, 2011

Many-to-many mapping: guidance

In the previous posts I've described how to teach NHibernate about our bidirectional many-to-many relationships using Inverse attribute and I've gone through collection types used in many-to-many mapping to see how they differ in terms of performance. Time to sum up the topic of many-to-many relationship mappings with a bit of guidance.

1. Map one side of many-to-many relationship with Inverse attribute

NHibernate should trigger database writes from one side only, otherwise you'll end up with duplicated values in intermediate table or primary key violation errors. From database perspective, Inverse side becomes read-only, it is only modified at objects level.

2. Add relationships at both ends

For bidirectional relationships, create a single method responsible for adding the relationship between two entities. The method should modify the collections at both sides of the relationship at once, to ensure the objects state is always correct. When you've added Group to User, always add User to Group, too. NHibernate doesn't reload the entities within single session, so it is not able to figure out the changes at the second side of the relationship automatically.

3. Avoid mapping Inverse side of many-to-many

In some cases you don't really need to map both sides of the relationship. If your application is not going to query the database for data from the Inverse side, you'd better not map this side at all. Less mappings, less bugs. And there is no risk of forgetting to update collections at both sides as there is one side only.

4. Map Inverse side using bag

Inverse side of the relationship from the database perspective is read-only - it doesn't trigger any writes. Read-only collections can be mapped as bag as we don't need to care about write penalties. And accessing the collection will always load all its values, regardless of the collection type, so it's best to use the simplest one here.

5. Map active side using set

Set ensures uniqueness. It's good to have uniqueness in many-to-many relationships as there are almost no use cases for non-unique many-to-many relationship. Moreover, set is much better for updates - it can add/update/remove single rows, contrary to bag, which is always deleting and re-creating all relations.

6. Map smaller side as active

ISet Add/Remove methods return boolean indicating whether modifying the set succeeded (it can fail due to set's uniqueness constraint). To determine the proper return value, NHibernate needs to load the collection when modifying it. To ensure the best performance, we should think which side of the relationship is expected to have less values and then map this side as active set and the second one as Inverse bag. It's always good to load as few values as possible and there's no difference which side is Inverse from object-oriented perspective, as on object level both sides behaves identically.


Below is the correct mapping for our Users/Groups example. I've mapped user's groups collection as active set, as I expect one user to have only a few groups. I've mapped group's user list as bag and marked it as Inverse, as single group can possibly have thousands of members.

// in UserMap
HasManyToMany(x => x.Groups).AsSet();

// in GroupMap
HasManyToMany(x => x.Users).AsBag().Inverse();

And HBM version:

<!-- in User.hbm.xml -->
<set name="Groups" table="UsersInGroups">
<key column="UserId" />
<many-to-many column="GroupId" class="Group" />
</set>

<!-- in Group.hbm.xml -->
<bag name="Users" table="UsersInGroups" inverse="true">
<key column="GroupId" />
<many-to-many column="UserId" class="User" />
</bag>

Below is the metod to add the relationship. This is single method that touches both sides of the relationship at once, but only User side triggers the database call, as the Group side is read-only at database level. I've decided to put the method within Group class, as it fits there logically.

// in Group
public virtual void AddMember(User user)
{
this.Users.Add(user);
user.Groups.Add(this);
}

Sunday, November 27, 2011

Many-to-many mapping: collection types

In the previous post we've seen how to instruct NHibernate to generate proper number of INSERT statements to intermediate many-to-many table using Inverse attribute. Time for something bit more complicated.

In the previous example we've created fresh instances of Users and Group and used it in the same session, so NHibernate knew the collections were initially empty. Let's see what happens when we want to work with the collections that are fetched from the database with lazy loading, so that NHibernate doesn't know the collections.

In the example below we're just trying to remove second User from the Group - a task that could be accomplished with single DELETE statement.

using (var sess = factory.OpenSession())
using (var tx = sess.BeginTransaction())
{
var group = sess.Get<Group>(1);
var user2 = sess.Load<User>(2);

group.Users.Remove(user2);
tx.Commit();
}

And here are the queries run in this session:

Well, we just wanted to delete single row from GroupsToUsers, but NHibernate decided to load the collection of users assigned to our Group with ID=1, purge the whole collection, remove the entry in memory and re-add rows that left one by one (which may be thousands of rows!). Seems a bit redundant, doesn't it?

To understand what happened here, first we need to know the difference between the collection types supported by NHibernate and which are used by default by Fluent NHibernate if we didn't specify it explicitly.

There are two collection types that are important for many-to-many mappings - bags and sets:

  • bag is the simplest container - it just holds any items without uniqueness checking
  • set is unique - it means that it can't have two items with exactly the same values

In my object model, I've defined the collections properties using ICollection<T> interface. It seems to be the good choice as it's the base interface for all collections. But ICollection doesn't give any hint to Fluent NHibernate which type of collection to use and Fluent NHibernate chooses the simplest one, which is bag.

I think Fluent NHibernate is wrong with hiding the decision which collection type to use somewhere in conventions. This changes a lot in how NHibernate behaves and the choice should be explicit at mapping level, so that we'll need to think about it instead of letting Fluent to choose something what could be highly inappropriate.

OK, so now we know that our Users collection in Group is implicitly defined as bag. Why it recreates the whole collection just to remove one row? This is just how bag works. Bag doesn't have a primary key and NHibernate can't construct SQL query that addresses the single row to delete - WHERE User_id = 2 AND Group_id = 1 is ambiguous when the row is duplicated (and it can be, as we saw in the previous post). NHibernate's strategy to solve that problem, which we saw above, is obviously the simplest one, but at least it makes us think and look for another solution.

In case of many-to-many, in almost all cases, we in fact need set semantics. Duplicate rows in intermediate table have no meaning and should be forbidden at database level using primary key constraint.

So let's map our Users collection in Group classmap as set:

// in UserMap
HasManyToMany(x => x.Groups).Inverse();

// in GroupMap
HasManyToMany(x => x.Users).AsSet();

Note that we can leave the collection at User side with default bag mapping because it is marked as Inverse, so there is no database write triggered from there and read efficiency is the same with bag and set (it's just a SELECT to fetch all data by foreign key).

Let's look at the queries from our session this time:

Much better. There is no DELETE statement for whole collection and there is no re-inserting. There's just one DELETE that will affect one row only (this is guaranteed by set semantics).

Why do we need to load the collection anyway (statement #2)? I'll explain it in the next post, together with summary and conclusions for many-to-many mappings.

Saturday, November 26, 2011

Many-to-many mapping: Inverse

In this and few next posts I'll try to go over the rules and guidelines and give some insights into how to map many-to-many relationship in NHibernate properly.

In general, it is good for relationships in object model to follow the relationships at database level, so that querying for objects can be naturally translated into SQL queries without any additional overhead.

This is quite easy and natural for parent-child relationship (aka many-to-one). Child object can have a reference to its parent object as well as child row can have a foreign key to its parent row.

Many-to-many relationships (i.e. Users and Groups) are different. In fact, they can exist only at object model level - at database level it is only a concept implemented using two many-to-one relations with intermediate table.

We can implement it the same way in our NHibernate-based object model, but the intermediate entity will probably have no additional properties and this will lead to quite polluted and ugly model. We don't want User and Group to have collections of some UserInGroup objects that have no real object-oriented meaning. We do want User to have many Groups and Group to have many Users - as simple as it can be.

Fortunately, this is quite common scenario and NHibernate can support it very well with respect to database-level constraints and good practices, but only when mapped and used with care.

Let's begin with the simplest possible mapping using FluentNHibernate:

// in UserMap
HasManyToMany(x => x.Groups);

// in GroupMap
HasManyToMany(x => x.Users);

Now let's add two users to a group. To ensure our objects state is correct and collections both at User and Group side are complete, we need to update the collections on both sides:

using (var sess = sessionFactory.OpenSession())
using (var tx = sess.BeginTransaction())
{
var user1 = new User() { Name = "u1" };
sess.Save(user1);

var user2 = new User() { Name = "u2" };
sess.Save(user2);

var group1 = new Group() { Name = "g" };
sess.Save(group1);

user1.Groups.Add(group1);
group1.Users.Add(user1);

user2.Groups.Add(group1);
group1.Users.Add(user2);

tx.Commit();
}

But NHibernate is not meant to guess that when adding one relationship on User side in the 13. and 16. line and then adding another one on Group side in the next line we are in fact specifying the same single relationship. What is the result for lines 13-17?

Each statement in the code above triggers separate INSERT statement, duplicating each relationship. At database level it means that each user is a group member twice. Moreover, when GroupsToUsers table has a primary key defined on both columns (and it probably should have), the database will complain about primary key violation and we'll end up with an exception.

What we need to do here is to inform NHibernate that we're defining the same relation on both ends and it should be represented as one row in the database. We need to choose which side is responsible to do the database insert and mark the other with inverse attribute. For example, let's choose User as inverse and make group.Users.Add(user) trigger the database call.

// in UserMap
HasManyToMany(x => x.Groups).Inverse();

// in GroupMap
HasManyToMany(x => x.Users);

(We'll cover how to choose the inverse side properly later.)

What is the result for lines 13-17 now?

So far, so good. But it's far from correctness yet. By now, it works without overhead only for entitles created just before so that NHibernate knows the current collections state. But things looks much worse when the entities are loaded from the database. We'll look closer at the problem and its solution in the next post.

Tuesday, November 22, 2011

ReSharper and external ASP.NET MVC views

The project I currently work in is based on ASP.NET MVC and, beside main Web project, it can be extended using additional plugin-style libraries, that we import from specified directories and register in main application. Those plugin libraries can contain additional view files (.aspx or .ascx) - to support that, we're storing it as embedded resources and we have custom VirtualPathProvider that resolves it in main Web application.

The problem we were struggling with for quite a long time was that ReSharper was unable to resolve those views correctly, getting lost on Control definition in the first line of view. This caused whole file to be not understood, a lot of errors and fix suggestions were showing up and with several views like this, the performance was dramatically low.

Recently I've spent some time to investigate it more carefully and it turned out that the reason is fairly simple. This has nothing to do with embedded resource setting or not standard file locations. The only thing to do was to place standard Web.config file for views in each external project that contains ASP.NET MVC views.

Here are the lines that ensures ReSharper knows what to do with .aspx/.ascx files (for ASP.NET MVC 2):

<configuration>
<system.web>
<pages
validateRequest="false"
pageParserFilterType="System.Web.Mvc.ViewTypeParserFilter, System.Web.Mvc, Version=2.0.0.0, Culture=neutral, PublicKeyToken=31BF3856AD364E35"
pageBaseType="System.Web.Mvc.ViewPage, System.Web.Mvc, Version=2.0.0.0, Culture=neutral, PublicKeyToken=31BF3856AD364E35"
userControlBaseType="System.Web.Mvc.ViewUserControl, System.Web.Mvc, Version=2.0.0.0, Culture=neutral, PublicKeyToken=31BF3856AD364E35">
<controls>
<add assembly="System.Web.Mvc, Version=2.0.0.0, Culture=neutral, PublicKeyToken=31BF3856AD364E35"
namespace="System.Web.Mvc" tagPrefix="mvc" />
</controls>
</pages>
</system.web>
</configuration>

Thursday, November 17, 2011

Use tools only for what they are made for

I'm using NHibernate.Linq quite intensively for more than a year now and while I do know it better and better, I use it in less scenarios than before. Why is that? It's easy - there are some scenarios, where using it makes no sense. NHibernate's provider is just a tool, with its design, its features and its limitations. NHibernate.Linq is good in querying for business objects, as this is what it's made for. But we may get tired when trying to do some analytical or reporting queries or when processing data in batch mode. This is of course doable with Linq, but for sure not practical. And this is what plain old SQL is made for.

We, developers, often have some kind of unsaid belief that when we decided to use one specific tool, we shouldn't use another, similiar one. That makes sense when both tools offer exactly the same functionality or when using another tool causes significant performance penalty. But this is not the case with NHibernate.Linq and other database access methods. NHibernate is already bundled with both QueryOver and Linq, so we can use both without any penalties. We can also rely on ICriteria API, HQL or plain SQL queries if needed. We can even use ADO connection directly to totally skip NHibernate's layers. If this makes our task easier, why not?

Good cook always have a lot of knives in his kitchen and he knows that one knive is for fish and other for vegetables. There's no such thing as universal knive to cut everything. The same for us - it's important to be aware what each tool is the best at, to know which one to use in particular situation and try not to look for universal tool "to rule them all".

Monday, November 14, 2011

LINQ Pitfalls: Using external variables in projections

Another issue with NHibernate.Linq provider that surprised me a lot is about variables (or constants) that appear in Select expressions. It seems that you can use it, but once only! See this simple code:

var result1 = session.Query<River>()
.Select(r => new { r.Name, Value = 1 })
.First();
Console.WriteLine("a) {0} = 1?", result1.Value);

var result2 = session.Query<River>()
.Select(r => new { r.Name, Value = 2 })
.First();
Console.WriteLine("b) {0} = 2?", result2.Value);

Note that the queries differ in constant value only. But the difference is pretty obvious. Well, not for NHibernate. This is the actual output:

a) 1 = 1?
b) 1 = 2?

It seems that NHibernate parses expressions only once and caches it internally, ignoring the actual member values. NHibernate will always use cached value 1. This is a silent fail, much worse situation than failing with NotSupportedException or something else.

What is the workaround? Select only data that comes from database and add all other external values in code, outside NHibernate.Linq:

var result = session.Query<River>()
.Select(r => r.Name)
.First()
.Select(r => new { Name = r, Value = 1 });

This is already known bug (see this JIRA entry), marked as critical, but still unresolved. It influences Select only. Values in other expressions, i.e. in Where calls, are not cached and work as you may expect.

Well, I'll maybe try to make a bit more insight into what is supported in NHibernate.Linq and what not...

Sunday, November 6, 2011

LINQ Pitfalls: Order of operators (cont.)

In the previous post I've shown that operators in NHibernate.Linq query have to keep some logical order due to SQL language design. But the rules for NHibernate queries are not as strict as order of clauses in SQL. In fact, there is quite a good compromise between LINQ's flexibility and SQL constraints.

NHibernate generally allows to move operators around as long as the result doesn't depend on the order of operators. For example, there's no logical difference between these two queries:

session.Query<River>()
.OrderBy(x => x.Length)
.Where(x => x.Length > 100)
.ToList();

session.Query<River>()
.Where(x => x.Length > 100)
.OrderBy(x => x.Length)
.ToList();

Both these queries produce correct SQL, even if the first one defines operations in non-SQL-compliant order (WHERE is before ORDER BY in SQL). NHibernate knows that applying Where restriction can be done at any time and it doesn't change the result. That's why it allows to put it almost anywhere in LINQ query. The only rule is to place it before paging operators (Skip and Take), as it does matter if we are paging before or after applying Where conditions.

Similarily with projections - Select operator can be placed in different query parts and it doesn't change the result. Theoretically it could be placed even after paging, but for some reason NHibernate is more SQL-compliant in this case and doesn't allow Take or Skip to precede Select.

So, besides Skip and Take, are there any other operators that need to be placed in specific query part? NHibernate doesn't like OrderBy placed before GroupBy, but this is rather bizzare combination and we may ignore it. All the other supported operators seems to be pretty portable - at least I haven't found any other problems when putting operators in non-SQL-compliant order. Well, NHibernate's Linq provider has to be extremally complex, it is not a surprise that it took 3 man-years to implement it.

How about non-standard, NHibernate-specific operators, such as Cacheable, CacheMode, ToFuture, Fetch, FetchMany? In NHibernate 3.2 it can move around, too. However, in NH 3.0 Fetch and FetchMany had to be the last operators, just before ToList or other method that triggers the database call. Otherwise, NotSupportedException was thrown.

Saturday, November 5, 2011

LINQ Pitfalls: Order of operators

When writing SQL queries you need to put all the clauses in particular order that is enforced by the SQL language itself and somehow relates with how the RDBMS is going to process the query. In general, the order in which the clauses are processed in databases is:

  1. FROM clause
  2. WHERE clause
  3. GROUP BY clause
  4. HAVING clause
  5. SELECT clause
  6. ORDER BY clause
  7. LIMIT/OFFSET clauses

You can't (easily and without subqueries) i.e. apply WHERE conditions after sorting or apply ordering after applying LIMIT (TOP n) clause. This is pretty explicit from SQL language syntax and makes sense for performance reasons.

When writing NHibernate's LINQ queries, the rules generally still apply. We need to always remember that eventually NHibernate have to produce plain old SQL query that is syntactically correct and makes sense. This means that even if it is perfectly legal in LINQ to Objects to write a query with an arbitrary, not SQL-compliant order of operations, NHibernate.Linq doesn't allow it (with NotSupportedException) as it is hard to translate such query into SQL.

It may seem that NHibernate could optimize and reorder operations in the resulting query itself, as calling NHibernate.Linq operators is only composing the query, not running it step by step. Well, technically it could, but let's see what would this mean.

Let's assume that we have the following rivers in the database:

  • Oder - 854 km
  • Vistula - 1047 km
  • Elbe - 1165 km
  • Danube - 2845 km

Relying on NHibernate's hypothetical ability to apply the operators in SQL-compliant way, we could try to get two longest rivers from the database using any of these two queries:

Session.Query<River>().Take(2).OrderByDescending(x => x.Length);
Session.Query<River>().OrderByDescending(x => x.Length).Take(2);

But let's see the results of these queries when run against in-memory objects collection using plain LINQ to Objects:

rivers.Take(2).OrderByDescending(x => x.Length);
  1. Vistula - 1047 km
  2. Oder - 854 km
rivers.OrderByDescending(x => x.Length).Take(2);
  1. Danube - 2845 km
  2. Elbe - 1165 km

So order does matter and it has to matter in every LINQ implementation. That's why it makes perfect sense for NHibernate to throw NotSupportedException when the ordering is not SQL-compliant.

If you really intend to apply ordering after two first objects are fetched, you should fetch it unordered from the database and order it in-memory, falling back to plain LINQ to Objects. LINQ's AsEnumerable() is designed especially for cases like this - it switches the context from database to plain objects.

Session.Query<River>()
.Take(2)
.AsEnumerable()
.OrderByDescending(x => x.Length);

In the next post, we'll see are there any exceptions from the rule above and what are the order rules for non-standard NHibernate.Linq operators.

Sunday, October 30, 2011

LINQ Pitfalls: Casting

One thing that needs to be always remembered when writing NHibernate.Linq queries is that it is going to be translated into SQL eventually. What this means is that we can't do everything in our Select or Where conditions - we are restricted by the capabilities of underlying database and SQL language itself.

Let's say our rivers have delta type stored in the database as plain string and in our domain model we introduced an enum type for convenience:

public class River
{
// ...
public virtual string DeltaType { get; set; }
}

public enum DeltaType
{
None,
WaveDominated,
TideDominated,
Gilbert,
Estuary
}

Now we're going to issue a query that should return all the rivers that has one of the delta types given:

repository.FetchByDeltaTypes(new[] { DeltaType.Estuary, DeltaType.Gilbert });

The first try can look like this:

public IEnumerable<River> FetchByDeltaTypes(IEnumerable<DeltaType> types)
{
IEnumerable<River> data;
using (var tx = _session.BeginTransaction())
{
data = _session.Query<River>()
.Where(r => types.Contains(
(DeltaType)Enum.Parse(typeof(DeltaType), r.DeltaType)
)).ToList();

tx.Commit();
}

return data;
}

Here I'm trying to convert string from entity into an enum value for comparison. When tried to run it, I've got NotSupportedException with quite reticent, but pretty clear message "System.Object Parse(System.Type, System.String)". NHibernate complains here that it can't translate Parse method call into corresponding SQL statement. Well, I can't think of that translation either - SQL doesn't know our enum types etc.

We have to think of another solution - we have to provide the database with strings that it can understand. What we can do here is to change all the enum values into strings first and then pass it to the query:

public IEnumerable<River> FetchByDeltaTypes(IEnumerable<DeltaType> types)
{
IEnumerable<River> data;
using (var tx = _session.BeginTransaction())
{
var typesStrings = types.Select(x => x.ToString()).ToList();

data = _session.Query<River>()
.Where(r => typesStrings.Contains(r.DeltaType))
.ToList();

tx.Commit();
}

return data;
}

No big difference for our code, big difference for NHibernate. It is now trivial to translate this Where clause into SQL. Conclusion - it is always important to think how the query is going to be translated into SQL.

Tuesday, October 25, 2011

LINQ Pitfalls: Nested queries

NHibernate's LINQ provider is very easy to start with and sufficient for majority of use cases, but when used in more complex scenarios it can mislead and surprise. Here is one of the pitfalls (or perhaps bugs?) I've recently run into. Note that I'm using NHibernate 3.0 and the issue is already fixed, but general conclusions possibly still apply.

I have simple model with two-step relationship: River has many Countries, Country has one Continent. I want to query for all rivers located within given continent. For code simplicity, let's try to use two separate LINQ queries:

var countries = session.Query<Country>()
.Where(c => c.Continent.Id == continentId);

var data = session.Query<River>()
.Where(r => (r.Countries.Where(c => countries.Contains(c))).Any())
.ToList();

The Where condition in second query is satisfied if at least one of river's countries exists in the country list for given continent. I don't call ToList() on first query, so I assume it is not run separately but merged (nested) into second one (or at least NotSupportedException is thrown or something). Here is the query that actually runs:

What do we have here? It gets all rivers, filter it for rivers that are within countries (country2_ alias), that are equal to ANY country (country3_ alias) - no continent condition at all! The query is merged, it runs without errors and silently produces unexpected, wrong results.

How can we enforce LINQ provider to filter the results correctly? The easiest way is to materialize the list of countries in separate query to avoid buggy query merging. So let's add ToList() to the first query:

var countries = session.Query<Country>()
.Where(c => c.Continent.Id == continentId)
.ToList();

var data = session.Query<River>()
.Where(r => (r.Countries.Where(c => countries.Contains(c))).Any())
.ToList();

This is how these queries look like in SQL:

This time we are fetching the list of countries first and the second query is correctly filtered using IN clause. But this is not the best solution, for sure. There are two queries instead of one and we don't really need country values, especially if there may be thousands of rows in real-life model. In this case, we should try to rewrite the queries to express our filtering logic in single query. In River-Country-Continent example, it will look like this:

data = session.Query<River>()
.Where(r => r.Countries.Any(c => c.Continent.Id == continentId))
.ToList();

This time the SQL statement contains continent filter and results are as expected:

Conclusion: Use the latest stable versions of your software. Be careful when merging LINQ queries in NHibernate. Ensure all ToList's are in place or find different way to express your filtering. And always check what SQL is your query producing!

Saturday, October 22, 2011

FluentNHibernate: How to globally set id's of all entities as assigned?

Sometimes you have a model in which all entities identifiers are controlled externally from your database - i.e. they are from external systems or you have some kind of sequence generator running separately from the database. In this case, when using Fluent NHibernate, we can turn off generating identity values in database in every mapping using:

Id(x => x.Id).GeneratedBy.Assigned();

But for larger models this may be a bit repetitive. And again, this is where conventions are to help. We can implement very simple convention, that will cover all our entities:

public class AssignedIdConvention : IIdConvention
{
public void Apply(IIdentityInstance instance)
{
instance.GeneratedBy.Assigned();
}
}

And then register it in Fluent NHibernate configuration like this:

Fluently.Configure()
.Mappings(...)
.Conventions.Add<AssignedIdConvention>()

That's all!

But be advised that in general, managing identity values on our own is not the best practice and there is only one use case I can think of, when it is necessary - when we use external data sources (i.e. when our database contains data imported from other databases, like global customers database or authorization system). Note that using natural keys, like Tax Identification Numbers or similiar, can be misleading, as the real world can surprise us in many ways and we'll have no way to handle user without that number or two users with duplicate numbers (and this may happen for sure).

Wednesday, October 19, 2011

Preview of method return value in VS Debugger

I'm probably not going to invent a wheel here, but this is something I'm quite satisfied to figure out. One of the things that annoyed me when using Visual Studio Debugger was that I coudn't see the value that will be returned from a method I'm currently debugging, i.e.:

public int Test()
{
return GetSomeValue();
}

When in line 3, I would expect debugger to allow me to see the value returned by GetSomeValue() method as it certainly knows it. I used to modify the code so that I have a temporary variable for the value or repeat the method call in immediate window. Both ways are quite bizzare.

It turns out that VS already can show the value to me pretty easily using Quick Watch window. I just need to select the expression I want to check and press Ctrl+Alt+Q (or choose Quick Watch from context menu).

Note that it is not exactly the preview of the value, that was returned by GetValue(). It is actually evaluating the selected expression, so be careful when the method being debugged has side effects. See this example:

public class DebugViewTest
{
private int counter;

public int Test()
{
return GetValueWithSideEffect();
}

private int GetValueWithSideEffect()
{
counter++;
return counter;
}
}

When checking the value returned from the GetValueWithSideEffect() method, we'll encounter the different value than normal execution path encounters, as Quick Watch will execute the method's code and increment the counter. Anyway, this method will be sufficient for our debugging in most cases.

Tuesday, October 18, 2011

Mapping collection of strings as nvarchar(max) using FluentNHibernate

The main feature of FluentNHibernate that seems to simplify user experience with NHibernate is its easily chainable, fluent API. But when we want something more complex, we may find it quite surprising.

One of the most common examples of non-default mapping of a property is when we need to change the column size for string property. By default, string properties are mapped tonvarchar(255) or so. We very often need to handle longer strings, too. For single string property, the way to do it is quite strange, but easy:

Map(x => x.Name).Length(4001);

I use SQL Server's magic number 4001 here - column can't be that long and column generated in the database is actually nvarchar(max). The situation gets complicated when we have a collection of strings and we want to keep those strings in the table with nvarchar(max) column named Value. It seems it should go like this:

HasMany(x => x.Names).Element("Value", x => x.Length(4001));

Second lambda parameter in Element method is to provide custom element mapping - sounds like exactly what we want. But no - it has no effect and generated schema still contains column of default length.

To achieve this pretty simple task, I needed to map it like this:

HasMany(x => x.Names).Element("Value", x => x.Columns.Single().Length = 4001);

So this is where FluentNHibernate's fluent API ends. I'm not an XML configuration fan, but using XML it will be obvious how to do this mapping. In Fluent though, it took me some time to figure it out.

NOther welcome post

Hi. Welcome to yet another developer blog. We'll see how it looks like - I don't know yet. I came from web development world, now working mainly on ASP.NET MVC applications. My general interests are somewhere in between applications design & architecture. More specifically, recently I am spending quite a lot of time enjoying or cursing NHibernate and that is probably the topic that we'll start with. Hope we all enjoy it!