Tuesday, May 8, 2012

Migrating from identity to HiLo

Generally, NHibernate is not the best solution when our application is concerned mainly around batch data loads. But there are a lot of scenarios, like initialization, when medium-sized batch inserts make sense in every application.

If our database table primary key is generated with identity generator and we try to persist objects one by one in a loop, our performance can hurt and NHProf starts to complain that we're doing too many database calls. In fact, for every row inserted, NHibernate needs to do a separate round-trip to the database, because it needs to fetch the identity value generated every time.

The solution is to switch our primary key generation strategy from identity to HiLo. HiLo is composing the identifier from two parts, only one of which comes from the database. This means that when NHibernate knows that part (called high), it can insert a number of rows in a single round-trip.

Assuming the size of the batch is sufficient (less than number of rows to be inserted - let's call it N), the number of round-trips needed to persist the data with NHibernate decreased to 2 (from N with identity).

The problem arises when we already have the database in production and we can't just change the generation strategy in the mapping. First, we need to remove the identity attribute from our Id column, what is not so trivial with SQL Server. Actually it's easier to create new column for the new primary key, rewrite the values and drop the previous one. The second issue with non-empty tables is that NHibernate's HiLo needs to start counting from the current highest identity value + 1, otherwise we'll end up with primary key violation.

Here is the SQL Server script I wrote to cope with these issues. It creates new primary key without identity attribute, drops the previous one after migrating the values, creates HiLo infrastructure for NHibernate and populates it with current production values. Feel free to use it!

sp_rename 'TheTable.Id' , 'Id_Identity'
go

alter table TheTable
    add Id bigint
alter table TheTable
    drop constraint Id_PK
go
    
update TheTable
    set Id = Id_Identity
go

alter table TheTable
    alter column Id bigint not null
go

alter table TheTable
    drop column Id_Identity
alter table TheTable
    add constraint Id_PK primary key(Id)
go

create table HiLo (
    NextHi int primary key
)

insert into HiLo (NextHi) values ((select (max(Id) / 32) + 1 from TheTable))

Note that I needed to specify the size of the batch (max_low) in the last line, in order to calculate the starting NextHi correctly.