Saturday, October 27, 2012

Kindle as a cookbook

I've been a Kindle owner for a few months now. My primary reason for getting a Kindle was to read technical books, as I'm not keen on reading from the screen and most of them already come in electronic formats. It's working well. Moreover, I've almost completely abandoned reading blogs on the laptop in favor of the Kindle, thanks to great tools like Calibre or Send-To-Kindle widgets. I'm also using the Kindle to check my e-mail from the bedroom etc.

These are really nice additional use cases for the reader. But there is one thing I do with my Kindle that I'm most surprised about. We had a lot of recipes printed out, our collection grew quite fast and it was getting hard to find anything there. Now, instead of printing a recipe we found on the web, we just select it and send it to the Kindle using Amazon's two-click widget, which is configured to deliver documents directly to my Kindle, in a readable and searchable format with a proper title. On the Kindle, I have a separate collection for recipes, which makes it easy to browse. And then in the kitchen, recipes on the Kindle are easy to read, search through and navigate. Give it a try!

Image borrowed from http://www.abetterbagofgroceries.com/recipes/

Wednesday, October 10, 2012

After DevDay 2012 in Kraków

It's been almost a week since the DevDay 2012 conference held in Kraków, Poland, but I think it's still not too late to share my thoughts about this quite fabulous event. First of all, it was completely free, which at once gives it a great price/quality ratio. But even if it had been a paid conference, it would have been worth paying for. Top-class speakers, very interesting topics and ideas, and perfect organization. A pity it was one day only.

Let me summarize each session briefly.

1. Scott Hanselman - It's not what you read, it's what you ignore

Scott is for sure one of the best developers amongst showmen and one of the best showmen amongst developers. His session about productivity was absolutely great. Even if he didn't say anything revolutionary, everything was worth being reminded of. These rather well-known ideas, applied to our developer world with a great sense of humour, made for a good start. And hey, Scott doesn't suck that much! ;)

2. Mark Rendle - Hidden Complexity: Inside Simple.Data and Simple.Web

The first technical session, quite insightful yet interesting and easy to follow. Mark showed some crazy stuff he implemented in his software and I was really impressed by how much we can squeeze out of our old friend C#. Finding a real-life example where overriding the true operator to return false makes sense was amazing!

3. Sebastien Lambla - HTTP Caching 101

A bit chaotic but funny and practical session about misuses of HTTP caching, with some advice about how to send proper headers and what to look for when dealing with IE failing to conform to the standard. Quote of the session: "[you can go with much cleaner solution XYZ] if you're a purist, but I don't think you are as I know all six of them".

4. Rob Ashton - Javascript sucks and it doesn't matter

A great session about the role of JavaScript in today's developer world, hitting the nail right on the head. Rob discussed the five phases of attitude towards JavaScript - from negation and frustration to acceptance and reconciliation. I'm almost at the end of this path myself, so I felt this session was exactly about me and my experiences. He also gave some useful advice on testing, JS code structure etc.

5. Martin Mazur - Why you should talk to strangers

A nice session about the fact that we developers are all in the same craft, no matter which technology we work in, and about how useful it is to look broadly and borrow good ideas and solutions from the "others", even from PHP developers ;)

6. Antek Piechnik - Shipping code

A well-prepared and practical presentation about how development looks in Antek's company - and it is organized according to all the current trends. But actually I think the session would have been a better match for career days at some university. It stood out a bit from the other sessions, but still wasn't bad.

7. Greg Young - How to get productive in a project in 24h

An interesting session about code metrics in practice. Greg showed some useful tools to quickly measure things in a project in order to identify the most smelly areas. He calls for using the version control system as a data source that can be queried for a lot of useful information. A very interesting conclusion to a very well-spent day!

What more can I add? Looking forward to next year's edition!

Monday, October 1, 2012

NoSQL Distilled - a small review

I've just finished reading NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence, a fresh book by Pramod J. Sadalage and Martin Fowler, and that's a good opportunity to do my first short book review. I don't have real NoSQL experience other than a few hours of experiments with RavenDB, one of Ayende's presentations and a few posts on his blog, but it's enough to have some basic understanding of the topic. So I took the paperback, dated 2013, with a lot of enthusiasm and curiosity.

The book is divided into two parts. The first provides some background from a more theoretical yet readable perspective, different than on Ayende's blog, but still far from dry academic papers. This part covers concepts such as scaling or sharding and mentions the CAP theorem. It also provides quite a good explanation of the map-reduce idea, but a few more varied examples would surely make it even easier to understand. This background information is not a must to start early experiments with NoSQL databases, but it can be really useful for more DBA-minded programmers to become more familiar with the ideas.

The second part of the book, called "Implement", looks through the different NoSQL paradigms. Unfortunately, the authors haven't avoided some repetition from the first part, which is not really needed in such a thin book, I think. The second half compares key-value stores, document databases, column-family stores and graph databases using common criteria like terminology, scaling possibilities, query features, availability, support for transactions etc. There is also a strong emphasis on discussing the capabilities for operating in clusters. Some implementations of each NoSQL approach are mentioned and a few pros and cons discussed, but the book doesn't fulfill the role of a quick start guide for these databases - in order to implement even something very simple in each technology, one has to look for other sources.

Although I was not extremely delighted with the book, I find it worth reading even if only for the final thoughts and overall remarks. On the one hand, the trendy buzzword schema-free gets demystified. The authors stress that there is always a schema somewhere, as the clients need to understand the data somehow; NoSQL solutions just move the responsibility for keeping the data definition from the server side to the clients, and this has some non-negligible consequences, e.g. for data migrations. On the other hand, they urge using the proper tool for the given scenario and argue that an RDBMS is not the one and only tool for everything. Even the book's subtitle mentions "polyglot persistence" - the concept that a single application can benefit from having multiple underlying data storage technologies, each of them designed exactly for the job it's doing.

And the final answer to the question "whether to use NoSQL databases" is, as always: "it depends".

Tuesday, September 25, 2012

Keeping server and client side in sync - enforcing contract compatibility

In our project, which is an ASP.NET MVC application, quite a lot of features are implemented on the client side, in JavaScript that talks to the server using JSON. Besides the web client, we have a mobile client application, which is driven by a separate API provided by the same MVC application, a bit Web API style, again using JSON. No rocket science.

As all three of these parts are growing and changing quickly (sometimes way too quickly), we were struggling with incompatibilities between what the JavaScript or mobile client expects to receive and what the server actually returns, due to changes not being applied on both sides. Refactorings, functional changes or even correcting a typo affected these server-client interactions. Changing a property name is so easy with ReSharper that we often don't pay enough attention to spot the possible impact at the borders between the layers. What's worse, we rely heavily on the default model binding behavior of ASP.NET MVC - this means that even parameter names in action methods became part of our public API that we need to take care of. And by taking care I mean either never changing a once-published name, or updating all the possible clients together with the server-side backend deployment (which includes forcing all mobile app users to upgrade - a nightmare!).

We decided we needed a cheap, reliable and universal method of preventing unintended changes in our public contracts - action names, parameter types and names, returned type structures, names of properties etc.

A good and well-maintained suite of integration tests would probably do the job, but unfortunately we still don't have one (and we're not the only ones, are we?). And I suspect the suite would have to be quite huge to cover a good range of the unintentional changes we could possibly introduce. The second thought was to implement some quirky tests that would use reflection to go through the codebase and fail if the implementation differed from what we expect. But it would be better to invest the considerable amount of time needed to write that kind of specification in writing real integration tests instead.

Finally we took a simpler approach. We decided to be more explicit about what is our external API and what is not. We've created a binary contract definition in a few steps. Let's see it on this simple example:

public class StadiumController
{
     public ActionResult GetByCapacity(int? minCapacity, int? maxCapacity)
     {
          var min = minCapacity ?? 0;
          var max = maxCapacity ?? int.MaxValue;
          return Json(Stadiums.Where(x => x.Capacity >= min && x.Capacity <= max));
     }
}

1. enclose the input parameters of action methods in "input model" classes

public class StadiumByCapacityInputModel
{
     public int? MinCapacity { get; set; }
     public int? MaxCapacity { get; set; }
}

public class StadiumController
{
     public ActionResult GetByCapacity(StadiumByCapacityInputModel input)
     {
          var min = input.MinCapacity ?? 0;
          var max = input.MaxCapacity ?? int.MaxValue;
          return Json(Stadiums.Where(x => x.Capacity >= min && x.Capacity <= max));
     }
}

2. change the return types of action methods to "output model" classes

public class StadiumByCapacityInputModel
{
     public int? MinCapacity { get; set; }
     public int? MaxCapacity { get; set; }
}

public class StadiumOutputModel
{
     public string Name { get; set; }
     public int Capacity { get; set; }
}

public class StadiumController
{
     public IEnumerable<StadiumOutputModel> GetByCapacity(StadiumByCapacityInputModel input)
     {
          var min = input.MinCapacity ?? 0;
          var max = input.MaxCapacity ?? int.MaxValue;
          var stadiums = Stadiums.Where(x => x.Capacity >= min && x.Capacity <= max);

          return stadiums.Select(x => new StadiumOutputModel() 
          {
               Name = x.Name,
               Capacity = x.Capacity
          });
     }
}

3. extract interfaces from the controllers

public class StadiumByCapacityInputModel
{
     public int? MinCapacity { get; set; }
     public int? MaxCapacity { get; set; }
}

public class StadiumOutputModel
{
     public string Name { get; set; }
     public int Capacity { get; set; }
}

public interface IStadium
{
     IEnumerable<StadiumOutputModel> GetByCapacity(StadiumByCapacityInputModel input);
}

public class StadiumController : IStadium
{
     public IEnumerable<StadiumOutputModel> GetByCapacity(StadiumByCapacityInputModel input)
     {
          var min = input.MinCapacity ?? 0;
          var max = input.MaxCapacity ?? int.MaxValue;
          var stadiums = Stadiums.Where(x => x.Capacity >= min && x.Capacity <= max);

          return stadiums.Select(x => new StadiumOutputModel() 
          {
               Name = x.Name,
               Capacity = x.Capacity
          });
     }
}

4. move these interfaces and input/output models far away from the controllers, so that ReSharper-driven refactorings do not affect them - into separate "contract" libraries.

5. include the libraries as referenced DLLs in our project

6. tweak ASP.NET MVC's default ActionInvoker to handle non-ActionResult return types (not needed with ASP.NET Web API - the actions in Web API controllers by design return POCO objects)

Now we treat the contract libraries like separate projects. We actually keep them in a Libs folder and check them in to our main project as binaries, but just keeping them in a separate solution will do the job. The solution for the contract library is configured so that the build output goes directly into the Libs folder, which is not possible without manually checking out the previous binaries. This guarantees that no one checks in any changes in contract code without the new contract binaries, and it also raises the level of explicitness. We've effectively made development around contracts more difficult, to ensure that all changes made to the contract definitions are done deliberately and with proper consideration.

Whenever someone breaks the contract requirements (without modifying the binary contract properly), the project just doesn't compile - either the interface is not implemented or there is some kind of type mismatch. Moreover, having the contract definition in a separate physical project makes managing, documenting and versioning it easier.

There is one thing that may seem to be a serious downside. We need to map our input model classes from the contract to some "real" domain objects from the main codebase in order to use them. And the same goes for return types - we often need to map domain objects back to the types defined in the contract. It is a lot more fuss, but again - it makes the contract very explicit and visible. Easy mapping cases can be handled with tools like AutoMapper. More complicated cases may appear when the codebase starts to diverge from the contract and we need to keep backward compatibility (like when the clients are mobile apps). In those cases again - it's even better to have all the transformations explicit and in one place, and the mapping code becomes more helpful than annoying.
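For the straightforward cases, the mapping boils down to a one-liner per type pair. Here is a minimal sketch using AutoMapper's classic static API (Stadium stands in for the domain entity here - it's an assumption for illustration only, not a type from the example above):

// configured once, e.g. at application start:
Mapper.CreateMap<Stadium, StadiumOutputModel>();

// then, inside the action, instead of newing up the output models by hand:
return stadiums.Select(Mapper.Map<Stadium, StadiumOutputModel>);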

Saturday, September 22, 2012

Non-ActionResult action return type in ASP.NET MVC

In ASP.NET MVC, there's a quite silly behavior when a controller's action method returns a type that is not ActionResult-derived. The default ActionInvoker, which is responsible for invoking the action code and interpreting its result, checks whether the returned instance is an ActionResult and, if not, returns a plain string representation of the object (the type name by default):

protected virtual ActionResult CreateActionResult(
    ControllerContext controllerContext, ActionDescriptor actionDescriptor, object actionReturnValue)
{
    if (actionReturnValue == null)
    {
        return new EmptyResult();
    }

    ActionResult actionResult = (actionReturnValue as ActionResult) ??
        new ContentResult { Content = Convert.ToString(actionReturnValue, CultureInfo.InvariantCulture) };
    return actionResult;
}

I can see no real-life scenario in which a ToString result returned as plain content is useful. In practice this means that in ASP.NET MVC we're forced to use ActionResult or its derived types. This is especially annoying when you want your action method to be defined in an interface or used somewhere as a delegate.

The issue was solved much better in ASP.NET Web API - the actions in Web API controllers by design return POCO objects that are serialized correctly before sending it to the wire depending on the request and configuration - as XML, JSON etc.

To achieve a similar result in "normal" MVC controllers, let's replace the default ControllerActionInvoker right after creating the controller - in the ControllerFactory - with our derived implementation that just overrides the virtual CreateActionResult method:

public class MyControllerFactory : DefaultControllerFactory
{
    public override IController CreateController(RequestContext context, string controllerName)
    {
        var controller = base.CreateController(context, controllerName);
        return ReplaceActionInvoker(controller);
    }

    private IController ReplaceActionInvoker(IController controller)
    {
        var mvcController = controller as Controller;
        if (mvcController != null)
            mvcController.ActionInvoker = new ControllerActionInvokerWithDefaultJsonResult();
        return controller;
    }
}

public class ControllerActionInvokerWithDefaultJsonResult : ControllerActionInvoker
{
    public const string JsonContentType = "application/json";

    protected override ActionResult CreateActionResult(
        ControllerContext controllerContext, ActionDescriptor actionDescriptor, object actionReturnValue)
    {
        if (actionReturnValue == null)
            return new EmptyResult();

        return (actionReturnValue as ActionResult) ?? new ContentResult()
        {
            ContentType = JsonContentType,
            Content = JsonConvert.SerializeObject(actionReturnValue)
        };
    }
}

This simple implementation just serializes the returned objects to JSON, but it's easy to implement something more sophisticated here, like the content negotiation Web API has. Feel free to use and extend it if you find it useful - I've published it as a Gist for your convenience.
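One more piece is needed to plug it in - the factory has to be registered at application start. A minimal sketch, assuming the registration goes into Global.asax.cs:

protected void Application_Start()
{
    // replace the default controller factory so that every controller
    // gets the JSON-aware action invoker
    ControllerBuilder.Current.SetControllerFactory(new MyControllerFactory());

    // ...the usual MVC bootstrapping (routes, areas, filters) stays unchanged
}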

Saturday, September 15, 2012

NHibernate LINQ Pitfalls: Too many joins with deep conditions

Although I've just discussed whether NHibernate has become obsolete, it doesn't mean that I'm no longer maintaining or developing applications that use it. It'll take at least a few years to completely phase it out, and in the meantime we still have some problems with it and we still need to know how to use it.

One of the recent surprises we had with NHibernate came when querying the database using the LINQ provider, where a condition in our query was checking a reference value not directly on the queried object, but on another object it references (yes, I know it breaks the Law of Demeter), like this:

var firstQuery = sess.Query<RootNode>()
    .Where(x => x.Child.GrandChild.Id == 42)
    .FirstOrDefault();

The condition on GrandChild uses its key value only, so looking at the database tables, joining GrandChildNode is not needed - all the information this query uses sits in the RootNode and ChildNode tables. Surprisingly, NHibernate 3.2 not only joins GrandChildNode, but also joins RootNode for the second time, only to completely ignore it. That makes 4 tables in total.

However, when we change the way we're looking for a grand child and use a proxy object created by ISession's Load method, we get the expected and optimal query with only 2 tables joined.

var secondQuery = sess.Query<RootNode>()
    .Where(x => x.Child.GrandChild == sess.Load<GrandChildNode>(42))
    .FirstOrDefault();

This bug was already found and is fixed in version 3.3 (and surprisingly, was not present in 3.1) - so it affects only NHibernate 3.2. But I think it's worth mentioning as it may have potentially large performance impact if you're using that version.

Friday, September 7, 2012

Is NHibernate dead?

Before discussing the question from the title, let me answer another one: Is this blog dead? Definitely no. Summer time distracted me a bit, but I'm hoping to get back to writing now :)

So it's been more than half a year since I concluded my series about NHibernate's mapping-by-code. The series is still surprisingly popular; it gets quite a lot of hits from Google every day. I've also just reached 50 upvotes on Stack Overflow in a question about where to find docs and examples for mapping-by-code. Thanks for that!

A quick googling for "mapping by code" and skimming through the NHForge website convinced me that there is still nothing better available on the topic. Moreover, none of the bugs I encountered half a year ago has made any progress - all issues left unresolved and unassigned back then are in the same state right now. These facts are a bit sad, as I saw the mapping-by-code feature as quite revolutionary and as shaping the future of NHibernate.

Well, and here comes the question - maybe there is no future? Maybe everything that is needed in the subject of object-relational mapping is already there and no further development is needed? Ohloh stats show some activity in the NHibernate project, but the pace is slowing down. No new releases are planned according to the roadmap on the issue tracker. There are 25 unresolved issues classified as "Critical", the oldest waiting for more than 20 months by now. Development in the third-party ecosystem has already stopped - see the Ohloh graphs for NHibernate.Contrib or Fluent NHibernate, to name the most significant ones.

In my opinion, the reason for NHibernate's agony is simple. It has already been observed many times that applications nowadays are mostly web-based, read-intensive, not so data-centric and not involving as complicated data manipulation as a few years ago. With the advent of mature NoSQL engines - free, easy to use and full of neat features - like RavenDB, and, on the other side, with lightweight ORM-like tools like Dapper or Simple.Data, which cover at least 95% of the ORM features needed to effectively handle newly-designed relational databases, we just don't need to use such a big and heavy tool as NHibernate.

Legacy databases are still a niche for NHibernate, for sure, but how many legacy databases that are not OR-mapped yet do we still have out there? And for fresh development, I'd say that unless you're designing some kind of specific data-driven application, it is more effective (both in terms of development effort and performance) to stick with either NoSQL or some lightweight ORM instead of NHibernate.

NHibernate is a great tool, but time goes by pretty quickly. The context of our work changes from year to year, and even good tools must some day be superseded by better ones, more suitable for today's needs. And I think that day for heavy, multi-purpose ORMs like NHibernate has just come.

Tuesday, May 8, 2012

Migrating from identity to HiLo

Generally, NHibernate is not the best solution when our application is centered mainly around batch data loads. But there are a lot of scenarios, like initialization, where medium-sized batch inserts make sense in every application.

If our database table's primary key is generated with the identity generator and we try to persist objects one by one in a loop, performance can suffer and NHProf starts to complain that we're doing too many database calls. In fact, for every row inserted, NHibernate needs to do a separate round-trip to the database, because it needs to fetch the generated identity value each time.

The solution is to switch our primary key generation strategy from identity to HiLo. HiLo composes the identifier from two parts, only one of which (called hi) comes from the database. This means that once NHibernate knows that part, it can insert a number of rows in a single round-trip.

Assuming the batch size is sufficient (not less than the number of rows to be inserted - let's call it N), the number of round-trips needed to persist the data with NHibernate decreases to 2 (from N with identity).

The problem arises when we already have the database in production and we can't just change the generation strategy in the mapping. First, we need to remove the identity attribute from our Id column, which is not so trivial in SQL Server. Actually, it's easier to create a new column for the new primary key, copy the values over and drop the previous one. The second issue with non-empty tables is that NHibernate's HiLo needs to start counting from the current highest identity value + 1, otherwise we'll end up with primary key violations.

Here is the SQL Server script I wrote to cope with these issues. It creates a new primary key without the identity attribute, drops the previous one after migrating the values, creates the HiLo infrastructure for NHibernate and populates it with the current production values. Feel free to use it!

sp_rename 'TheTable.Id' , 'Id_Identity'
go

alter table TheTable
    add Id bigint
alter table TheTable
    drop constraint Id_PK
go
    
update TheTable
    set Id = Id_Identity
go

alter table TheTable
    alter column Id bigint not null
go

alter table TheTable
    drop column Id_Identity
alter table TheTable
    add constraint Id_PK primary key(Id)
go

create table HiLo (
    NextHi int primary key
)

insert into HiLo (NextHi)
    select (max(Id) / 32) + 1 from TheTable

Note that I needed to specify the size of the batch (max_lo) in the last statement, in order to calculate the starting NextHi correctly.
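On the NHibernate side, the mapping then has to point at the same table and use a block size consistent with the divisor from the script. A minimal mapping-by-code sketch (the max_lo value is an assumption - align it with the 32 used above according to how your generator computes the block size):

Id(x => x.Id, m => m.Generator(Generators.HighLow, g => g.Params(new
{
    table = "HiLo",   // the table created by the script above
    column = "NextHi",
    max_lo = 31       // block size of 32, matching the (max(Id) / 32) + 1 seeding
})));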

Sunday, April 15, 2012

NHibernate's inverse - what does it really mean?

NHibernate's concept of 'inverse' in relationships is probably the most often discussed and misunderstood mapping feature. When I was learning NHibernate, it took me some time to move from "I know where I should put 'inverse' and what happens then" to "I know why I need 'inverse' here and there at all". Even now, whenever I'm trying to explain inverse to somebody, I find it pretty hard.

There are a lot of explanations over the net, but I'd like to have my own one. I don't think that the others are wrong; it'll just help me arrange my own understanding, and if anyone else takes advantage of it, that's great.

Where do we use inverse?

First, some widely-known facts; next, we'll elaborate on a few of them.

  • Inverse is a boolean attribute that can be put on the collection mappings, regardless of collection's role (i.e. within one-to-many, many-to-many etc.), and on join mapping.
  • We can't put inverse on other relation types, like many-to-one or one-to-one.
  • By default, inverse is set to false.
  • Inverse makes little sense for unidirectional relationships, it is to be used only for bidirectional ones.
  • General recommendation is to use inverse="true" on exactly one side of each bidirectional relationship.
  • When we don't set inverse, NHProf will complain about superfluous updates.

What does it mean for a collection to be 'inverse'?

The main problem in understanding 'inverse' is its negating nature. We're not used to setting something up in order to NOT take an action. Inverse set to true means "I do NOT maintain this relationship". Hence, inverse set to false means "I DO maintain this relationship".

It would be much more understandable if we could go to the opposite side of the relationship and be positive there: "This side maintains the relationship", and NHibernate would automatically know that the other side doesn't (*). But it is implemented as it is - we have to live with inverse's negative character.

Each relationship is represented in the database as an identifier of a related table row in the foreign key column at 'many' side. Why at 'many' side? Because that's how we do relationships in the relational databases. The column "holding" the association is always at 'many' side. It's not possible to keep the association at 'one' side because we'd have to insert many values into one database field somehow.

So what does it mean for a collection in NHibernate to maintain the relationship (inverse="false")? It means ensuring that the relation is correctly represented in the database. If the Comments collection in the Post object is responsible for maintaining the relationship, it has to make sure all its elements (comments) have their foreign keys set to the post's id. In order to do that, it issues a SQL UPDATE statement for each Comment, updating its Post reference. It works, the relationship is persisted correctly, but these updates often don't change anything and could be skipped (for performance reasons).

Inverse="true" on a collection means that it should not care whether the foreign keys in the database are properly set. It just assumes that some other party will take care of it. What do we gain? No superfluous UPDATE statements. What can we lose? We have to be sure that the other side actually takes over the responsibility of maintaining the association. If it doesn't, nobody will, and we'll be surprised that our relationship is not persisted at all (NHibernate will not throw an error or anything - it can't guess that this is not what we expected).
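To make it concrete, this is roughly how the inverse side could look in an XML mapping, using the Post/Comment example discussed below (a sketch - the key column name is taken from the Comment.Post_id column mentioned later):

<bag name="Comments" inverse="true">
    <key column="Post_id" />
    <one-to-many class="Comment" />
</bag>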

When should we set inverse="true"?

Let's consider one-to-many first. Our relationship must be bidirectional and have entities (not value types) on both sides for inverse to make sense. The other side (the 'many' side) is always active; we can't set inverse on many-to-one. This means that we should put inverse="true" on the collection, provided that:

  • our collection is not explicitly ordered (like <list>) - it is e.g. a <bag> or <set>; ordered lists have to be active in order to maintain the ordering correctly; the 'many' side doesn't know anything about the ordering of the collection at the 'one' side
  • we actually set the relationship at 'many' side correctly

Consider the example:

public class Post
{
    public virtual int Id { get; set; }
    public virtual ICollection<Comment> Comments { get; set; }
}

public class Comment
{
    public virtual int Id { get; set; }
    public virtual Post Post { get; set; }
    public virtual string Text { get; set; }
}

// ...

var comment = new Comment() { Text = "the comment" };
session.Persist(comment);
post.Comments.Add(comment);

We are not setting the Post property in the Comment class, as we may expect NHibernate to handle that when we append our comment to the collection of comments in the particular Post object (**). If the post.Comments collection is not inverse, that will actually happen, but quite ineffectively:

We've inserted a null reference first (exactly as it was in our code) and then, as the collection is responsible for maintaining the relationship (inverse="false"), the relationship was corrected by a separate UPDATE statement. Moreover, if we have a not null constraint on Comment.Post_id (which is actually good), we'll end up with an exception saying that we can't insert a null foreign key value.

Let's see what happens with inverse="true":

There's no error, but the comment is actually not connected to the post, even though we've added it to the proper collection. By using inverse, we've explicitly turned off maintaining the relationship by that collection. And as we don't set the relationship on the Comment side, no one does.

The solution, of course, is to explicitly set the comment's Post property. It is good from the object model perspective, too, as it reduces the amount of magic in our code - what we've set is set, what we haven't set is not set magically.

var comment = new Comment() { Text = "the comment", Post = post };
session.Persist(comment);
post.Comments.Add(comment);

Much better now:

Time for many-to-many. Again, inverse makes sense only when we've mapped both sides. We have to choose one side to be active and mark the other one as inverse="true". Without that, when both collections are active, both try to insert a tuple into the intermediate table that many-to-many needs. Having duplicated tuples makes no sense in most cases. For some suggestions on how to choose which side is better at being active, see my post from December.

To sum up

  • one-to-many, right side not mapped: makes no sense - the left side must be active
  • one-to-many with many-to-one: the right side should be active (the left one with inverse="true"), to save on UPDATEs (unless the left side is explicitly ordered)
  • many-to-many, right side not mapped: makes no sense - the left side must be active
  • many-to-many with many-to-many: one side should be active (inverse="false"), the other should not (inverse="true")

______

(*) There are of course reasons why NHibernate doesn't make assumptions about the other side of a relationship like that. The first one is to maintain independence between mappings - it would be cumbersome if a change in mapping A modified the behaviour of B. The second one is ordered collections, like <list>. The ordering can be kept automatically by NHibernate only when the collection side is active (inverse="false"). If the notion of being active were managed on the other side only, changing the collection type from non-ordered to ordered would require changes in both mappings.

(**) Note that inverse is completely independent from cascading. We can have cascade save on the collection and it does not affect which side is responsible for managing the relationship. Cascade save means only that when persisting the Post object, we're also persisting all Comments that were added to the collection. They are inserted with a null Post value and UPDATEd later, or inserted with the proper value in a single INSERT, depending on the object state and the inverse setting, as described above.

Thursday, April 5, 2012

Strongly typed links within ASP.NET MVC areas

Recently we've started to utilize the concept of areas in our ASP.NET MVC application to separate the different products provided by our application. We are going to have some controllers with the same names in different areas, so when linking, we'll need to specify the area name (if different from the current request's one). But we're used to strongly-typed URL generation using extensions from MVC Futures like Html.ActionLink<T> with lambdas (Html.ActionLink<HomeController>(x => x.About(), "Home") etc.). Unfortunately, these two requirements don't work well together.

The MVC Futures extensions (also known as Microsoft.Web.Mvc) are good at getting the controller and action name from the provided controller type and action lambda, but they don't get the area correctly. That's probably because there's no 100% correct way to determine in which area a controller lies. In most cases, we could guess it from the namespace - when creating an area within Visual Studio, it creates the directory for controllers under Areas.AreaName.Controllers. But that's just a convention and there's no guarantee that it's always followed.

MVC Futures offers a solution - we can mark our controllers within areas with an attribute:

[ActionLinkArea("First")]
public class BillingController : Controller
{
}

This is understood by MVC Futures' strongly-typed helpers and when building a link to BillingController they'll use "First" area correctly.

Unfortunately, our requirements were more complicated. We have another area that we use to expose some of our controllers through a RESTful API. And the linking rules are as follows:

  • when current request is within First or Second area, we're linking as described above - target area is determined by target controller
  • but, when current request is within API area, we should link to API alternative controller (if available).

My first idea was to inherit from ActionLinkAreaAttribute and override the target area name for API calls, but unfortunately the attribute class is sealed. This means that we can't make use of that standard behavior and need to create our own.

After some fiddling with the source code, I've written my own versions of the helper methods I need. My implementation conforms to my own attribute, which allows setting up the default area name and falling back to the standard MVC behavior (staying in the current area) for API calls. Here's how to use it:

[LinkWithinArea("First", OrSwitchTo = "Api")]
public class BillingController : Controller
{
}

Now, whenever the helper method is building a link typed with BillingController, it'll generate a link to the Api area for calls from the Api area, or to the First area for all other calls. The OrSwitchTo parameter is optional - when omitted, LinkWithinArea behaves just like the built-in ActionLinkArea. No need to specify the area every time a link is built.

I've published the attribute and helpers code as a Gist, feel free to use it.

Tuesday, April 3, 2012

Table per subclass using a discriminator with mapping-by-code

Recently xanatos, in a comment to one of my mapping-by-code series posts, asked how to implement hybrid-mode inheritance with both table per subclass and a discriminator column using mapping-by-code. I think this scenario is quite exotic (why do we need a discriminator column if we have separate tables?), but the documentation explicitly mentions this possibility, so it should be possible with mapping-by-code, too.

Here is the expected XML mapping fragment:

<class name="Payment" table="PAYMENT">
    <id name="Id" type="Int64" column="PAYMENT_ID">
        <generator class="native"/>
    </id>
    <discriminator column="PAYMENT_TYPE" type="string"/>
    <property name="Amount" column="AMOUNT"/>
    ...
    <subclass name="CreditCardPayment" discriminator-value="CREDIT">
        <join table="CREDIT_PAYMENT">
            <key column="PAYMENT_ID"/>
            <property name="CreditCardType" column="CCTYPE"/>
            ...
        </join>
    </subclass>
</class>

And here is how to do it in mapping-by-code:

public class PaymentMap : ClassMapping<Payment>
{
    public PaymentMap()
    {
        Id(x => x.Id, m => m.Generator(Generators.Native));
        Discriminator(d => d.Column("PaymentType"));
        Property(x => x.Amount);
    }
}

public class CreditCardPaymentMap : SubclassMapping<CreditCardPayment>
{
    public CreditCardPaymentMap()
    {
        DiscriminatorValue("CREDIT");
        Join("CreditPayment", j => j.Property(x => x.CreditCardType));
    }
}

I'm impressed again by how easily XML mappings can be translated to the mapping-by-code syntax.

Saturday, March 31, 2012

ASP.NET MVC and overlapping routes

ASP.NET routing in MVC allows us to define how different URLs are mapped to controllers, actions, action parameters etc. It is quite simple, and in some cases it even seems too simple. In our MVC application we had a requirement to accept these two path patterns:

{controller}/{action}
{controller}.aspx/{action}

The first one is the default; we want our generated links to use this route. The second one is legacy, but it's required to keep working. We could have set up some kind of redirect from the old route to the default one, but we thought it would be easier to define a separate route in our application that maps to the same controllers as the default route.

Easier said than done. The first attempt looked like that:

routes.MapRoute("Default", "{controller}/{action}", 
new { controller = "Home", action = "Index" });
routes.MapRoute("Legacy", "{controller}.aspx/{action}",
new { controller = "Home", action = "Index" });

Seems trivial, but it doesn't work. When Home.aspx was requested, the application failed to find a controller named "Home.aspx". Well, the default route eagerly matched the controller variable and missed the fact that the second route was a better match. Now I remember - the docs clearly state that route matching stops at the first match and that we should arrange our routes from the most specific to the most generic ones.

OK, let's then change the order of our routes, so that we catch the legacy ones first:

routes.MapRoute("Legacy", "{controller}.aspx/{action}", 
new { controller = "Home", action = "Index" });
routes.MapRoute("Default", "{controller}/{action}",
new { controller = "Home", action = "Index" });

Looks like it's working - both "Home.aspx" and "Home" are mapped to HomeController. But now all our links generated by MVC helpers (like Html.ActionLink) have the .aspx extension. We don't want to expose this route, as it exists for backwards compatibility only. I found the explanation and the insight I needed in Craig Stuntz's article. Generally, when building a URL, the helpers' behavior is similar to the parsing scenario: the first route that can be satisfied with the given route values is chosen. We're passing controller and action values to Html.ActionLink, so the first route matches.

But Craig's article got me on the right track:

routes.MapRoute("Legacy", "{controller}.{extension}/{action}", 
new { controller = "Home", action = "Index" }, new { extension = "aspx" });
routes.MapRoute("Default", "{controller}/{action}",
new { controller = "Home", action = "Index" });

I've modified the first, legacy route, introduced an extension variable in place of "aspx" and defined a constraint in the fourth parameter, stating that the extension variable has to be equal to "aspx". This way only URLs like Home.aspx match the first route, and the ActionLink helpers don't use it (unless a route value named extension and equal to "aspx" is passed).
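For the rare cases where a legacy-style link is actually wanted, it's enough to pass the extension route value explicitly - a small sketch using the standard ActionLink overload with route values:

// generates a link through the legacy route (e.g. /Home.aspx),
// while ordinary helper calls keep using the default route
Html.ActionLink("Legacy home link", "Index", "Home", new { extension = "aspx" }, null)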

Tuesday, March 20, 2012

HTTP protocol breaking in ASP.NET MVC

HTTP clients (such as browsers) are designed to handle different error codes differently, and there are a lot of reasons why server-side errors should have different status codes than those triggered by users. Depending on the status code, responses are cached differently, web crawlers index them differently and so on.

Recently, during an error handling review in our project, I learned how ASP.NET MVC obeys HTTP protocol rules in terms of status codes. And unfortunately, there are some pretty simple cases where it doesn't. See this simple controller:

public class TestController : Controller
{
    public ActionResult Index(int test, string html)
    {
        return Content("OK");
    }
}

MVC handles missing controllers/actions properly, as 404 Not Found:

Let's now try to call the Index action without parameters:

MVC couldn't bind the parameter values to the action and throws an exception, which yields a 500 Internal Server Error status code. According to the HTTP protocol, this means that something unexpected happened on the server and that it's the server's own problem, not that the request was wrong ("hey, I have some problems at the moment, can't help you, come back later"). But that's not true here - I wouldn't say that a missing parameter is an unexpected situation, and it's definitely the request that is wrong. The protocol has better status codes for this kind of situation - like 400 Bad Request ("hey, I've tried to help you, but you're doing something wrong and I can't understand you").

Another example:

MVC has some validation rules that protect the server from potentially malicious requests, like cross-site scripting. But again, those cases are handled with 500 Internal Server Error, despite it obviously being the client's fault - again, 400 Bad Request would work better here. Purely from the HTTP protocol point of view, 500 Internal Server Error here is like admitting that the malicious request actually broke something on the server.

How can we fix these two? For example, by modifying the response generated by MVC on error. We can add this code to our Global.asax.cs:

protected void Application_Error()
{
    var lastError = Server.GetLastError();
    if (lastError is ArgumentException || lastError is HttpRequestValidationException)
    {
        Server.ClearError();
        Response.StatusCode = (int) HttpStatusCode.BadRequest;
    }
}

It checks the type of the exception thrown and changes the status code to the more appropriate 400 Bad Request in the two cases mentioned above.

Sunday, March 11, 2012

Mapping-by-code and custom ID generator class

In the comments to one of my mapping-by-code posts, Cod asked if it is possible to specify a custom ID generator class within mapping-by-code mappings. I didn't know the answer, but the topic seemed interesting enough to figure it out.

The answer is of course positive - the mapping-by-code API is flexible enough to support that. Let's recall how we normally specify the generator class to be used:

Id(x => x.Id, m =>
{
    m.Generator(Generators.Native, g => g.Params(new
    {
        // generator-specific options
    }));
});

The Generator method's first parameter expects an instance of IGeneratorDef class. NHibernate provides a set of predefined ones in Generators static class - see the full list here - but we may provide our own implementation as well.

Let's hook up a custom generator class as implemented in this NHForge article. The FDPSequence class defined there is an integer-based, parametrized generator (an implementation of NHibernate's IIdentifierGenerator). To use it within mapping-by-code, we need to prepare an IGeneratorDef class accordingly. But that's pretty easy:

public class FDPSequenceDef : IGeneratorDef
{
    public string Class
    {
        get { return typeof(FDPSequence).AssemblyQualifiedName; }
    }

    public object Params
    {
        get { return null; }
    }

    public Type DefaultReturnType
    {
        get { return typeof(int); }
    }

    public bool SupportedAsCollectionElementId
    {
        get { return true; }
    }
}

We have to implement 4 properties:

  • Class is the equivalent of the class attribute in XML - this is where we need to specify our custom generator's assembly qualified name.
  • Params allows us to create equivalents of non-standard <param> elements. We could return an anonymous object with values set e.g. through the constructor - but I don't think it is needed, as we can always pass parameters through the Generator method's second parameter, as an anonymous object, too.
  • DefaultReturnType specifies the type generated by our custom generator (it may be null - NHibernate will figure it out through reflection later)
  • and SupportedAsCollectionElementId obviously specifies whether our generator is usable within collection elements.

Having FDPSequenceDef in place, we just need to pass it to Generator method in mapping-by-code:

Id(x => x.Id, m => m.Generator(new FDPSequenceDef()));

And we're done! The generated XML looks as expected and the generator is working for us:

<id name="Id" type="Int32">
    <generator class="NHWorkshop.FDPSequence, NHWorkshop, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null" />
</id>

Tuesday, March 6, 2012

Databases Versioning - Branching and Merging

Recently, in the project I work on, we encountered a major database branching issue for the first time. We are using a branch-for-release branching strategy, meaning that we do our current development in the trunk and branch every time the product is released. Our branches are just for fixing critical bugs that can't wait until the next release. One of the bug fixes we needed to apply involved a schema change, and the problem was that we had gone ahead with development in the trunk, so the bugfix update script differed between the production branch and the trunk.

We're doing our database versioning with RoundhousE, using forward-only, run-once, irreversible update scripts; in case of rollbacks we restore our databases from backups. Our tooling ensures that no script is modified after being run, which makes sense as we have no way to apply such changes to a database that is already some revisions ahead. We also don't want scripts that are branch-specific, as we'd need to skip such a script on merging and we'd need to remember about that until the last day of our product's existence. What's more, if our development environment is built using a different set of scripts than the production one, we are asking for trouble.

Before we decided what to do, we thoroughly discussed an article by K. Scott Allen from 2008. There were two solutions proposed - either to include the patching script before all the new scripts from the trunk (meaning that databases already at the trunk version need to be fixed somehow), or to have two different scripts in the two branches, written in such a way that the script itself ensures it is not run twice, so that it can be merged across branches.

I don't like the second option, which was the one recommended by Scott. It suits our tooling and will work, but going that way means that the production database was built a bit differently than the development ones (as our patch script was branched - there are some statements that had to be skipped to make the script run correctly both in prod and dev). That is smelly. Even if we can see that the result seems to be the same, we'd prefer to have all our databases built using exactly the same set of scripts in the same order.

Scott discourages the first option - inserting the patch script before all the trunk scripts - as it means applying changes to a database that is already ahead. But again - we want our databases to be built using exactly the same set of scripts in the same order. This means that if our production database gets the patch applied before the scripts that are already in the trunk (and that will go to production in some future release), we should have the same order in the development databases.

Here is our final solution - it's a bit different than these two:

  1. Integrate the patch before all the trunk scripts. Let's say the branch was made after script 100, so the production database is at version 100, and we have new scripts 101 and 102 in the trunk, so our development databases are currently at version 102. This means that our patch needs to go between 100 and 101 - let's call it 100a (see the sketch after this list).
  2. Modify 101 and 102 to be runnable on the new schema (changes should not be needed in most cases, as 100a is just a bugfix and as such should not contain major changes).
  3. Roll back all the development databases to version 100 from the backup, so that the next upgrade runs 100a first and then 101 and 102. If someone doesn't roll back their database, the next local deployment will fail when running the 100a script on a database already at version 102 - and this is good, as it requires every developer to have a production-like environment.
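For illustration, the RoundhousE scripts folder could end up looking roughly like this after integrating the patch (the file names are made up; the only point is that 100a sorts between 100 and 101, assuming scripts are executed in file name order):

0100_previous_release_changes.sql
0100a_critical_bugfix_from_the_branch.sql
0101_trunk_change.sql
0102_another_trunk_change.sql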

The only issue with this approach is that, because of restoring the database from the backup, we lose some of the newest data. But this is probably not a big deal in a development environment. And knowing that all our databases (development, production and whatever else) were upgraded by the same sequence of statements lets us sleep better.

Friday, March 2, 2012

Loquacious HTML builder based on XSD - NOtherHtml

Previously, we've built a house and an arbitrary XML structure using the loquacious API. The next loquacious interface usage I'll share is more complicated and probably closer to real-life needs. It'll be an API for building any valid XHTML markup in code, based on an XSD (XML Schema Definition). You can see the result on GitHub - feel free to use and fork it, if you find it useful!

When building the interfaces seen in the Action<T> parameters, I've strictly followed the rules and names given in the XSD. That guarantees that the code produced using my API will always be valid HTML (XHTML 1.0 Strict in this case) in terms of element nesting rules. If an element is not available at a given level, it means the XSD doesn't allow it there.

I'm going to go over the codebase quickly to show how easily XSD-based loquacious interfaces can be built.

Architecture overview

The root idea of loquacious interfaces is that when going down the structure of our constructed object graph, we need an Action<T> lambda typed with an interface exposing all the options available at the given level. For XHTML (and XML in general), these levels are elements, and the available options are their allowed child elements, attributes and inner textual content. So for each XHTML element (for each <xs:element> element in the schema) we need an interface - let's name it by prefixing the element's name with I, and let's leave it empty for now.

public interface IHtml {}
public interface IHead {}
public interface IBody {}
// etc...

That's almost 80 interfaces - quite a lot, but that's what will give us the validity guarantee later. We need to have all these interfaces implemented, too, and that's more scary. Some interfaces will be very similar, as a lot of HTML elements share the same attributes and child elements. If we decided to have a separate implementation for each interface, we'd end up with massive code duplication.

I've decided to do something different - implement all the elements' interfaces in a single class - ElementImpl. In the end, it will have a method for each element and each attribute in the whole schema, which will make that class pretty big - about 250 members. But it is my implementation detail, marked as internal and never exposed, so I feel it's not such a bad thing, especially as it would take three or four times more code when implemented separately.

Of course, that similarity of child elements and attributes is specific to HTML, and there are XML schemas that don't have such characteristics. In those cases, it'll probably be cleaner to implement each interface separately.

OK, by now we have 80 empty interfaces and one empty class implementing all of them. We need a way to create an instance for a given interface. In the case of separate implementations, we'd probably just new one up where needed. But here we can do it in a generic and concise way, as we have all the supported elements implemented in a single class - we just need a type cast to an interface. I have a static utility class for that - ElementFactory. To keep things simple, I derive the element name added to the XML tree from the element interface name, by skipping the prefix and lowercasing.

The last infrastructure thing to note is the already known NodeBuilder class, which is a wrapper around the standard XML API, extended this time with a few tweaks. Creating a node and running its Action<T> is now hidden inside an AddNode method with the element's interface as a generic argument. This way I just need to call the provided Action<T> on the instance fetched from ElementFactory.
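To give a feeling of how little code this needs, here is a rough sketch of both pieces described above (the exact signatures in NOtherHtml may differ - in particular, ElementImpl's constructor shape is an assumption made for illustration):

internal static class ElementFactory
{
    // "IBody" -> "body": the element name is the interface name without the prefix, lowercased
    public static string GetElementName<T>()
    {
        return typeof(T).Name.Substring(1).ToLowerInvariant();
    }

    // all element interfaces are implemented by the single ElementImpl class,
    // so creating "an IBody" is just a cast
    public static T Create<T>(NodeBuilder nodeBuilder) where T : class
    {
        return new ElementImpl(nodeBuilder) as T;
    }
}

// inside NodeBuilder - the generic AddNode mentioned above:
public void AddNode<T>(Action<T> action) where T : class
{
    var childNode = AddNode(ElementFactory.GetElementName<T>());
    action(ElementFactory.Create<T>(new NodeBuilder(_doc, childNode)));
}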

XSD translation

Time to fill in the elements' interfaces and the ElementImpl implementation. I've decided to follow the XSD literally. I've translated each <xs:attributeGroup> into an interface. See the example:

<xs:attributeGroup name="coreattrs">
    <xs:attribute name="id" type="xs:ID"/>
    <xs:attribute name="class" type="xs:NMTOKENS"/>
    <xs:attribute name="style" type="StyleSheet"/>
    <xs:attribute name="title" type="Text"/>
</xs:attributeGroup>

public interface IHaveCoreAttrs
{
    void Id(string id);
    void Class(string @class);
    void Style(string style);
    void Title(string title);
}

The same goes for each <xs:group> grouping the elements. Each available element corresponds to an Action<T>-typed method in the loquacious interface:

<xs:group name="fontstyle">
    <xs:choice>
        <xs:element ref="tt"/>
        <xs:element ref="i"/>
        <xs:element ref="b"/>
        <xs:element ref="big"/>
        <xs:element ref="small"/>
    </xs:choice>
</xs:group>

public interface IHaveFontStyleElements
{
    void Tt(Action<ITt> action);
    void I(Action<II> action);
    void B(Action<IB> action);
    void Big(Action<IBig> action);
    void Small(Action<ISmall> action);
}

Again, the same with each <xs:complexType>. Note that as XSD elements reference and extend each other, we follow that with our interfaces:

<xs:complexType name="Inline" mixed="true">
    <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:group ref="inline"/>
        <xs:group ref="misc.inline"/>
    </xs:choice>
</xs:complexType>

public interface IInlineComplexType : IHaveInnerContent, IHaveInlineElements, IHaveMiscInlineElements {}

I've also translated each enumerated type (defined as <xs:restriction>) into C# enumeration.

And finally, we get to the <xs:element>s. We do the same here. If an element extends an already defined XSD complex type, we mimic it by inheriting from the corresponding interface we've created previously; if the element contains a group of attributes, we inherit from the corresponding interface again; when there are other attributes or elements referenced inside, we add them directly to our interface. See the example:

<xs:element name="body">
    <xs:complexType>
        <xs:complexContent>
            <xs:extension base="Block">
                <xs:attributeGroup ref="attrs"/>
                <xs:attribute name="onload" type="Script"/>
                <xs:attribute name="onunload" type="Script"/>
            </xs:extension>
        </xs:complexContent>
    </xs:complexType>
</xs:element>

public interface IBody : IBlockComplexType, IHaveCommonAttrs
{
    void OnLoad(string onLoad);
    void OnUnload(string onUnload);
}

Each time we add methods to any of our interfaces, our ElementImpl class needs to grow. Every new method corresponds to either an attribute or a child element - in both cases the implementation is very simple:

public void Body(Action<IBody> action)
{
    _nb.AddNode(action);
}

public void Id(string id)
{
    _nb.SetAttribute("id", id);
}
It just calls an appropriate NodeBuilder method. In the case of nodes, we rely on the type of the Action<T> parameter - that's all we need. The starting point method - Html.For<T>(Action<T> action) - looks pretty much the same; we can start from any point in the XHTML tree by specifying the element interface that is needed.

Usage example

Let's take the first example from the XHTML article on Wikipedia and build it using NOtherHtml.

var html = Html.For<IHtml>(x =>
{
    x.Lang("en");
    x.Head(head =>
    {
        head.Meta(meta =>
        {
            meta.HttpEquiv("Content-Type");
            meta.Content("text/html; charset=UTF-8");
        });
        head.Title(t => t.Content("XHTML 1.0 Strict Example"));
        head.Script(script =>
        {
            script.Type("text/javascript");
            script.CData(@"function loadpdf() {
document.getElementById(""pdf-object"").src=""http://www.w3.org/TR/xhtml1/xhtml1.pdf"";
}");
        });
    });
    x.Body(body =>
    {
        body.OnLoad("loadpdf()");
        body.P(p =>
        {
            p.Content("This is an example of an");
            p.Abbr(abbr =>
            {
                abbr.Title("Extensible HyperText Markup Language");
                abbr.Content("XHTML");
            });
            p.Content("1.0 Strict document.");
            p.Br();
            p.Img(img =>
            {
                img.Id("validation-icon");
                img.Src("http://www.w3.org/Icons/valid-xhtml10");
                img.Alt("Valid XHTML 1.0 Strict");
            });
            p.Br();
            p.Object(obj =>
            {
                obj.Id("pdf-object");
                obj.Name("pdf-object");
                obj.Type("application/pdf");
                obj.Data("http://www.w3.org/TR/xhtml1/xhtml1.pdf");
                obj.Width("100%");
                obj.Height("500");
            });
        });
    });
});

Weaknesses (or rather strengths)

My basic implementation was only meant to show the pattern for an XSD-based loquacious interface, and it was not intended to enforce all the XSD constraints - e.g. it doesn't enforce that required attributes are set, like alt for <img>. But that's relatively easy to achieve - we can always change the Img(Action<IImg> action) method and add a required parameter there, e.g. Img(string alt, Action<IImg> action).
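Such a changed method could look more or less like this (a sketch only - it assumes the existing Img(Action<IImg>) overload stays in place and simply delegates to it):

// hypothetical required-attribute variant inside ElementImpl: alt must be supplied up front
public void Img(string alt, Action<IImg> action)
{
    Img(img =>
    {
        img.Alt(alt);  // the enforced attribute
        action(img);   // the caller's optional customization
    });
}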

Similarly, if one finds Em(s => s.Content("emphasized text")) cumbersome, it's easy to change the implementation to allow calling Em("emphasized text") - it can even be implemented as an extension method:

public static class HtmlExtensions
{
    public static void Em(this IHavePhraseElements parent, string content)
    {
        parent.Em(x => x.Content(content));
    }
}

Hope you can see the power beneath all that simplicity. Loquacious interface patterns simply let us build APIs that perfectly suit our needs.

Tuesday, February 28, 2012

Loquacious XML builder

Let's try to make use of the loquacious interface patterns I've shown in the previous post to build something simple but useful - an API to construct an arbitrary XML document with a simple, readable and elegant piece of C# code. If you find it helpful, feel free to use it - for convenience, I've put it on GitHub.

We'll start with a utility class wrapping the standard, cumbersome XML API. Nothing really interesting here, just a few methods to add attributes, nested elements or inner content to a given XmlNode object.

internal class NodeBuilder
{
    private readonly XmlDocument _doc;
    private readonly XmlNode _node;

    public NodeBuilder(XmlDocument doc, XmlNode node)
    {
        _doc = doc;
        _node = node;
    }

    public void SetAttribute(string name, string value)
    {
        var attribute = _doc.CreateAttribute(name);
        attribute.Value = value;
        _node.Attributes.Append(attribute);
    }

    public XmlNode AddNode(string name)
    {
        var newNode = _doc.CreateElement(name);
        _node.AppendChild(newNode);
        return newNode;
    }

    public void AddContent(string content)
    {
        _node.AppendChild(_doc.CreateTextNode(content));
    }
}

Now we'll create an entry point for our loquacious XML API - it'll be a static method that creates an instance of XmlDocument, uses NodeBuilder to initialize the document with a root element, runs a loquacious Action<INode> for the root node and, finally, returns the XmlDocument content as a string.

public static class Xml
{
    public static string Node(string name, Action<INode> action)
    {
        using (var stringWriter = new StringWriter())
        {
            var doc = new XmlDocument();
            var root = new NodeBuilder(doc, doc).AddNode(name);
            action(new NodeImpl(doc, root));

            doc.WriteTo(new XmlTextWriter(stringWriter));
            return stringWriter.ToString();
        }
    }
}

What do we need in the INode interface, used within the Action<T> parameter? As always with loquacious interfaces, it should resemble one level of our object structure - an XML node in this case. So we'll have two simple methods to add an attribute and inner content, and another Action<INode>-parametrized method to add a new node at the next level of the XML structure.

public interface INode
{
    void Attr(string name, string value);
    void Node(string name, Action<INode> action);
    void Content(string content);
}

The implementation of the INode interface is pretty straightforward, following the patterns I've described previously.

internal class NodeImpl : INode
{
    private readonly XmlDocument _doc;
    private readonly NodeBuilder _nb;

    public NodeImpl(XmlDocument doc, XmlNode node)
    {
        _doc = doc;
        _nb = new NodeBuilder(doc, node);
    }

    public void Attr(string name, string value)
    {
        _nb.SetAttribute(name, value);
    }

    public void Node(string name, Action<INode> action)
    {
        action(new NodeImpl(_doc, _nb.AddNode(name)));
    }

    public void Content(string content)
    {
        _nb.AddContent(content);
    }
}

And that's it! We can use this three-class implementation to create any XML we need. For example, here is the code that builds a simple NuGet package manifest:

var package = Xml.Node("package", x => x.Node("metadata", m =>
{
    m.Attr("xmlns", "http://schemas.microsoft.com/packaging/2010/07/nuspec.xsd");
    m.Node("id", id => id.Content("Foo.Bar"));
    m.Node("version", version => version.Content("1.2.3"));
    m.Node("authors", authors => authors.Content("NOtherDev"));
    m.Node("description", desc => desc.Content("An example"));
    m.Node("dependencies", deps =>
    {
        deps.Node("dependency", d =>
        {
            d.Attr("id", "First.Dependency");
            d.Attr("version", "3.2.1");
        });
        deps.Node("dependency", d =>
        {
            d.Attr("id", "Another.Dependency");
            d.Attr("version", "3.2.1");
        });
    });
}));

Of course this is the simple case - we could construct XML like this using StringBuilder pretty easily, too. But the flexibility this kind of API gives is very convenient for more complex scenarios. I'm going to show something more complicated next time.

Thursday, February 23, 2012

On loquacious interfaces, again

I've recently finished my review of NHibernate's mapping-by-code feature, and the thing I'm most impressed with is its API design. Fabio Maulo, the mapping-by-code creator, calls this a loquacious interface, as opposed to a chained, fluent interface. I don't know if that name is well established or formalized yet - Google shows only NH-related hits. I don't know of any other projects using solely this kind of API either. But I think this is going to change soon, as Fabio's approach seems to be more powerful and in a lot of cases more readable and "fluent" than chained interfaces.

What exactly am I talking about?

I'm thinking of an API intended to build complex structures, written in code that resembles the structure itself. The mapping-by-code API (loosely) resembles NHibernate's HBM XML structure, so where XML had an attribute, the loquacious interface has a method call, and where XML had a nested element, we have a nested lambda expression.

<!-- HBM XML fragment -->
<property name="Example" lazy="false">
  <column name="ColumnName" />
</property>

// mapping-by-code fragment
Property(x => x.Example, m =>
{
    m.Lazy(false); // attribute equivalent
    m.Column(c => c.Name("ColumnName")); // nested element equivalent
});

The first and most important thing to note is that loquacious interfaces support tree structures, contrary to fluent chains, which are linear in their nature. As Martin Fowler mentions, fluent chains are "designed to be readable and to flow". A loquacious interface flows less, but instead gives the ability to define arbitrarily complex structures without losing readability.

The only loss I can see is that there's no way to enforce how many times and in what order the methods are called - in a chain we can control that with the types returned from each chain element, while within a loquacious interface's lambdas we have no control over how the methods are called. (On the other hand, there's less code needed to implement the API.)

How is it built?

What delights me is that there's no rocket science in loquacious interfaces at all (as opposed to fluent chains, which are hard to design well - see Fowler's article or my thoughts on Fluent NHibernate's component mapping). As we've seen in the mapping-by-code example above, there are two types of methods inside the lambdas of a loquacious API - taking either a simple object or another lambda. Methods with a simple object-typed parameter modify the current level of the structure we're creating; methods with a lambda-typed parameter start a new level.

Suppose we want to use a loquacious API to create a simple object tree like this:

var building = new Building()
{
    Address = "1 Example Street",
    Floors = new[]
    {
        new Floor()
        {
            Rooms = new[]
            {
                new Room() { Area = 33.0 },
                new Room() { Area = 44.0 }
            }
        },
        new Floor()
        {
            Rooms = new[]
            {
                new Room() { Area = 20.0 },
                new Room() { Area = 30.0 },
                new Room() { Area = 40.0 },
            }
        },
    },
    Roof = new Roof() { Type = RoofType.GableRoof }
};

To start building our Building, we have to create its first level using a method with an Action<T>-typed parameter (Action<T> is a generic delegate taking a single T parameter and returning no value). The T generic type should allow setting up the elements available at the given level - Address, Floors and Roof in this case. Let's sketch the starting point method's signature and prepare the interface used within its parameter:

public Building Building(Action<IBuildingCreator> action) { }

public interface IBuildingCreator
{
    void Address(string address);
    void Floor(Action<IFloorCreator> action);
    void Roof(Action<IRoofCreator> action);
}

Address represents a simple property of Building, so it just takes a string parameter. Floor and Roof represent complex objects, so we have Action<T> parameters there. The methods have no return values - no chaining, standalone calls only.

Let's now implement our starting point method:

public Building Building(Action<IBuildingCreator> action)
{
    var creator = new BuildingCreator();
    action(creator);
    return creator.TheBuilding;
}

We're instantiating an IBuildingCreator implementation and passing it to the action provided by our API user as a lambda expression. The IBuildingCreator implementation creates a Building instance and exposes it through the TheBuilding property. Each IBuildingCreator method called by the user is supposed to modify that instance. Let's see the implementation:

internal class BuildingCreator : IBuildingCreator
{
    private readonly Building _building = new Building();

    public void Address(string address)
    {
        _building.Address = address;
    }

    public void Floor(Action<IFloorCreator> action)
    {
        var creator = new FloorCreator();
        action(creator);
        _building.Floors.Add(creator.TheFloor);
    }

    public void Roof(Action<IRoofCreator> action)
    {
        var creator = new RoofCreator();
        action(creator);
        _building.Roof = creator.TheRoof;
    }

    public Building TheBuilding { get { return _building; } }
}

The Building instance is created on BuildingCreator instantiation and modified by its members. The Address method just sets a property of the Building. The Floor method repeats the already known pattern - it creates a FloorCreator and appends the newly built Floor to the Floors collection. The Roof method uses the same pattern again to assign the Building's Roof property.

Note that at the API level we don't distinguish whether we're adding an element to a collection (like Floors) or assigning a single value (like Roof) - that should be known from the semantics. Also note that the BuildingCreator class is internal and the TheBuilding property is not included in the IBuildingCreator interface, so it stays our private implementation detail and doesn't need to be a part of the public API we're creating - and that's quite neat.
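The lower levels are not shown here, but they follow exactly the same pattern - something along these lines (a sketch only; the actual code in the linked repository may differ in details, and I'm assuming Floor exposes a Rooms collection just like Building exposes Floors):

public interface IFloorCreator
{
    void Room(Action<IRoomCreator> action);
}

internal class FloorCreator : IFloorCreator
{
    private readonly Floor _floor = new Floor();

    public void Room(Action<IRoomCreator> action)
    {
        var creator = new RoomCreator(); // RoomCreator repeats the same pattern one level down
        action(creator);
        _floor.Rooms.Add(creator.TheRoom);
    }

    public Floor TheFloor { get { return _floor; } }
}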

Here's how to use the API we've just designed:

var building = Building(b =>
{
    b.Address("1 Example Street");
    b.Floor(f =>
    {
        f.Room(r => r.Area(33.0));
        f.Room(r => r.Area(44.0));
    });
    b.Floor(f =>
    {
        f.Room(r => r.Area(20.0));
        f.Room(r => r.Area(30.0));
        f.Room(r => r.Area(40.0));
    });
    b.Roof(r => r.Type(RoofType.GableRoof));
});

The source code for this example is available on GitHub.

By following that pattern we can build arbitrarily complex structures - we're not limited by the API design, and its implementation stays very simple - no method exceeds 3 lines of code. We can add new properties and levels easily, without breaking the API. Moreover, we have strongly typed lambdas everywhere, so our API can expose only the methods that are valid at a given point (not so easy with complex fluent chains). What's more, if we have recurring object patterns in different parts of our structure, we can reuse the same IXyzCreator interfaces and their implementations without any cost at all (again, try to do that with fluent chains).

Well, I'm quite impressed by how many advantages this simple idea brings us. I'm going to stick to this topic for a while and show some usages and "real" implementations of loquacious interfaces. Hope you'll enjoy!

Tuesday, February 21, 2012

Json.NET deserialization and initialization in constructors

I've recently run into quite an interesting problem when using the Json.NET library. It shows up as a static lookup collection being modified during the deserialization of some objects. Although the behavior I've encountered is documented, for me it breaks the principle of least astonishment a bit, so I've decided to share.

I have a class, TestClass, that I'm going to serialize and deserialize using Json.NET. It contains a simple collection of string values that is initialized in the constructor with some predefined values - its default state. I have these values defined somewhere in a separate class, in a collection marked as readonly, treated like a constant and not supposed to be modified.

Here are the tests (written in Machine.Specifications) that illustrate the issue. I'm setting the TestClass state in the constructor, but I expect it to be overwritten during deserialization, as my JSON string contains different data. In fact, the deserialized values are appended to the existing collection, which turns out to be exactly the same collection as my "constants".

public static class Constants
{
    public static readonly IList<string> NotSupposedToBeModified = new List<string>()
    {
        "the first",
        "the last"
    };
}

public class TestClass
{
    public IEnumerable<string> TheCollection { get; set; }

    public TestClass()
    {
        TheCollection = Constants.NotSupposedToBeModified;
    }
}

public class DeserializingTest
{
    Because of = () =>
        result = JsonConvert.DeserializeObject<TestClass>(@"{""TheCollection"":[""other""]}");

    It should_deserialize_the_collection_correctly = () =>
        result.TheCollection.ShouldContainOnly("other");

    It should_not_modify_the_constant_collection = () =>
        Constants.NotSupposedToBeModified.ShouldContainOnly("the first", "the last");

    static TestClass result;
}

What is the result? Both tests failed:

should deserialize the collection correctly : Failed
Should contain only: { "other" }
entire list: { "the first", "the last", "other" }

should not modify the constant collection : Failed
Should contain only: { "the first", "the last" }
entire list: { "the first", "the last", "other" }

There are three separate issues that showed up together and resulted in this apparently surprising behavior:

The first one is that the "constant" collection defined in the Constants class is not really constant. The readonly keyword guarantees that one cannot replace the collection instance, but the already-created collection instance itself can still be modified normally. It's pretty clear, but can be missed at first sight.
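In other words (just an illustration of the readonly semantics):

// readonly only forbids reassigning the field - the list it points to stays mutable
Constants.NotSupposedToBeModified.Add("oops");             // compiles and runs fine
// Constants.NotSupposedToBeModified = new List<string>(); // this is what readonly prevents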

The second one is even more obvious - the assignment in TestClass's constructor doesn't initialize a local collection with the values from the Constants class - it just assigns a reference to exactly the same collection instance. So, as the assigned collection can be modified and we've just handed it to our TestClass instance, the door is already open to modifying the "constant" collection by mistake.
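A defensive copy in the constructor would be enough to protect the constants (although, as we'll see below, Json.NET would still append to the copied list rather than replace it, so the first test would keep failing):

public TestClass()
{
    // copy the predefined values instead of sharing the "constant" list instance
    TheCollection = new List<string>(Constants.NotSupposedToBeModified);
}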

And finally, what is the Json.NET deserializer doing here? The documentation states: "By default Json.NET will attempt to set JSON values onto existing objects and add JSON values to existing collections during deserialization." It means that when the collection instance for the TheCollection property has already been created by the constructor (well, actually not created but "borrowed" from the Constants class), Json.NET doesn't create a new one - it just appends the deserialized values to the existing collection, modifying our NotSupposedToBeModified collection.

Well, the first two pitfalls are pretty easy to spot, but I wouldn't have expected the third one. Fortunately, Json.NET provides an easy way to customize its behavior in this matter using the ObjectCreationHandling option. One simple addition to the DeserializeObject call and we have two green tests (even if the first two issues are still there):

result = JsonConvert.DeserializeObject<TestClass>(
    @"{""TheCollection"":[""other""]}",
    new JsonSerializerSettings() { ObjectCreationHandling = ObjectCreationHandling.Replace });

Friday, February 17, 2012

Mapping-by-Code & Fluent NHibernate issues summary

In the mapping-by-code post series I've just completed, I reviewed the capabilities of both mapping-by-code and Fluent NHibernate in comparison to plain old XML mappings. There are some more or less serious bugs on both sides, and neither solution offers everything XML does. Whenever I found an issue worth mentioning, I checked whether it had already been reported and reported it myself if not. Here is a quick summary:

Mapping-by-Code

Fluent NHibernate


As you can see, the number of issues I've encountered is very similar for both mapping-by-code and Fluent NHibernate. For mapping-by-code, the majority of them were already reported and actions were taken. Actually, 5 of them are already resolved and are waiting for the NH 3.3 release. I've reported 3 new issues (one of which was fixed within a few days) and extended another one.

For Fluent NHibernate, I've reported 8 of the 10 issues I encountered. Sadly, so far none of them has even been commented on. It looks like there's no active development on FNH. I tend to agree that leaving issues in no man's land with no status at all is a sign of a neglected community. I'd certainly prefer to have these issues closed with a "won't do" status rather than ignored.

Wednesday, February 15, 2012

NHibernate's mapping-by-code - the summary

Six weeks ago, when I started my experiments with NHibernate 3.2's new mapping feature - mapping-by-code - I was a loyal Fluent NHibernate user and a fan of method chains in APIs. My first impression of mapping-by-code was that it seemed to be a good direction, but still immature and - importantly - not documented at all. I decided to take a deeper look, and it turned into an almost-twenty-part series exploring all the possible mappings - probably the only complete guide to mapping-by-code on the web so far. Time to sum the series up.

Let's start with what mapping-by-code is. It is an XML-less mapping solution that has been an integral part of NHibernate since 3.2, based on the ConfORM library. Its API tries to conform to the XML naming and structure. There's a strong convention in how the mapping methods are built: method names are almost always equal to the XML element names, the first parameter points to the mapped property, the second is for its options, corresponding to the XML attributes (and the XML <key> element, if applicable), and the rest of the parameters, if any, correspond to nested XML elements. It's very convenient for those familiar with the XML schema or for documentation readers.
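To illustrate the convention (a rough example only - Parent and Children are hypothetical classes, and the option calls shown are just a sample), a collection mapping inside a ClassMapping<Parent> could look like this:

public class ParentMap : ClassMapping<Parent>
{
    public ParentMap()
    {
        // "Set" corresponds to the <set> XML element
        Set(x => x.Children,                        // first parameter: the mapped property
            set =>
            {
                set.Key(k => k.Column("ParentId")); // options, including the <key> element
                set.Inverse(true);                  // an XML attribute equivalent
            },
            rel => rel.OneToMany());                // remaining parameter: the nested <one-to-many> element
    }
}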

Mapping-by-code also came with a very powerful mapping-by-convention tool - ConventionModelMapper. It is highly flexible and customizable, but customizing it may not even be needed, as by default it is able to figure out mappings even for components or maps. The only thing it can't map automatically is bidirectional relationships - but that was pretty easy to fix using conventions (I've updated my conventions since they were first published - they now support all kinds of collections, inheritance and more - feel free to use them).

Here is the full table of contents of my mapping-by-code series.

  1. First impressions
  2. Naming convention resembling Fluent
  3. Property
  4. Component
  5. ManyToOne
  6. inheritance
  7. dynamic component
  8. Set and Bag
  9. OneToMany and other collection-based relation types
  10. concurrency
  11. OneToOne
  12. Join
  13. Any
  14. List, Array, IdBag
  15. Map
  16. Id, NaturalId
  17. composite identifiers
  18. entity-level mappings

And what about Fluent NHibernate? Hiding the XML was a great idea, but simplifying the mappings went too far, in my opinion. I've already mentioned the mess caused by the concept name changes made in Fluent NHibernate (1) (2) - I won't repeat it again. Moreover, an XML mapping is a tree structure and it just doesn't fit into single method chains. Fluent NHibernate's API bypasses this limitation by prefixing method names (like KeyColumn) or by falling back to an interface that uses Action<T> (e.g. in Join or Component mapping), quite similar to the mapping-by-code API. Method chaining also makes it hard to reuse the same concepts in different contexts. It's a lot easier the mapping-by-code way - e.g. Column mapping is the same in every mapped feature and is handled by exactly the same code.

Don't get me wrong. I think FNH was a good and useful project. But I used it as the only existing alternative to cumbersome and verbose XML mapping. And now, when we have an alternative that is integrated into NHibernate (no external dependency and versioning issues), more efficient (no XML serialization) and with a better API (no ambiguity, NH naming kept), the purpose of FNH's existence is greatly reduced.