
Recently I was working on a piece of code that batches items from a staging area into our application. Everything worked fine, but when we tested the solution with large amounts of data the performance was not what we had hoped for. After running a profile with dotTrace, I was puzzled to see that the reason for our poor performance was the dirty tracking performed by the NHibernate session. As it turned out, it was my own stupid way of handling the session that caused the problem.

The problem

Let me try to explain the problem and how to solve it. The original code looked something like this:

using (var session = sessionSource.GetSession())
{
    IList<WorkItem> itemsToProcess = GetAvailableWork(batchSize);

    foreach (var item in itemsToProcess)
    {
        using (var tx = session.BeginTransaction())
        {
            //do work
            tx.Commit();
        }
    }
}

The problem with this approach is that every object that is saved or loaded through a session is cached in the session's first-level cache. This is usually a good thing (it saves you some trips to the database), but in my case using a single session for the entire operation meant that a large number of objects were associated with my session, and every time I committed my transaction NHibernate had to go through all of those objects to determine whether anything had changed before saving the changes to the database. So as I kept loading more and more objects into my session, the number of "dirty checks" kept increasing, killing performance.
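To actually see the cache growing, you can log how many entities the session is tracking after each iteration using NHibernate's session statistics. This is an illustrative sketch only, reusing the same hypothetical sessionSource, GetAvailableWork and batchSize from the code above:

```csharp
using (var session = sessionSource.GetSession())
{
    IList<WorkItem> itemsToProcess = GetAvailableWork(batchSize);

    foreach (var item in itemsToProcess)
    {
        using (var tx = session.BeginTransaction())
        {
            //do work
            tx.Commit();
        }

        // Every entity loaded so far is still attached to the session, so
        // this number keeps growing - and each Commit() dirty-checks all of them.
        Console.WriteLine("Tracked entities: " + session.Statistics.EntityCount);
    }
}
```

Watching that number climb linearly while each commit gets slower is exactly the symptom dotTrace pointed me at.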

Ok, I've added this to my How to Shoot Myself in the Foot list, so hopefully it won't happen again. Now let's see how we can solve this problem.

Solution 1 - The "Clear session hack"

The quick way to solve this is to clear the session after getting the list of objects to process and also after every commit.

using (var session = sessionSource.GetSession())
{
    IList<WorkItem> itemsToProcess = GetAvailableWork(batchSize);

    //clear session to avoid unnecessary dirty tracking
    session.Clear();

    foreach (var item in itemsToProcess)
    {
        using (var tx = session.BeginTransaction())
        {
            //do work
            tx.Commit();
        }

        //clear session to avoid unnecessary dirty tracking
        session.Clear();
    }
}

This solution is quick and dirty (after all, we are talking about dirty tracking :) ), but after reading Clean Code I consider the need for comments like these a sign of bad naming, or just a plain excuse for a hack. Therefore I decided to refactor it a bit.

Solution 2 - The "do it the way it should be done"

The correct way to do this is to use a new session for each "unit of work", thereby eliminating the need for cryptic calls to session.Clear(). The code below illustrates this solution:

IList<WorkItem> itemsToProcess;

using (var session = sessionSource.GetSession())
{
    itemsToProcess = GetAvailableWork(batchSize);
}

foreach (var item in itemsToProcess)
{
    using (var session = sessionSource.GetSession())
    using (var tx = session.BeginTransaction())
    {
        //do work
        tx.Commit();
    }
}

EDIT:

As Marko Lahma points out in the comments, an even better way to solve this is to bypass the first-level cache completely by using a stateless session instead!
http://nhforge.org/blogs/nhibernate/archive/2008/10/30/bulk-data-operations-with-nhibernate-s-stateless-sessions.aspx
Thanks Marko for the heads up!
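As a rough sketch of what that would look like here: a stateless session (IStatelessSession) has no first-level cache and performs no dirty tracking, but in return you have to persist changes explicitly. This assumes access to the underlying ISessionFactory (hypothetical name: sessionFactory) rather than my sessionSource wrapper:

```csharp
using (var session = sessionFactory.OpenStatelessSession())
using (var tx = session.BeginTransaction())
{
    foreach (var item in itemsToProcess)
    {
        //do work, then save the changes explicitly -
        //a stateless session never flushes anything automatically
        session.Update(item);
    }

    tx.Commit();
}
```

Note that stateless sessions also skip interceptors, events and cascades, so this trade-off only makes sense for bulk operations like this one.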

Lessons learned

Make sure you understand how the NHibernate session caches objects and performs dirty tracking in order to get maximum performance!