NServiceBus sagas and concurrency

The question of how NServiceBus sagas handle concurrency came up on Stack Overflow, and my answer got long-winded and turned into a blog post.

If your endpoint is running with more than one worker thread, there is a possibility that multiple messages will hit the same saga instance at the exact same time. To give you ACID semantics in this situation, NServiceBus uses the underlying storage to produce consistent behavior by only allowing one of the threads to commit. Most of this is handled automatically for you by NServiceBus, but there are a few things you need to be aware of.

We can divide concurrent access to saga instances into two scenarios. The first is when there is no existing saga instance and multiple threads try to create what should be the same saga instance. The other is when a saga instance already exists in storage and multiple threads try to update that same instance. Let's look at both scenarios in detail and at the options you have.

Concurrent access to non-existing saga instances

Sagas are started by the message types you handle as IAmStartedByMessages<T>. If more than one of those messages is processed concurrently and they map to the same saga instance, there is a risk that more than one thread will try to create a new saga instance. In this case we can only allow one thread to commit. The others will roll back, the built-in retries in NServiceBus will kick in, and on the next retry the saga instance will be found, so the message results in an update of that saga instance instead of a create. That update can of course also run into concurrency problems, but those are solved as described in the next section. NServiceBus enforces the single commit by requiring a unique constraint in your database on the property that you’re correlating on. In NServiceBus 2.X you had to create this constraint yourself, but in 3.X we have the [Unique] attribute. If you put that attribute on one of your saga data properties, NServiceBus will create the constraint for you. This works for both the NHibernate and the RavenDB saga persisters. With that constraint in place only one thread will be allowed to create a new saga instance.
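To make the create race concrete, here is a minimal sketch of what happens at the storage level. It is not NServiceBus code: it simulates the two threads sequentially against an in-memory SQLite database, and the table, column, and function names (saga_data, order_id, create_saga and so on) are made up for illustration.

```python
import sqlite3

# Hypothetical saga table; the unique index plays the role of the
# constraint on the correlation property.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE saga_data ("
    "id INTEGER PRIMARY KEY, order_id TEXT NOT NULL, messages_seen INTEGER NOT NULL)"
)
conn.execute("CREATE UNIQUE INDEX ux_saga_order_id ON saga_data (order_id)")


def find_saga(order_id):
    return conn.execute(
        "SELECT id FROM saga_data WHERE order_id = ?", (order_id,)
    ).fetchone()


def create_saga(order_id):
    with conn:  # commits on success, rolls back if the insert fails
        conn.execute(
            "INSERT INTO saga_data (order_id, messages_seen) VALUES (?, 1)",
            (order_id,),
        )


def update_saga(order_id):
    with conn:
        conn.execute(
            "UPDATE saga_data SET messages_seen = messages_seen + 1 WHERE order_id = ?",
            (order_id,),
        )


# Both "threads" look for the saga before either has committed,
# so both conclude that they have to create it.
thread_a_found = find_saga("order-42")
thread_b_found = find_saga("order-42")

create_saga("order-42")  # thread A commits first

try:
    create_saga("order-42")  # thread B violates the unique constraint
    thread_b_rolled_back = False
except sqlite3.IntegrityError:
    thread_b_rolled_back = True

# Retry of thread B's message: the saga now exists, so it becomes an update.
if find_saga("order-42") is not None:
    update_saga("order-42")

saga_count, messages_seen = conn.execute(
    "SELECT COUNT(*), MAX(messages_seen) FROM saga_data"
).fetchone()
```

Without the unique index the second insert would succeed and you would end up with two rows for what should be a single saga instance.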

In the future we’ll use this constraint to do even smarter things like auto mapping of your messages.

Concurrent access to existing saga instances

For this to work in a predictable way we rely on the optimistic concurrency support of the underlying database. This means that if more than one thread tries to update the same saga instance, the database will detect this and only allow one of them to commit. When this happens the other threads roll back, their messages are retried, and the race condition is resolved on the retry.

If you’re using the RavenDB saga persister you don’t have to do anything, since we turn on optimistic concurrency automatically for you. When running with the NHibernate saga persister, we require you to add a “Version” property to your saga data so that NHibernate can work its magic. In NServiceBus 4.0 we will make this even easier for you by enabling the optimistic-all option if no Version property is found.
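As a rough illustration of what a version-based optimistic concurrency check does, here is a small simulation against an in-memory SQLite database. It is a sketch of the general technique, not of how NHibernate or RavenDB implement it internally, and every name in it (saga_data, StaleSagaError, load, save) is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE saga_data ("
    "order_id TEXT PRIMARY KEY, total INTEGER NOT NULL, version INTEGER NOT NULL)"
)
conn.execute("INSERT INTO saga_data VALUES ('order-42', 0, 1)")
conn.commit()


class StaleSagaError(Exception):
    """Another writer committed after this copy of the saga was read."""


def load(order_id):
    total, version = conn.execute(
        "SELECT total, version FROM saga_data WHERE order_id = ?", (order_id,)
    ).fetchone()
    return {"order_id": order_id, "total": total, "version": version}


def save(saga):
    # The update only matches if the row still has the version we read.
    with conn:
        cursor = conn.execute(
            "UPDATE saga_data SET total = ?, version = version + 1 "
            "WHERE order_id = ? AND version = ?",
            (saga["total"], saga["order_id"], saga["version"]),
        )
    if cursor.rowcount == 0:
        raise StaleSagaError(saga["order_id"])


# Two "threads" read the same saga instance at version 1.
saga_a = load("order-42")
saga_b = load("order-42")
saga_a["total"] += 10
saga_b["total"] += 5

save(saga_b)  # thread B commits first and bumps the version to 2

try:
    save(saga_a)  # thread A still holds version 1, so nothing is updated
    first_attempt_conflicted = False
except StaleSagaError:
    first_attempt_conflicted = True

# Retry of thread A's message: reload the saga, reapply the change, save again.
retry = load("order-42")
retry["total"] += 10
save(retry)

final = load("order-42")
```

The point to notice is that the losing thread's change is not lost. It is simply reapplied on the retry, on top of the state the winning thread committed.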

Another option is to use a transaction isolation level of “Serializable”, but that causes excessive locking, so the performance degradation would be considerable. Note that “Serializable” is the default isolation level for TransactionScopes. In NServiceBus 4.0 we’ll lower it to “ReadCommitted” for you, since we think that is a more sensible default.

Hope this helps!

  • alexandergross

    Does that mean saga hosts must run with maxRetries >= 2?

    • http://andreasohlund.net/ Andreas Öhlund

      Ideally yes, but if you have SLR enabled that will take care of it for you. If not, the message will go to the error queue, where you can issue manual retries as well.

  • Wally

    What happens if the 2 messages are processed concurrently (workthread >= 1) AND the second message completes handler execution before the first one? Is this a true scenario? If true, which is the better implementation? A lock on the saga row? Is there native support in NServiceBus for this case?

    • http://andreasohlund.net/ Andreas Öhlund

      It’s definitely a valid scenario.

      In that case the second one will succeed in committing and the first one will get an optimistic concurrency exception. The only way to have the first one win is to lock the row when reading, and that is exactly what “Serializable” does.

      I.e. serializable will allow threads to complete “in the order they read the saga data”.

      • Amzath

        Does your answer apply to more than one worker node too? I mean, if more than one worker node is updating the same saga instance, does optimistic concurrency apply to this scenario?

        • http://andreasohlund.net/ Andreas Öhlund

          Yes, that applies there too, as long as all workers share the same DB for saga storage.

  • Joe

    Thanks for the post. I am struggling with situations where I have multiple thousands of responses of various types all correlating to the same saga instance. I see no easy solution except to run one thread. As you point out, if I run on more than one I will get concurrency errors. To compensate, I would have to really guess as to how to tune the retry count, what SQL timeout value, and the SLR it would take to work them all through. It does impact performance but I don’t see any alternatives. I also assume that after each handler, the Saga round trips to persistence if I update saga data.

  • Amzath

    What I understood is that when a concurrency error occurs, the message will be put back into the retry queue. In that case, how many threads does NServiceBus use to process retry messages in the queue? Can we configure it in the application configuration file?

    • http://andreasohlund.net/ Andreas Öhlund

      It goes back to the main input queue first. The retries queue is only for SLR. The number of threads is always equal to the number of threads configured for the endpoint.