Ayende has a good post on how you would manage exceptions in your production environment. In a previous post I talked about how error management is a bit different in a messaging world, but I want to clarify how the things Ayende mentions in his post relate to messaging in general and NServiceBus in particular. (The same would apply to MassTransit as well.)
So head over and read the post first :)
OK, let's start with this statement:
The interesting thing about that is that in both cases, production errors were handled in exactly the same way. They were ignored until a user called and complained about that
This could be true for NServiceBus as well, but the fact that all errors end up in a central error queue makes them a bit harder to ignore. I don't have an answer for this, but I guess it's harder to explicitly purge an error queue than to ignore an email or avoid wading through gigs of production log files. Most of my customers have processes that feed off the error queue to do alerts, automatic retries, etc. More on this later.
The major problem is that errors happen all the time. In the vast majority of cases, you don’t really care, and it will fix itself automatically. For example, a error such as Transaction Deadlock Exception might happen a few times a day
Errors of this type are transient, meaning that they can be resolved with a retry. Heck, SQL Server is telling us that to our face with the classic:
Error 1205 : Transaction (Process ID) was deadlocked on resources with another process and has been chosen as the deadlock victim. Rerun the transaction
NServiceBus will handle this automatically for you by doing a configurable number of retries. This ensures that transient errors like deadlocks usually don't end up in the error queue.
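To make the retry idea concrete, here is a minimal sketch in Python. It is not NServiceBus code or its API; `process_with_retries`, `TransientError`, `make_flaky_handler` and the in-memory `error_queue` are all names I made up for illustration. It only shows the general pattern: retry a configurable number of times, and move the message to an error queue once the retries are used up.

```python
from collections import deque


class TransientError(Exception):
    """Stands in for a retryable failure such as SQL Server error 1205."""


def process_with_retries(message, handler, max_retries, error_queue):
    """Run handler(message), retrying up to max_retries times on failure.

    Returns True if some attempt succeeded. If every attempt fails, the
    message is appended to error_queue and False is returned.
    """
    attempts = 1 + max_retries  # the first try plus the configured retries
    last_error = None
    for _ in range(attempts):
        try:
            handler(message)
            return True
        except Exception as exc:  # any failure counts as a failed attempt
            last_error = exc
    error_queue.append(
        {"message": message, "error": repr(last_error), "attempts": attempts}
    )
    return False


def make_flaky_handler(failures_before_success):
    """Hypothetical handler that fails a fixed number of times, then works."""
    state = {"calls": 0}

    def handler(message):
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise TransientError("deadlock victim, rerun the transaction")

    return handler


error_queue = deque()
# Two deadlocks, then success: the message never reaches the error queue.
recovered = process_with_retries("order-1", make_flaky_handler(2), 5, error_queue)
# Fails on every attempt: ends up in the error queue after 6 tries.
poisoned = process_with_retries("order-2", make_flaky_handler(10), 5, error_queue)
```

In this sketch the deadlock that clears up after a couple of attempts is invisible to whoever watches the error queue, which is the whole point: only the message that keeps failing is left for someone (or something) to look at.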
That is a major difference between Errors and Alerts. An error is just an exception or a problem that happened in production. Usually, those aren’t really important. It will sort itself out... you don’t care
When it comes down to deciding which errors should lead to alerts, you're pretty much on your own. NServiceBus will make sure that every message that fails processing ends up in an error queue somewhere. Then it's up to you to decide whether to retry in 1 hour, create a bug report, or wake up the admin in the middle of the night to get the database running again. Technically, you would just create a regular NServiceBus endpoint that feeds off the error queue and makes decisions based on your specific business requirements.
I know that there is an open source effort to create a generic error management solution for NServiceBus that would cover the basic use cases, so expect some more info soon.
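The decision logic in such an endpoint could look roughly like the sketch below. Again this is plain Python, not NServiceBus; the `Action` enum, the `exception_type` and `id` fields, and the rules themselves are assumptions standing in for whatever your business actually requires.

```python
from enum import Enum


class Action(Enum):
    RETRY_LATER = "retry in 1 hour"
    BUG_REPORT = "create a bug report"
    PAGE_ADMIN = "wake up the admin"


def triage(failed):
    """Choose an action for one failed message taken off the error queue.

    `failed` is a dict with a hypothetical "exception_type" key. The rules
    are example business rules, not anything NServiceBus decides for you.
    """
    exception_type = failed["exception_type"]
    if exception_type == "DatabaseUnavailable":
        return Action.PAGE_ADMIN  # nothing will work until the database is back
    if exception_type == "Timeout":
        return Action.RETRY_LATER  # may well succeed once the load drops
    return Action.BUG_REPORT  # anything unknown is treated as a bug


def drain(error_queue):
    """Triage every message currently in the error queue, oldest first."""
    return [(failed["id"], triage(failed)) for failed in error_queue]


decisions = drain([
    {"id": "m1", "exception_type": "Timeout"},
    {"id": "m2", "exception_type": "DatabaseUnavailable"},
    {"id": "m3", "exception_type": "NullReference"},
])
for message_id, action in decisions:
    print(message_id, "->", action.value)
```

Keeping the rules in one small function like `triage` makes it easy to see, and to change, which errors become alerts and which ones just get retried or logged as bugs.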