NSBCon – Under the Hood of Particular Software

In case you missed it, the first ever NServiceBus conference is happening 26-27 June in London. There's a whole host of excellent speakers, so make sure not to miss it!

https://skillsmatter.com/conferences/6198-nsbcon

I'm going to deliver a session on how we do things inside engineering here at Particular. This is a talk I've been wanting to do for a long time, because I'm leading a truly awesome team and we've really been pushing the boundaries of what I thought was possible in terms of automating "all the things" :)

The abstract for the talk is:

Continuously delivering a top quality platform on which customers build and run their businesses is not a challenge to be taken lightly. Join Andreas Öhlund, Director of Engineering at Particular Software, for a journey through custom tooling, testing the un-testable and massive cloud service bills in the name of backwards compatibility, performance and reliability.

An adventurous journey indeed, but an absolute must to ensure that the software you depend on delivers what's stated on the box.

The talk will give an in-depth look at what we do to continuously deliver our service platform with the top quality that is expected from us.

See you there!

Posted in NServiceBus | Leave a comment

How we version our software – in Particular

In my last post I talked about how we aim to have a release build in Visual Studio produce production-ready binaries. Obviously, for something to be production-ready it must have a version, in order for users to know what they're actually using. Just to clarify: when I talk about versioning in this post I mean the technical version, most likely following SemVer, that you give the artifacts you release. I'm not talking about the marketing version that you might brand your things with.

The strategies for technical versioning I've seen can be split into two camps:

  1. Have the build server determine the version
  2. Commit version information into the repository itself

Since we want the version to be correct even when you go back in time and build older branches and tags, #1 is ruled out. Another drawback of that strategy is that you need to have the build server up and running to produce a meaningful release version. We definitely want a local build to generate a good version, so that puts us firmly in camp #2.

At this point two different forces were in play. Firstly, we didn't like all those "Bumping version to X" commits cluttering our repos, and a process like that causes a lot of emails saying "the latest release has the wrong version, did you forget to change it?" to appear in your inbox. Secondly, since we're moving to a multi-repository model, we needed to ensure that we use a consistent branching model across our repositories. Why is that important? Trust me, spending the first few brain cycles figuring out where to start coding every time you switch repo is definitely a drag on productivity.

The branching model we've decided to go with is GitFlow, and as we started to discuss our versioning scheme we realized that if we follow the GitFlow model and also adhere to SemVer, we can deduce the version from the branch names and merge commits in Git. If we could pull that off we'd kill two birds with one stone: version information is embedded in the repo without the dreaded "bumping to X" commits, and we can also enforce our branching model by failing the build if we detect illegal commits and branch names. As an example, a commit straight to "master" is not valid under GitFlow, since the head should always point to a merge commit from either a hotfix or a release branch.
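To illustrate the enforcement side, here is a minimal sketch of what such a build-time check could look like, assuming the LibGit2Sharp library. This is illustrative only, not our actual implementation:

using System;
using System.Linq;
using LibGit2Sharp; // assumes the LibGit2Sharp NuGet package

class BranchPolicyCheck
{
    static int Main(string[] args)
    {
        using (var repo = new Repository(args[0])) // path to the working copy
        {
            if (repo.Head.FriendlyName == "master")
            {
                // Under GitFlow, master's tip should always be a merge commit
                // coming from a release-* or hotfix-* branch, i.e. have two parents.
                if (repo.Head.Tip.Parents.Count() < 2)
                {
                    Console.Error.WriteLine("Illegal commit straight to master - failing the build.");
                    return 1;
                }
            }
            return 0;
        }
    }
}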

GitFlowVersion was born

I'm not going into full detail on the algorithm, as it's already documented here, but in short: the develop branch is always one minor version above master, master is always the SemVer version of the latest merge commit (e.g. hotfix-4.1.1 or release-4.3.0), and so on.
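As a rough sketch of the idea (not the actual GitFlowVersion code), deducing a version from GitFlow branch names could look like this:

using System;
using System.Text.RegularExpressions;

class GitFlowVersionSketch
{
    // Extracts the SemVer part of a GitFlow release/hotfix branch name,
    // e.g. "hotfix-4.1.1" -> 4.1.1, "release-4.3.0" -> 4.3.0.
    static Version VersionFromBranch(string branchName)
    {
        var match = Regex.Match(branchName, @"^(release|hotfix)-(\d+\.\d+\.\d+)$");
        if (!match.Success)
            throw new ArgumentException("Not a release/hotfix branch: " + branchName);
        return Version.Parse(match.Groups[2].Value);
    }

    static void Main()
    {
        // master's version comes from the latest release/hotfix merge commit...
        var master = VersionFromBranch("release-4.3.0");

        // ...and develop is always one minor version above master.
        var develop = new Version(master.Major, master.Minor + 1, 0);

        Console.WriteLine(master);  // 4.3.0
        Console.WriteLine(develop); // 4.4.0
    }
}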

We use GFV across all our repos and so far it seems to hold up. The major hurdles so far have been all the quirks in NuGet that we've had to work around, since the version generated by GitFlowVersion is also used to version our NuGet packages. One of the main snags is that NuGet has no concept of builds, and since we use NuGet packages to manage dependencies across repositories, we need to make sure that our develop branch always outputs packages that sort highest at all times. This forced us to give our develop builds the "Unstable" pre-release suffix, which makes sure they stay on top since Unstable > Beta, RC, etc. I'll go into more detail on how we do our cross-repo dependencies in an upcoming post, so stay tuned!
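The trick works because NuGet compares pre-release labels more or less lexically, so a plain alphabetic tag like "unstable" sorts after "beta" and "rc". A quick way to convince yourself, assuming plain alphabetic tags:

using System;

class PrereleaseOrdering
{
    static void Main()
    {
        // For plain alphabetic pre-release tags, precedence boils down
        // to ordinal string comparison: beta < rc < unstable.
        var tags = new[] { "unstable", "beta", "rc" };
        Array.Sort(tags, StringComparer.Ordinal);
        Console.WriteLine(string.Join(" < ", tags)); // beta < rc < unstable
    }
}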

Here is the NServiceBus repository being versioned by GFV (the number you see is the PR number that's assigned by GitHub):

TL;DR;

Use GitFlowVersion to automatically version your code without mucking around with either the build server or VERSION.txt files…

https://github.com/Particular/GitFlowVersion

Posted in Particularities | Comments closed

Build scripts – in Particular

In my previous post I mentioned that we're moving towards a more fine-grained structure for our repositories. This direction has had some quite interesting effects on our build scripts…

We don’t use build scripts any more

Yes, you read that correctly. Since we now have far fewer moving pieces in our repositories, the complexity of our build has decreased to the point where we don't see any value in running scripts to perform it. Actually, that isn't entirely true, since we use Visual Studio to do our builds and that's technically a build script.

So let me rephrase: we don't use a separate scripting language apart from MSBuild (csprojs + slns) to build our source code. This means that we build locally using Visual Studio (or msbuild on the command line), and on the build server (TeamCity) we use the Visual Studio solution build runner. It also means we can skip the dreaded readme.txt that tells you all the steps you need to perform before you can fire up Visual Studio and get some coding done.

But MSBuild sucks?

Yes, it surely sucks. Being a recovering NAnt user, I can sincerely tell you that all that XML is bad for your eyes. We moved from NAnt to psake, and that was a great relief for managing our very complex build at the time without paying the XML tax. But now it seems we've come full circle. Yes, we need to drop back to hacking XML straight into the csprojs now and then, but we see that as healthy friction that tells us we need to simplify in order to keep our goal that a "release build in Visual Studio should create production-ready binaries". We're not quite there yet, hence the "release build" part: we made this trade-off because we still do ILMerge, which can be quite time consuming, so we decided to only do it for release builds. Again, this friction is good, since it pushes us to find ways to avoid ILMerging dependencies in the first place, as that tends to come back to haunt you in the end. For more advanced needs, like package restore using ripple and automatic versioning using GitFlowVersion, we've created our own MSBuild tasks to avoid too much XML editing.
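As an example of what I mean by custom MSBuild tasks, here is a hypothetical sketch of one (the names and behaviour are illustrative, not our actual tasks). It plugs straight into a csproj via a <UsingTask> element, so no separate script language is needed:

using System.IO;
using Microsoft.Build.Framework;
using Microsoft.Build.Utilities;

// Hypothetical task that writes version info into a generated source file.
public class StampAssemblyVersion : Task
{
    [Required]
    public string Version { get; set; }

    [Required]
    public string OutputFile { get; set; }

    public override bool Execute()
    {
        File.WriteAllText(OutputFile,
            "using System.Reflection;\n" +
            "[assembly: AssemblyVersion(\"" + Version + "\")]\n");
        Log.LogMessage("Stamped assembly version {0}", Version);
        return true;
    }
}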

Final words

I'm not saying that you should avoid build scripts, but our journey towards small repos has allowed us to get away with just the .sln's and .csproj's to handle our build, and it has worked out very nicely for us.

Posted in Particularities | Comments closed

The pain and suffering of large source code repositories

Back in the days when we were hacking away at NServiceBus v2, we kept all code in a single repository. All was good, happy days…

Things have changed since then. We started to feel pain, and one day the happiness was gone…

Perhaps we’d outgrown the single repo model?

The first sign that we needed something else was the rename of the company to Particular. The move was driven by us wanting to provide other NServiceBus-related products and to create what we hope to be an awesome service platform. All those new products (ServiceMatrix, ServiceInsight, ServicePulse and ServiceControl) obviously needed to go into separate GitHub repositories. We now had to manage more than one repo, and that process opened our eyes to the possibility of splitting the NServiceBus repo even further. I'm not saying this applies to all of you out there, but as you grow, do pay attention: as the pain starts to increase, it might be a sign that you need to split things up.

NServiceBus v3 added support for Windows Azure as a transport, RavenDB as persistence and a host of new IoC containers. V4 added transports for ActiveMQ, RabbitMQ, WebSphereMQ and SqlServer. We'd hit the tipping point; we couldn't stand the pain anymore…

Sources of pain – One release to rule them all

Our massive NServiceBus repo effectively locked us into a "release everything every time" model. While this was fine when we only supported MSMQ as the underlying transport, it got ridiculous with the addition of alternate transports. Imagine having to put out a patch release for RabbitMQ just because there was a bug in MSMQ. Yes, you can do partial releases from one big repo, but since everything from TeamCity to GitHub is really built with a repository == product == one release cycle mindset, the friction that partial releases cause makes this workaround unsustainable.

Since we need to release our transports, storages and containers in close relation to the release cycles of said dependencies, this essentially meant that we had to release a lot of components even though nothing had changed.

User: “So what’s new in 4.0.3?”

Us: "We updated to RavenDB 2.2"

User: “But I’m using NHibernate to store my stuff in SqlServer?”

#FAIL! 

Branching models

This pain is really an extension of the release cycle issues mentioned above, but your branching model will most likely fall apart when you're trying to juggle multiple hotfixes/releases for different parts of your source tree. We're using GitFlow as our branching model, but I'm convinced the same applies to pretty much all the different models out there. Imagine the confusion when you create a release-4.1.0 branch to start stabilising X and then, the week after, have to create a release-1.1.0 to stabilise Y. Since developers always seem mystified by the use of branches (I blame you, TFS, Subversion, ClearCase!), this added dimension is definitely going to throw you right into "let's stick with the master branch" mode.

Test complexity and speed

We have an extensive suite of automated tests that runs for each transport, storage and container implementation we have. In order to support everything running off the same repo, we needed to put in a lot of code to handle those "permutations". The other obvious drawback is that those tests are fairly slow, and since we needed to run them all for every change, it didn't scale at all. Since we've started splitting things up, test complexity has gone down and the tests run much faster, since we only run the ones that are actually relevant for the code that has changed.

Versioning

Since we want different versions for each "component" (we follow SemVer), trying to achieve that in one big repo is doomed to fail. You can't put the version info into the build server, since it will build the entire repo for each change. Committing version info into the repo itself by hand is IMO not a good idea either, since it requires a lot of tinkering, seriously drives up build complexity, and causes tons of unneeded churn with all those "Bumping version to X.Y.Z" commits. I'll leave it to a follow-up post to show you our take on solving this in a fully automated way. (Spoiler: GitFlowVersion)

These were the main pain points for us, but the list goes on: issue management, pull request management, contributions, permissions, repo size, build complexity etc…

Is this multi repo thing all unicorns and cute kittens?

NO!

If this had been a few years back, I'd say we couldn't have pulled it off, since support for this type of structure was nowhere to be found in the tools of that time. Even today we've had to build a lot of custom tools to help us get where we want to be. Things that could be done manually before transform into productivity killers, and I'll tell you this:

If you're not prepared to breed an "automate everything" culture, this thing is not for you. When going multi-repo, any manual task that might have been fine in the past will grind you to a halt. At Particular, "automate everything" is one of our core values, since we're not enough people to spend our precious time on automatable things like creating release notes, bumping versions, updating dependencies etc.

We've put in a lot of effort to move in this direction, and our collaboration with the FubuMVC project has been invaluable, since they are quite far down the fine-grained repo road and have provided us with great insights and tools. Thanks Josh and Jeremy!

I definitely hope to get my blogging act together, so this is the first post in a series where I'll talk about all the tools we use/created and the processes we follow/had to adjust to embark on this journey. So far we've got the transports separated, and we're setting our sights on the storages and containers next. We're even looking at slimming down our core even further by moving optional components like our Gateway and Distributor to separate repositories as well.

TL;DR; Fine-grained repositories require a great deal of effort but are definitely worth it! #yourmileagemayvary

Posted in Particularities | Comments closed

Accessing audit and error messages in NServiceBus v4.0

NServiceBus has always supported centralized audit and error handling by allowing you to specify central queues where successfully processed and failed messages end up. NServiceBus will guarantee that each message you send/publish ends up in one of those queues, hopefully the audit queue :). Earlier versions of NServiceBus had some rudimentary support for managing errors, but for audit messages it was up to you as a user to make something useful out of all that data. This has now changed with the introduction of ServiceInsight.

ServiceInsight gives you a graphical view of your message flows, insight into why a given message has failed, and a way to issue retries for those messages. Since queues are not really that great for long-term storage of data, we needed a way to process all those messages in order to make it easier to query and manipulate them from the UI. We also decided not to limit access to just our own tools, but instead make all the data, and the operations on it, available to you via a REST-ish API.

The API is still in its early days, but we hope to keep expanding it to help you build management extensions for NServiceBus in your tool of choice.

Official documentation is coming soon, but I don't expect much to change from what's mentioned below.

Installing the backend

The API backend is installed automatically when you run the NServiceBus or ServiceInsight installers. The backend is just a regular NServiceBus host that gets installed as a Windows service. Once that is done, it will continuously import messages from your audit and error queues and make the data available over HTTP. By default the API will respond at http://localhost:33333/api, but the installers allow you to tweak the URI to your liking. If you forget where it got installed, you can always consult the following registry key:

HKEY_LOCAL_MACHINE\SOFTWARE\ParticularSoftware\ServiceBus\Management
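A quick sketch of reading that key from C#; note that the "Uri" value name is a guess on my part, so inspect the key on your machine to see what the installer actually writes:

using System;
using Microsoft.Win32;

class FindManagementApi
{
    static void Main()
    {
        using (var key = Registry.LocalMachine.OpenSubKey(
            @"SOFTWARE\ParticularSoftware\ServiceBus\Management"))
        {
            // "Uri" is a hypothetical value name; check your machine.
            if (key != null)
                Console.WriteLine(key.GetValue("Uri"));
        }
    }
}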

Once installed, the endpoint will import and index all error and audit messages.

Where did my errors go?

The management endpoint will import all errors into our internal storage and leave a copy in a queue called {name of your error q}.log, so in your case it will most likely be error.log.

This means that you can still use tools like QueueExplorer, ReturnToSourceQueue.exe etc. to manage your errors in the same way as before if you don't have access to ServiceInsight.

Supported API calls

The API is heavily inspired by the GitHub API guidelines for URLs, paging, sorting etc.

At the time of writing the following calls are supported:

GET /audit – Lists all messages successfully processed

GET /endpoints/:name/audit – Lists all messages successfully processed for the given endpoint

GET /errors – Lists all failed messages

GET /endpoints/:name/errors – Lists all failed messages for the given endpoint

GET /messages/:id – Gets the given message

GET /messages/search/:criteria – Does a free-text search with the given Lucene query

GET /endpoints – Lists all the known endpoints

GET /endpoints/:name/messages – Lists all messages (including failed ones) for the given endpoint

GET /conversations/:id – Gets all messages for the given conversation id (I'll talk more about conversations in a follow-up post)

In terms of updates, we only support one at the moment:

POST /errors/:id/retry – Issues a retry for the given message. Returns HTTP Accepted (202) if successful, since the actual retry is performed asynchronously.

Security

There is no explicit security in place at the moment; the service is expected to run on a secured server that only authorized personnel have access to.

Let's see it in action

I’ve created a few ScriptCs scripts to show the usage of some of the above calls.
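If ScriptCs isn't your thing, here's a minimal plain C# sketch of the same idea. It assumes the default http://localhost:33333/api address and uses a made-up message id:

using System;
using System.Net.Http;

class ApiDemo
{
    static void Main()
    {
        const string baseUrl = "http://localhost:33333/api";

        using (var client = new HttpClient())
        {
            // List all failed messages.
            var errors = client.GetStringAsync(baseUrl + "/errors").Result;
            Console.WriteLine(errors);

            // Retry a failed message; the id comes from the /errors response.
            // Expect 202 (Accepted) since the retry is performed asynchronously.
            var response = client.PostAsync(
                baseUrl + "/errors/some-message-id/retry", null).Result;
            Console.WriteLine(response.StatusCode);
        }
    }
}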

The full repo can be found here:
Latest unstable Management API:

Please take it for a spin and tell me what you think!

You can use the Video store sample that comes with NServiceBus to generate some test data if you don’t have a v4 system up and running!

What is missing?

Which tools would you build NServiceBus plugins for?

Comments are most welcome!

Limitations:

* The first drop only supports the MSMQ transport, but support for the others is our top priority, so expect all other transports to be supported in the very near future

Posted in NServiceBus | Comments closed

A particular blog post

You might have wondered why this blog has been so silent. Yes, I know workload is a poor excuse for not writing blog posts, but the reason is that I've been heads down trying to get NServiceBus v4.0 out the door. We released v3 on March 8, 2012, so it's been a little over a year since our last major release, but v4.0 is finally available as a release candidate with a go-live license. You can read the release notes here.

Another major change is that we renamed the company to Particular Software, since we felt we wanted to broaden our offerings to include a range of other products related to NServiceBus. But hey, don't worry, we're the same guys writing software particularly for you!

I'll be ramping up my blogging both here and on our new company blog going forward!

Posted in NServiceBus | Comments closed

NServiceBus v4 – Beta1 is out

I just want to let you know that we released the first beta of the upcoming v4 of NServiceBus last Friday. The main focus for v4 has been to make NServiceBus run on a wider range of queuing infrastructures while still giving you the same developer experience.

Out of the box, v4 will support ActiveMQ, RabbitMQ, WebSphereMQ and SqlServer (yes, using your old and trusty database as a queuing platform). While adding support for all those new transports we had to do quite a lot of refactoring deep in the bowels of NServiceBus, and the end result is a much cleaner codebase; it's my hope that adding new transports will be a breeze going forward. There is already a Redis transport in the works by our amazing community.

We have of course made a lot of other improvements as well, so please take a look at the release notes for the full scope.

From now on we'll be focusing on stabilizing the release and ramping up on documentation, so hopefully you'll see an increase in v4-related blog posts here as well.

You can grab the new bits either as an MSI download over at our site or via NuGet.

If all goes well we hope to get the final version out of the door within the next 3-4 weeks.

Go ahead and take it for a spin!

Posted in NServiceBus | Comments closed

Pluralsight interview

I was interviewed by the good folks over at Pluralsight a few weeks ago, and the result is now online. If you're interested in a few war stories from my dark past, and a glimpse into what's coming up in NServiceBus vNext, you can listen to the full interview here:

http://blog.pluralsight.com/2012/12/04/meet-the-author-andreas-ohlund-on-introduction-to-nservicebus/

Posted in NServiceBus | Comments closed

Monitoring your dead letter queues

NServiceBus relies heavily on the underlying queuing system to make sure that messages get delivered in a robust and timely fashion. In order to spot misconfigurations, it's very important to monitor the dead letter queues (DLQs). You can think of the dead letter queue as a dumping ground for messages that can't be delivered by the queuing system. This post focuses on MSMQ, but the general ideas apply to other queuing systems as well.

Can all messages end up in the DLQ?

If you're using NServiceBus, that would be all messages, since we make sure to set the flags that tell MSMQ to enable the DLQ (negative source journaling in MSMQ lingo). This means there is no way you can lose valuable business data even if the queuing system is misconfigured. More info on the dead letter queues in MSMQ can be found here.

When do they end up there?

Messages get moved to the DLQ when the queuing system has tried to deliver a message for a while but decides to give up. This time is usually configurable, and in MSMQ it happens when either the time-to-reach-queue (TTRQ) or the time-to-be-received (TTBR) for the message has expired.

The TTRQ is usually 4 days by default but can be adjusted per machine. TTBR is also 4 days by default but can be controlled on a per-message basis by adding the [TimeToBeReceived] attribute to your NServiceBus message definitions.
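For example, here is a message definition that opts into a one-hour TTBR (the message type itself is made up; the attribute is NServiceBus's own):

using NServiceBus;

// If no endpoint consumes this message within an hour, MSMQ gives up
// and moves it to the dead letter queue.
[TimeToBeReceived("01:00:00")]
public class PriceUpdated : IMessage
{
    public decimal NewPrice { get; set; }
}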

So how do I monitor this?

While the error queue in NServiceBus is usually system-wide, the DLQs are machine-specific. This means you have to monitor all your machines to detect messages ending up in any of the DLQs. There are many ways to do this; one is to write a PowerShell script that periodically looks at the DLQs (MSMQ has two different ones) and sounds the alarm. To do this, just use the System.Messaging.MessageQueue class in your script. The addresses are:

DIRECT=OS:{your machine}\SYSTEM$;DEADLETTER 
DIRECT=OS:{your machine}\SYSTEM$;DEADXACT
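Here's a minimal C# sketch of that idea, counting the messages in both DLQs on the local machine (the same System.Messaging calls work from PowerShell; I'm assuming "." resolves to the local box in the DIRECT format name):

using System;
using System.Messaging;

class DeadLetterMonitor
{
    static void Main()
    {
        // The non-transactional and transactional dead letter queues.
        var dlqs = new[]
        {
            @"FormatName:DIRECT=OS:.\SYSTEM$;DEADLETTER",
            @"FormatName:DIRECT=OS:.\SYSTEM$;DEADXACT"
        };

        foreach (var formatName in dlqs)
        {
            using (var queue = new MessageQueue(formatName))
            {
                // Count by enumerating; fine for a periodic check.
                var count = 0;
                var enumerator = queue.GetMessageEnumerator2();
                while (enumerator.MoveNext())
                    count++;

                Console.WriteLine("{0}: {1} message(s)", formatName, count);

                if (count > 0)
                {
                    // Sound the alarm: email, monitoring system, etc.
                }
            }
        }
    }
}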

Another option is to use your favorite monitoring tool and watch the "MSMQ Queue -> Computer Queues -> Messages in Queue" performance counter.

The image below shows this on my machine.

Note that the "Messages in Queue" counter gives you the number of messages in both of the DLQs.

TL;DR;

Make sure to monitor your DLQs to detect message queuing issues!

Hope this helps!

Posted in Messaging, NServiceBus, Ops | Comments closed

How to debug RavenDB through fiddler using NServiceBus

This is mostly a note to self, since I always seem to forget how to do it :)

To set up an NServiceBus endpoint to make all calls to RavenDB through Fiddler, you need to do the following:

Configure the proxy for your endpoint by adding the following to your app.config:

<system.net>
  <defaultProxy>
    <proxy usesystemdefault="False" bypassonlocal="True" proxyaddress="http://127.0.0.1:8888"/>
  </defaultProxy>
</system.net>

With the proxy set up, we just need to change the Raven connection string to go through Fiddler by adding:

<connectionStrings>
  <add name="NServiceBus.Persistence" connectionString="url=http://localhost.fiddler:8080"/>
</connectionStrings>

That should be it, happy debugging!

Posted in NServiceBus | Comments closed