Recently, whilst building a distributed application with a customer, they asked a very valid question:
Can I roll up the health of a group of objects, basing the group's health on the number of objects that are unhealthy?
They wanted a group of objects whose health would show a warning state when fewer than half of the group members were unhealthy, and only change to critical when more than half were in a critical state. That sounds reasonable: with fault tolerant systems, raising a critical health state when the application is still available but fault tolerance is reduced does seem a bit of an overreaction.
We have three dependency monitor roll up algorithms to work with, but none of them allows the health state that is rolled up to be changed.
- Best of – Fairly obvious how this one works: as long as one of the member objects is healthy, the monitor will be healthy
- Worst of – Again pretty straightforward: the monitor will match the worst state of any of the members
- Percentage – The monitor will match the worst state found among a given percentage of the healthiest member objects, which is almost what we want
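To make the three algorithms concrete, here is a minimal Python sketch of how each one could pick a rolled-up state. It is a model for illustration only, not Operations Manager code: the `Health` enum, the function names and the rounding rule in `percentage` are all assumptions.

```python
import math
from enum import IntEnum


class Health(IntEnum):
    """Hypothetical health states, ordered from best to worst."""
    HEALTHY = 0
    WARNING = 1
    CRITICAL = 2


def best_of(states):
    # Best of: the group matches the healthiest member.
    return min(states)


def worst_of(states):
    # Worst of: the group matches the least healthy member.
    return max(states)


def percentage(states, percent):
    # Percentage: worst state among the healthiest `percent` of members.
    # Rounding up to a whole member is an assumption of this sketch.
    ranked = sorted(states)
    count = max(1, math.ceil(len(ranked) * percent / 100))
    return ranked[count - 1]


members = [Health.CRITICAL, Health.HEALTHY, Health.HEALTHY]
print(best_of(members).name, worst_of(members).name, percentage(members, 50).name)
```

Note that every function returns one of the member states unchanged. None of them can turn a critical member into a warning for the group, which is the gap the rest of this post works around.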
Here’s what they look like in Health Explorer.
As you can see, none of the dependency monitor roll up algorithms meets our requirement of rolling up a warning state if a percentage of the member monitors are unhealthy. We can work around this by combining two of the algorithms and creating a custom recovery task, as described by Roman here. If we create:
- A dependency monitor with the worst of algorithm
- A Recovery that targets the worst of monitor and changes critical states to warning states
- Another dependency monitor with the percentage algorithm – we need this to roll up a critical state once more than the defined percentage are in a critical state.
We end up with something like this in Health Explorer. The single critical monitor is rolled up as a warning state.
Once another monitor is critical, it is rolled up as a critical state.
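The combined behaviour can be modelled in a few lines of Python. This is only a sketch of the logic described above, not how Operations Manager evaluates monitors: `group_health`, the `Health` enum and the 50 percent default are hypothetical, the percentage monitor is simplified to report critical or healthy only, and the two monitors are assumed to roll up to the group on a worst-of basis.

```python
from enum import IntEnum


class Health(IntEnum):
    """Hypothetical health states, ordered from best to worst."""
    HEALTHY = 0
    WARNING = 1
    CRITICAL = 2


def group_health(states, critical_percent=50):
    if not states:
        return Health.HEALTHY

    # Worst of monitor, with the recovery demoting critical to warning.
    worst = max(states)
    worst_of_monitor = Health.WARNING if worst == Health.CRITICAL else worst

    # Percentage monitor (simplified): critical only once more than
    # `critical_percent` of the members are critical.
    critical_count = sum(1 for s in states if s == Health.CRITICAL)
    over_threshold = critical_count * 100 > critical_percent * len(states)
    percentage_monitor = Health.CRITICAL if over_threshold else Health.HEALTHY

    # Assumption: the group takes the worse of the two monitors.
    return max(worst_of_monitor, percentage_monitor)


one_down = [Health.CRITICAL, Health.HEALTHY, Health.HEALTHY]
two_down = [Health.CRITICAL, Health.CRITICAL, Health.HEALTHY]
print(group_health(one_down).name, group_health(two_down).name)
```

With three members, one critical member leaves the group in warning and a second one tips it over to critical, which matches the two Health Explorer views described above.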
Ordinarily, I’d point the customer at the great work Roman has already done, we’d create the pairs of monitors and recoveries by hand in our XML editor/authoring tool of choice, and everyone would be happy. However, in this case the customer needed to create circa 100 pairs of dependency monitors and recoveries. That’s a lot of XML to build by hand! Luckily, the Visual Studio Authoring Extensions (VSAE) have a handy timesaving feature called Snippet Templates. Snippet Templates are a lot like the mail merge feature in Word. Once you’ve created your boilerplate MP fragment, all you need to do is feed it some data and it will build your XML.
I’m not going to cover the in-depth authoring process with VSAE. If you want more information, check out the Authoring Hub Wiki and Graham’s blog series.
I created a Visual Studio solution that contained:
- An MP Fragment that housed our recovery task write action. We only need one write action, so it needed to exist outside of our Snippet Template
- Our Snippet Template, which contains the boilerplate XML for the dependency monitors and recovery tasks, with parts of the XML replaced with variables that are populated from Snippet Data
- A Snippet Data item, listing the monitors and relationships we wanted to build our monitors and recoveries for.
Snippet Template in Visual Studio
Snippet Data in Visual Studio
Adding an additional pair of monitors and a recovery is simply a case of adding a line to the Snippet Data or, as we did, creating a CSV and using the Import from CSV file function. Much easier and less error-prone than creating it all by hand!
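If the mail merge comparison is new to you, the idea can be sketched outside Visual Studio in a few lines of Python. This is not how VSAE works internally, and VSAE has its own placeholder syntax. Here `string.Template` stands in for the Snippet Template and an in-memory CSV stands in for the Snippet Data. The element names, column names and `Contoso` values are all made up for illustration and are not the real management pack schema.

```python
import csv
import io
from string import Template

# Hypothetical boilerplate: one worst of monitor, one percentage monitor
# and one recovery per row. Not the real management pack schema.
FRAGMENT = Template(
    '<WorstOfMonitor Id="${MonitorId}.WorstOf" Relationship="${Relationship}" '
    'MemberMonitor="${MemberMonitor}" />\n'
    '<PercentageMonitor Id="${MonitorId}.Percentage" Relationship="${Relationship}" '
    'MemberMonitor="${MemberMonitor}" />\n'
    '<Recovery Id="${MonitorId}.Recovery" Monitor="${MonitorId}.WorstOf" />'
)

# Hypothetical snippet data: one row per pair of monitors plus recovery.
SNIPPET_DATA = """MonitorId,Relationship,MemberMonitor
Contoso.App.WebTier,Contoso.App.GroupContainsWebServers,Contoso.App.WebServer.Health
Contoso.App.DbTier,Contoso.App.GroupContainsDbServers,Contoso.App.DbServer.Health
"""


def render(template, data):
    # Substitute each CSV row into the boilerplate, one fragment per row.
    rows = csv.DictReader(io.StringIO(data))
    return [template.substitute(row) for row in rows]


fragments = render(FRAGMENT, SNIPPET_DATA)
print("\n\n".join(fragments))
```

Each extra row in the data produces another complete set of XML, which is why going from a handful of monitors to around 100 costs little more than a longer CSV.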
I’ve uploaded the example solution here. It also includes a group class and discovery to provide some examples of the values you need to enter into the Snippet Data.
All feedback and comments welcome.