With a little time between jobs, and with having to dodge the rain, I found myself thinking about some stuff I've not given a lot of deep thought to recently. So here's the first of a couple of posts on resiliency as applied to non-cloudy solutions.
I recall numerous occasions where I would design a highly resilient solution, one that would provide as many 9's as I'd dare to commit to, only for the proposal to be back on my desk some weeks later with the words "too expensive, where can we save money on it?"
The main reason I would come up against this is that I would ask the Business Sponsor two key questions related to disaster recovery as part of the Business Impact Analysis:
What is their RTO? Recovery Time Objective: the duration of time from the point of failure within which a business process must be restored after a disaster or disruption.
What is their RPO? Recovery Point Objective: the acceptable amount of data loss, measured in time.
These two questions would inevitably elicit the response "no data loss, and restore service as quickly as possible", so I would go off and design a platform to get as close as I could to those unattainable desires.
I soon learnt that a different approach was needed to prevent the paper bounce game, and eventually came up with Gold, Silver and Bronze resiliency options as part of the platform proposal. We would include what we felt were viable resiliency options, graded according to arbitrary RTO and RPO levels, along with the associated costs to deliver on those.
This at least meant the Business Sponsor had somewhere to start from in terms of a monetary value, and could see that wanting the Gold standard was going to cost them loads more than an option they had really thought through. This got them thinking about what they really wanted in terms of RTO and RPO, and we would then discuss what options were open to them.
For example, does the accounting system need to be restored within 2 hours with less than half an hour's loss of data? In my experience it's only during month-end processing that this level of recovery is really needed. For the majority of the month the data is held elsewhere, so it can easily be recreated, and the month's accounts file is still open, so this isn't an issue in terms of processing. Is anyone working on the system at weekends? This is the sort of thought process that the Business needs to go through when requesting new systems and trying to figure out what they want in terms of resiliency.
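To make the tier idea a bit more concrete, here is a minimal sketch in Python of how graded options might be written down and matched against a stated RTO and RPO. Everything in it is an assumption for illustration: the `Tier` class, the `cheapest_tier` helper, and every RTO, RPO and cost figure are invented, apart from the 2 hour / half hour accounting example above. Treat it as a conversation aid rather than a real pricing model.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class Tier:
    name: str
    rto: timedelta    # maximum time to restore service
    rpo: timedelta    # maximum acceptable window of data loss
    annual_cost: int  # illustrative figure only, not a real quote


# Hypothetical tiers: every number here is made up for the example.
TIERS = [
    Tier("Gold", timedelta(hours=2), timedelta(minutes=30), 100_000),
    Tier("Silver", timedelta(hours=8), timedelta(hours=4), 40_000),
    Tier("Bronze", timedelta(hours=48), timedelta(hours=24), 10_000),
]


def cheapest_tier(required_rto: timedelta,
                  required_rpo: timedelta,
                  tiers=TIERS) -> Optional[Tier]:
    """Return the cheapest tier meeting both objectives, or None if none does."""
    candidates = [t for t in tiers
                  if t.rto <= required_rto and t.rpo <= required_rpo]
    return min(candidates, key=lambda t: t.annual_cost, default=None)


# Month end: restore within 2 hours, lose at most 30 minutes of data.
month_end = cheapest_tier(timedelta(hours=2), timedelta(minutes=30))

# Rest of the month (assumed looser needs): 24 hours to restore, 12 hours of loss.
rest_of_month = cheapest_tier(timedelta(hours=24), timedelta(hours=12))

# "No data loss, no downtime": nothing on the price list can deliver it.
zero_loss = cheapest_tier(timedelta(0), timedelta(0))
```

With these made-up figures, the month-end requirement only fits Gold, the rest of the month is covered by Silver, and the "no data loss, no downtime" request matches nothing at all, which is exactly the point worth making to the Sponsor.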
When faced with stark questions about RTOs and RPOs, the natural response is "I want my system to be totally resilient with no downtime". They have no idea what this means in terms of resource, and thus potential cost, so why not save time by giving them some options? You may be lucky and one of the options will be an exact fit, but if not, the Sponsor at least has an idea of what it costs to provide their all-singing, all-dancing requirements. It's more likely they will say something like "The Silver option may do, but we need an enhanced level of support at month end", or maybe something like "We need to make sure we have a valid, tested backup from the previous night at month end".
The beauty of this approach is that straight away you've engaged them in conversation and prevented a paper bounce. It's not a new approach, and it's one that service teams are used to when dealing with external suppliers, but there is no reason not to use the same methods with internal requests.