In my previous post in this series on the resilience of non-cloudy solutions, I discussed how to pin down exactly what was acceptable to the business in order to achieve an appropriate DR solution. In this post I will look, at a fairly high level, at how to exploit AWS to help provide a cost-effective DR solution when your system does not actually use AWS resources and is probably not designed in the decoupled manner that would make it easy to deploy to the cloud.
(Yes, I know, I can’t help it; the cloud is here after all.)
Please note that by necessity I’ve kept this at a high level; if I were to attempt to explore the detailed configuration options I’d still be writing this post by Christmas. Needless to say, this post just scratches the surface, but hopefully it provides some food for thought.
You will, or should, have local resilience in your solution, consisting of multiple application and web servers, clustered database servers and load balancers.
The easiest DR solution to implement, but the most costly, is to replicate this setup, albeit with perhaps fewer servers and maybe a single database server instance, at an alternative physical location, and to put processes in place to replicate data across to the second location.
This typical configuration will look something like this:
There are plenty of variations on this, but in the end it entails physically maintaining a distinct location that replicates the application architecture and its associated security controls. Resources need to be in place to support that location, the components need to be kept regularly updated, and all the usual best practices need to be followed to validate the solution. There’s no point finding out the solution doesn’t work when you need it.
At this point you should hopefully be thinking that this is a lot of investment for something that will only rarely be used. So here’s where AWS can help keep those costs down.
The first model, which I’ve called the ‘halfway house’, may be an option for those who are unable to make use of the full range of AWS resources and who, for whatever reason, are unable or unwilling to store their data there. It still requires two maintained DCs, but saves costs by running the resilient application and web servers as AWS instances. The cool thing here is that those resilient servers/instances are not actually running unless needed (you would have prepped AMIs, ideally used in conjunction with a configuration management tool to ensure they are fully up to date when launched). You will not have the overhead of watering and feeding them that you would have if you were 100% responsible for the infrastructure. The core AWS components that make this work are EC2, VPC and ELB. If you wanted, there is also the potential to use Route 53 to manage the DNS changes needed for external routing. There are issues with this model though, such as the possibility of a lack of capacity when you need to spin up those instances (although the use of multiple Availability Zones and regions should overcome that fear), the overhead of managing three sets of resources, and latency issues, to name three that come to mind.
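To make the “not running unless needed” point concrete, here’s a minimal sketch of a runbook generator that emits the AWS CLI calls you’d run at failover time to bring the prepped AMIs up and register them with the existing load balancer. The AMI ID, instance type and ELB name are hypothetical placeholders, and the instance-ID placeholder is left for you to fill in from the launch output.

```python
# Sketch: build the CLI commands a DR runbook would execute to bring
# pre-baked AMIs online behind an ELB. IDs and names are hypothetical.

def dr_launch_commands(ami_id, count, instance_type, elb_name):
    """Return the shell commands to launch DR instances and wire them up."""
    return [
        # Launch N instances from the pre-built, patched AMI.
        f"aws ec2 run-instances --image-id {ami_id} "
        f"--count {count} --instance-type {instance_type}",
        # Register the newly launched instances with the load balancer.
        f"aws elb register-instances-with-load-balancer "
        f"--load-balancer-name {elb_name} --instances <instance-ids>",
    ]

for cmd in dr_launch_commands("ami-12345678", 2, "m1.large", "dr-web-elb"):
    print(cmd)
```

In practice you’d drive this from your configuration management tool rather than by hand, so the instances come up patched and configured, not just booted.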
The ‘halfway house’ will look something like this:
Making use of AWS VPC means that you can create virtual networks built on the AWS infrastructure, which gives you a great range of networking configurations. For example, in the diagram above I’ve shown two groups of instances: one that is externally accessible and another set that is basically an extension of your private LAN. There are far too many possible scenarios with just these features of AWS, and obviously every application is different (see why I made sure this post was kept at a high level?).
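At its simplest, that layout is just one VPC CIDR block carved into a public subnet and a private one. A quick sanity check of such a plan, using hypothetical address ranges and Python’s standard ipaddress module:

```python
import ipaddress

# Hypothetical layout: one VPC CIDR carved into a public-facing subnet
# and a private subnet acting as an extension of the on-premise LAN.
vpc = ipaddress.ip_network("10.0.0.0/16")
public_subnet = ipaddress.ip_network("10.0.1.0/24")   # externally accessible tier
private_subnet = ipaddress.ip_network("10.0.2.0/24")  # private LAN extension

# Both subnets must fall inside the VPC block...
assert public_subnet.subnet_of(vpc)
assert private_subnet.subnet_of(vpc)
# ...and must not overlap each other.
assert not public_subnet.overlaps(private_subnet)
print("subnet plan is consistent")
```

The private range also needs to avoid clashing with your on-premise addressing, since it will effectively be routed as part of your LAN.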
The nirvana, though, for really seeing the costs tumble is to get rid of DC 2 and use AWS as the recovery site; as a bonus, it can also be used on demand for those extra processing needs. This not only reduces the support overhead and saves cost, as you are no longer committed to paying for a second location with all the associated kit necessary to make it a viable alternative site, but it also provides a wide variety of failover and recovery options that you just won’t get when you have to commit to infrastructure up front (hopefully that pre-empts the question about why not a private cloud: you’d need your own platform).
This model, which I’ve called the ‘Big Kahuna’, can look a little like this:
With the ‘Big Kahuna’ you should make use of any of the AWS resources available. In the flavour above I’m using S3 to store regular snapshots, transaction logs and the like from my primary database. Why not replicate directly? Well, S3 is cheap storage, and in the scenario I’m illustrating my RTO and RPO values allow enough delay between failure and recovery that I can reconstruct the database when needed from the data stored in my S3 bucket. Regular reconstruction exercises should take place, though, as part of the ongoing validation of the failover processes. AMIs and a configuration management solution (as it’s me, it will be Chef) are used to provision up-to-date application and web servers. Route 53 is used to facilitate DNS management, and where I need to ensure that traffic is kept internal I’m making use of VPC.
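Whether snapshot-to-S3 is good enough comes straight back to the RPO and RTO values agreed with the business in the previous post. A minimal sketch of the arithmetic you’d do before settling on a snapshot interval, with made-up figures rather than recommendations:

```python
# Sketch: check a snapshot-to-S3 schedule against RPO/RTO targets.
# All figures below are hypothetical examples.

def schedule_meets_targets(snapshot_interval_h, restore_time_h, rpo_h, rto_h):
    """Worst case you lose one full snapshot interval of data, then
    spend restore_time_h rebuilding the database from the S3 copies."""
    worst_case_data_loss = snapshot_interval_h  # failure just before next snapshot
    worst_case_recovery = restore_time_h
    return worst_case_data_loss <= rpo_h and worst_case_recovery <= rto_h

# RPO of 4 hours, RTO of 8 hours: hourly snapshots and a 6-hour rebuild fit...
print(schedule_meets_targets(1, 6, rpo_h=4, rto_h=8))   # True
# ...but 6-hourly snapshots would breach the RPO.
print(schedule_meets_targets(6, 6, rpo_h=4, rto_h=8))   # False
```

The restore-time figure is exactly what those regular reconstruction exercises give you: a measured number rather than a guess.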
The introduction of RDS for Oracle means it is now viable for enterprises to use AWS as the failover solution. There may be concerns over performance, but this is a DR situation: if you are not in a position to re-engineer for the cloud, then reduced performance should simply form part of the business impact discussions with your internal sponsors.
AWS has services such as Dedicated Instances, which may be the only way your security and networking guys will allow you to exploit AWS resources, but you would need to do your sums to see if it makes sense. Personally, I’d focus on trying to understand the ‘reasons’ behind that requirement. There are a number of valid cases where it would be needed, but I suspect cost isn’t really going to be any sort of driving force there.
The devil is in the detail when designing a failover solution utilising AWS as part of your DR. If you are planning a new solution, make sure you talk to the software architect about best practices for designing for the cloud; they’re still applicable for on-premise solutions too.
Data is really where all the pain points are, and it will likely dictate the model and ultimate configuration.
If you are trying to retrofit DR onto an existing solution, then the options open to you may not be that many, and it’s likely you will have to start off with some form of the ‘halfway house’.
Also, don’t forget you can just try stuff out at minimal cost. Wondering if a particular scenario would work? Just try it, as you can delete everything after you’ve finished.
The cost-effectiveness of the solution is directly related to the use you make of AWS resources to effect it. I even have a graph to illustrate this (@jamessaull would be proud of me).
This graph is based on very rough comparative costs, starting with no AWS resources, as in the first situation I discussed, and working down through to the ‘Big Kahuna’. You can easily do your own sums: AWS pricing is on their site (they even provide a calculator), and you know how much you pay for those servers, licences, networking hardware, hardware maintenance, support and so on.
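If you want to script your own version of the graph, the sums are simple. The numbers below are purely illustrative placeholders to show the shape of the comparison; substitute real quotes and the AWS calculator’s output for your own kit.

```python
# Rough comparative annual DR cost per model.
# All figures are illustrative placeholders, not real pricing.

def annual_cost(fixed_infra, standby_servers, aws_on_demand):
    """Total yearly DR spend: fixed site costs + idle standby kit + AWS usage."""
    return fixed_infra + standby_servers + aws_on_demand

models = {
    # Full second DC: two sites' worth of kit, no AWS.
    "two full DCs": annual_cost(fixed_infra=200_000, standby_servers=80_000,
                                aws_on_demand=0),
    # Halfway house: second DC kept, but app/web standby moves to AWS.
    "halfway house": annual_cost(150_000, 0, 15_000),
    # Big Kahuna: no second DC at all, AWS is the recovery site.
    "Big Kahuna": annual_cost(0, 0, 30_000),
}

for name, cost in sorted(models.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: ~{cost:,}/year")
```

However rough your inputs, the ordering tends to hold: the more of the standby estate you hand to AWS, the lower the annual figure.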