Obviously this isn’t an all-encompassing checklist for Disaster Recovery (DR) planning, but it should provide a high-level starting point.
Availability, or uptime, is a priority for all businesses, big and small, and the keys to it are understanding and planning. One important point to bear in mind: you don’t need to have all the information right away, and you don’t need to be overly granular straight away. That will come as the plan evolves.
Understanding: This is an entirely business orientated issue in the first instance.
- Understand the business topology (departments, sub-departments, etc.)
- Understand the way they work, and the tools they rely on.
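- From here you should be able to state which departments are the most important, and how long the business can live without these delivery areas – this helps to define your RPOs (Recovery Point Objectives: how much data you can afford to lose) and RTOs (Recovery Time Objectives: how long you can afford to be down).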
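To make that last point concrete, here is a minimal sketch of how you might record recovery objectives per delivery area and rank them. The department names, the figures and the `DeliveryArea` / `recovery_order` names are all made up for illustration; your own will come out of the conversations with the business.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class DeliveryArea:
    name: str
    rto: timedelta  # Recovery Time Objective: longest tolerable outage
    rpo: timedelta  # Recovery Point Objective: largest tolerable window of lost data


def recovery_order(areas):
    """Sort delivery areas so the least outage-tolerant one comes first."""
    return sorted(areas, key=lambda a: (a.rto, a.rpo))


# Hypothetical departments and figures, for illustration only.
areas = [
    DeliveryArea("Finance", rto=timedelta(hours=24), rpo=timedelta(hours=4)),
    DeliveryArea("Order processing", rto=timedelta(hours=2), rpo=timedelta(minutes=15)),
    DeliveryArea("Marketing", rto=timedelta(days=3), rpo=timedelta(hours=24)),
]

for area in recovery_order(areas):
    print(f"{area.name}: RTO {area.rto}, RPO {area.rpo}")
```

Sorting by RTO first is just one reasonable choice: it puts the areas the business can live without for the shortest time at the top of the recovery list.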
At this point you can start your plan. As I’ve already mentioned, don’t get hung up on the details too early. IT is nothing more than a business enabler, and as such the technology is arbitrary. Understand what the business does, and what it needs to do as a bare minimum to keep operating.
Planning: Once you’ve established the business delivery areas and their respective recovery objectives, you can start to dig into the nitty-gritty:
- Where are these key services located?
- How are they accessed? (This can play a large part: you need to make delivery of those tools & services as simple and cost-effective as possible.)
- How often are they backed up?
- Where & how are the backups stored? (Are you meeting the 3-2-1 rule: three copies of your data, on two different types of media, with one copy offsite?)
- It’s important at this point to engage with your product vendors (either through your internal IT department or trusted IT partner) to understand what data is required to recover those respective services back into an operational state.
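The 3-2-1 question is easy to check once you’ve written down where your copies live. A rough sketch, assuming the usual reading of the rule (at least three copies, on at least two types of media, with at least one offsite); the `DataCopy` and `meets_3_2_1` names and the example locations are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class DataCopy:
    location: str  # e.g. "head office", "cloud storage"
    media: str     # e.g. "disk", "tape", "object storage"
    offsite: bool  # True if stored away from the primary site


def meets_3_2_1(copies):
    """Return True if there are >= 3 copies, on >= 2 media types, with >= 1 offsite."""
    enough_copies = len(copies) >= 3
    enough_media = len({c.media for c in copies}) >= 2
    has_offsite = any(c.offsite for c in copies)
    return enough_copies and enough_media and has_offsite


# Hypothetical example: production data plus two backups.
copies = [
    DataCopy("head office (production)", "disk", offsite=False),
    DataCopy("head office (backup appliance)", "disk", offsite=False),
    DataCopy("cloud storage", "object storage", offsite=True),
]

print(meets_3_2_1(copies))
```

Note that the production copy counts as one of the three here. If you would rather count backups only, drop it from the list before checking.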
At this point you should have an understanding of where you are and where you need to be, and you can start putting together a Disaster Recovery plan. This should consist of:
- Business Topology and the associated tools/services
- RTOs & RPOs
- Possible DR Scenarios (don’t get hung up on this, it’s impossible to predict every scenario, focus on the high level: specific localised failures, loss of the business premises, etc.)
- Where the services will be located/recovered to
- Vendor Contacts
- Staff Contacts
- DR Co-ordinators (who within your business will be responsible for co-ordinating the recovery, vendors and staff).
Now here comes a very important step: TESTING. Test your plan: recover your services in an isolated environment and review. Was it successful? Was it within your Recovery Objectives? If the answer to either of these is NO, then you need to work with your vendors & IT partners to understand why and identify changes to address this.
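If you record the results of each test, those two review questions can be answered the same way every time. A minimal sketch, assuming you measure how long the recovery took and how old the recovered data was; the `evaluate_dr_test` function and the figures below are illustrative only:

```python
from datetime import timedelta


def evaluate_dr_test(rto, rpo, time_to_recover, data_loss_window, services_recovered):
    """Return a list of findings; an empty list means the test met its objectives."""
    findings = []
    if not services_recovered:
        findings.append("recovery was not successful")
    if time_to_recover > rto:
        findings.append(f"recovery took {time_to_recover}, over the RTO of {rto}")
    if data_loss_window > rpo:
        findings.append(f"data loss window was {data_loss_window}, over the RPO of {rpo}")
    return findings


# Hypothetical test result: services came back, but too slowly.
findings = evaluate_dr_test(
    rto=timedelta(hours=2),
    rpo=timedelta(minutes=15),
    time_to_recover=timedelta(hours=3),
    data_loss_window=timedelta(minutes=10),
    services_recovered=True,
)

for finding in findings:
    print("NO:", finding)
```

Each finding is something to take back to your vendors and IT partners.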
It’s important to note that your DR plan is an evolving document. Every time there’s a change to the business, be it strategic or technical, look at how this affects the business’s Recovery Objectives and the DR plan.
Every business’s DR plan will be unique to them, but as a general rule there are two areas which can simplify the process and drive down Recovery Times:
- Native High Availability – ensure your local resources have high availability and are fault tolerant (where possible). Mitigating the start of a disaster is far less costly than dealing with one! Look at moving your services into hosted offerings. Office 365 is a great example: Microsoft have spent billions making a highly available and fault-tolerant infrastructure to provide its services. Piggyback on this and you can significantly mitigate the risks and impact of hardware and site failure.
As part of investigating this avenue with your prospective vendor, ask about their uptime SLAs and DR plans, and make sure you have confidence that they can meet their promises and keep your business running. It’s also worth looking at having a vendor-agnostic backup of your data. At the end of the day it’s your data, and keeping a copy in an independent location will help you achieve the 3-2-1 rule and protect your business.
- Implement An Availability Solution Not A Backup Solution – for those of you who read my previous blog, you’ll know I have a bit of a bee in my bonnet about backups. A backup is a copy of your data stored separately from the production copy of your data. For day-to-day file recovery that’s great, but what about loss of whole services or hardware? Where are you going to restore your failed service(s) to? An availability solution should provide you with not only the backup mechanism but also a place to recover to, be it local or in the cloud. It also goes some way to making your DR plan a lot simpler, as you already have the recovery process in mind as well as a console to orchestrate from. Good backups will be a by-product of a good availability solution.