You can plan, study diagrams, and listen to experts, but you still won’t know for sure that all your client recovery capabilities will actually work. Why? Because recovery is complicated. So many things can go wrong: network configurations do not get replicated properly, application dependencies are out of sync across sites, DR resources are insufficient, and so on. And the worst time to discover a recovery issue is during an actual emergency.
The only way to be sure you can recover your client’s resources is to test, period. Testing is so important that you must include it as part of your data protection service and use it as a positive differentiator from other MSPs.
If testing is so important, why don’t MSPs test as much as they should?
- Testing takes time – MSP technicians are already overwhelmed with the day-to-day tasks of onboarding, managing, and dealing with client tickets.
- Testing can disrupt production systems – Many test procedures require that you bring down production in order to validate the test. If traditional restore processes are used, this downtime can be so disruptive that organizations feel the need to cut corners, which leads to false positives, high risk, and wasted time.
- Testing can cost the MSP money – DR testing can create charges for things like public cloud compute cycles, third-party recovery services, or consultants who have to be brought in.
- Undocumented DR plan – While most organizations will claim to have a documented DR plan, most plans are not regularly updated, depend on a few key individuals, live in a spreadsheet or Word document, and are difficult to locate. Testing will more than likely create the need to update the plan, requiring more time to be spent on something everyone hopes will never be used.
- Not enough MSP technicians – Technicians are your most valuable asset. They are responsible for both the quality and quantity of services you offer. Technicians are often the face of your business, and everyone wants to talk to them if there is an issue. There are simply not enough of them to spend time running extensive test routines that usually find nothing.
Even with all the challenges above, testing provides a major benefit for MSPs. Testing and the resulting reports allow you to demonstrate to your clients that you are actively working on their behalf, even though nothing has gone wrong. A report confirming a successful DR test increases customer satisfaction and stickiness.
What Type of Testing Should MSPs Use?
Some backup and recovery tools claim to include testing as one of their features, but you should be aware that there are many types of testing, with fewer or greater benefits. Here is a quick ranking of some approaches to testing (roughly from the least effective to the most thorough):
- Data verification – this test just checks that blocks or files are good after they have been backed up. Needless to say, this level of testing does nothing to ensure the applications can be functionally recovered.
- Database mounting – verifies that a database within a backup has basic functionality.
- Single machine boot verification – verifies that a single server can be rebooted after a downtime event. This does little for more complex environments.
- Single machine boot with screenshot verification – this test goes a little past boot verification by sending an image of the operating system splash screen to administrators as proof the system can be recovered. A splash screen doesn’t ensure that anything will actually function, as a system affected by ransomware can boot but not run.
- DR Runbook testing – multiple machines are spun up for testing. This is especially important for multiple servers that deliver a business service together, such as an ERP system or clustered databases.
- Recovery Assurance – This is the highest level of testing, as it includes multiple machines, deep application testing, SLA assessment, and analytics on why any recovery failed.
Anything other than Recovery Assurance still leaves questions about the ability to perform a full recovery. Recovery Assurance automatically produces documents that show you can deliver your committed Recovery Time and Recovery Point Objectives. The best part of Recovery Assurance is that it is totally automated, requiring no active involvement from your valuable technicians until an issue is discovered. Recovery Assurance testing is one way to maximize the performance of your technicians while increasing your value to clients.
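As a rough illustration of what an SLA assessment involves, the Python sketch below compares per-machine results from a test run against committed RTO and RPO values. The `RecoveryTestResult` structure, the `check_objectives` helper, the machine names, and all the numbers are hypothetical and made up for this example; this is not how any particular product implements it.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class RecoveryTestResult:
    """One machine's outcome from a DR test run (hypothetical structure)."""
    machine: str
    recovery_time: timedelta  # how long the machine took to come back in the test
    data_age: timedelta       # age of the restore point that was used


def check_objectives(results, rto, rpo):
    """Return a list of readable SLA misses; an empty list means all objectives were met."""
    misses = []
    for r in results:
        if r.recovery_time > rto:
            misses.append(f"{r.machine}: recovery took {r.recovery_time}, RTO is {rto}")
        if r.data_age > rpo:
            misses.append(f"{r.machine}: restore point is {r.data_age} old, RPO is {rpo}")
    return misses


# Made-up results for two servers that deliver one business service together.
results = [
    RecoveryTestResult("erp-app", timedelta(minutes=42), timedelta(hours=3)),
    RecoveryTestResult("erp-db", timedelta(minutes=75), timedelta(hours=5)),
]

misses = check_objectives(results, rto=timedelta(hours=1), rpo=timedelta(hours=4))
for line in misses:
    print(line)
```

With these sample numbers, only `erp-db` misses its objectives, which is exactly the kind of finding you want from a test rather than from a real outage.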
Recovery Assurance is just one of the ways the industry-leading Unitrends Recovery Series backup and recovery solution can maximize the value of MSP technicians. To learn more about the types and advantages of recovery testing, read our White Paper here.