Most of you who come here know that I post A LOT about the #vCommunity. What you may not know is that I actually have a day job. Who would have thought? In between being a dad, and a husband, and a VMUG Leader in NYC, I’m also a Solutions Engineer for Zerto. I’ve been in this role for over a year and I love it. I really enjoy speaking to customers and learning different and innovative ways of doing things.
As part of this totally awesome gig, I get to talk to a lot of customers and prospective customers about their disaster recovery (DR) and business continuity (BC) plans and approaches. One of my favorite questions to ask is this:
“How many people went into IT to become a DR admin?”
I usually get crickets. Not because it’s a bad thing to be in BC/DR (I make a great living from it), but because it’s not a sexy job. It’s usually a task that gets dumped in your lap for legal or compliance reasons and it takes you away from the things that you WANT to do. It usually involves getting a whole bunch of different teams (Virtualization, Storage, Networking, DBAs, App/Dev, etc.) involved and spending a few weeks (usually a few months) preparing for a test that is almost always done during a (holiday) weekend. Who the hell wants to work on the weekend? I sure don’t. That’s why I made the move to the vendor side, but that’s a whole other story.
There’s one other component that I haven’t mentioned yet. Runbooks. Ugh. Just the thought of those things makes me cringe. Who remembers (or still uses) those huge loose-leaf binders with hundreds of pages of step-by-step instructions that were written years ago and probably never updated? Once a year you would have to dust them off for instructions on how to recover your environment in the event of a disaster. Then you would have to go page by page with a bunch of other team members and hope that the system still matched what was on the page.
You know what is really helpful in this kind of situation? The simple acronym RTFM. I come from the military, where this acronym had a very simple meaning:
READ
THE
F*ING
MANUAL
That, however, is the old RTFM.
Since working at Zerto, I’ve come up with a new meaning.
.
.
.
.
.
.
.
RECOVER
TEST
FAILOVER
MOVE
These are some of the essential functions that any IT Resiliency Platform should be able to provide you. By performing these functions, you’ll ensure that your workloads are protected, your data is intact, and your processes are valid. Let’s take a quick look at each of these functions:
RECOVER
This is the ability to restore your data. It could mean restoring (or, as we like to say, resuming) your VMs or applications. Or it could mean restoring files or folders from a point in time before a disruption.
TEST
Testing is probably one of the most important but also most overlooked operations when it comes to IT Resilience. Testing is how you know with great confidence that your systems will work when you attempt to get them back up and running. It’s a way to recover your VMs or applications in practice before having to do the real thing.
FAILOVER
Failover is a misleading term. This is actually recovering your VMs or applications at the target site. Think of this as initiating your DR plan in a live scenario. If your production site becomes unavailable for whatever reason, this is how you recover your workloads and make your users happy again. Simply put, when you’re down, get yourself back up and running.
MOVE
Zerto has a function called Move VPG, which provides you with Application Mobility by migrating a Virtual Protection Group (VPG) to another location. (NOTE: A VPG is made up of the VMs you are protecting.) This could mean moving to another storage platform or another datacenter, moving from one hypervisor to another, or even moving to, from, or between cloud providers.
In order to have a complete IT Resilience platform, I believe you need to be able to perform all of these functions simply and consistently. Stay tuned, as I will dive a bit deeper into each operation and how Zerto specifically performs it.