As I mentioned in my recent Cloud Backups post, I'm trying out a few virtualisation backup products to help me out with a prototype infrastructure that I've been working on. I want to store backups of the various VMs I've set up somewhere outside that infrastructure, effectively offsite.
By happy circumstance, PHD Virtual had a beta running for version 6.2 of their backup product that includes “CloudHook”. It's a module that enables integration with cloud storage providers for the purposes of backup, archiving and disaster recovery. The 6.2 release covers the backup aspect, and future releases will add in archiving and DR functionality. Thanks to Patrick Redknap, I managed to hop onto the beta and try it out. (Note that the screenshots below come from a beta release and may have changed for GA.)
PHD's Virtual Backup product is delivered as a Virtual Backup Appliance. A few years ago I was wary of running production services on dedicated virtual appliances, but I've changed my view over time and now really like using them. (That's probably a subject for a different post, though.) I won't go through the mechanics of the installation in nauseating detail, but it basically breaks down to the following high-level steps:
- Download and unzip the virtual appliance
- Use the vSphere Client to import and deploy the appliance (in its default configuration it requires 8 GB of disk space, 1 vCPU, 1 GB of memory and a connection to one port group)
- Open the VM's console and enter some network information
- Reboot the appliance
- Install the PHD Virtual Backup Client
Configuring the appliance for use is pretty straightforward, although if, like me, you have to make multiple hops to get to your data center (RDP over RDP over VPN, for complicated reasons I can't go into), you might find that the PHD Virtual client doesn't play nicely with limited screen space. I could only just reach the “Save” button. (Granted, that's an unlikely situation to be in.) At a minimum, you need to connect the appliance to vCenter (see the General tab of the Configuration section):
Normally at this point you'd expect to configure some disk space local to the backup appliance (or network storage). You still do, but now you also have a choice to make: where do you want to back up to?
PHD Virtual's approach to cloud backup is to let you back up directly to cloud storage, if you want. The supported providers (at least as of the beta) are Amazon S3, Google Cloud Storage, Swift/OpenStack-based providers and Rackspace Cloud Files. Alternatively, you can use NFS/CIFS or a local disk. All of this is selectable from the Backup Storage tab. For my testing I used Amazon S3.
Configuring the storage location is reasonably easy (assuming you know your way around S3 well enough to dig out the information required). PHD have created a document to help you get started if you're new to cloud storage: Getting Started with Cloud Storage.
The Access Key ID and Secret Access Key fields come straight from Amazon. The Passphrase is, as the name suggests, used to encrypt your backups (256-bit AES). Hopefully it goes without saying: use more than the six characters I did! The other important thing is not to forget or lose this Passphrase. Although it is stored in the appliance's configuration, if you somehow lose the appliance and have no record of the Passphrase, your backups become meaningless data.
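A 256-bit AES key is only as strong as the passphrase it comes from. PHD don't say how the appliance turns the Passphrase into a key, so the sketch below assumes a PBKDF2-style derivation purely for illustration. It shows that even a six-character passphrase yields a full 256-bit key, yet leaves only about 28 bits of search space for an attacker (and that's if the characters were picked at random).

```python
import hashlib
import math
import os


def keyspace_bits(length: int, alphabet_size: int) -> float:
    """Entropy in bits of a passphrase picked uniformly at random.

    Human-chosen passphrases usually have far less entropy than this."""
    return length * math.log2(alphabet_size)


def derive_key(passphrase: str, salt: bytes, iterations: int = 200_000) -> bytes:
    """Stretch a passphrase into a 32-byte (256-bit) key, the size AES-256 needs.

    PBKDF2-HMAC-SHA256 is an assumption for illustration only; this is not
    necessarily how the PHD appliance derives its key."""
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, iterations, dklen=32
    )


# Six lowercase letters vs. twenty random letters and digits.
print(f"{keyspace_bits(6, 26):.1f} bits")    # 28.2 bits
print(f"{keyspace_bits(20, 62):.1f} bits")   # 119.1 bits

key = derive_key("hunter", os.urandom(16))
print(len(key) * 8)  # 256
```

The key length never changes; what changes is how many guesses it takes to find the passphrase behind it.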
I found a minor niggle with the tab order between the “Encryption Passphrase” and “Confirm Passphrase” fields but otherwise had no problems. Note the message at the bottom, though: “Cloud storage requires Write Space to be configured for local caching.”
Writing backups straight to the cloud is best done with some form of local caching in place, and PHD have built that into CloudHook. Local caching matters because even with the nippiest internet connection, it can take a long time to push your data to the cloud. During that time, the snapshots used to produce a consistent backup would stay in place on the VMs being backed up, and it's not a good idea to keep them around any longer than necessary. Restores also need somewhere to stage the files retrieved from the cloud storage provider before the restore can complete.
So, before enabling cloud backups, PHD Virtual Backup requires you to create a Write Space using locally attached virtual disks. My interpretation is that it works a bit like this:
- PHD's appliance reads the data for a protected VM (not directly from the VM itself; it goes through a vSphere API)
- Backup data is cached to the Write Space
- Cached backup data is pulled from the Write Space to be transferred to the cloud storage provider
- Encrypted data is written to the cloud storage provider
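The four steps above can be sketched as a two-phase pipeline. This is only my reading of the process, not PHD's code: every function and parameter name is hypothetical, the "encryption" is a placeholder, and the point is simply that the snapshot can be released as soon as the data is in the Write Space, before the slow upload starts.

```python
from typing import Callable, Dict, Iterable


def backup_via_write_space(
    read_blocks: Iterable[bytes],
    write_space: Dict[int, bytes],
    release_snapshot: Callable[[], None],
    encrypt: Callable[[bytes], bytes],
    upload: Callable[[int, bytes], None],
) -> int:
    """Two-phase backup: stage everything locally, then push it to the cloud.

    Assumes write_space starts empty. Returns the number of blocks uploaded."""
    # Phase 1: copy the VM's data (read through the hypervisor's API) into
    # the local Write Space. Only this phase needs the VM snapshot.
    for index, block in enumerate(read_blocks):
        write_space[index] = block
    release_snapshot()

    # Phase 2: pull cached blocks back out, encrypt them and upload them.
    # On a slow link this can take hours without keeping the snapshot open.
    uploaded = 0
    for index in sorted(write_space):
        upload(index, encrypt(write_space[index]))
        uploaded += 1
    return uploaded


events = []
bucket = {}


def fake_upload(index: int, payload: bytes) -> None:
    events.append(f"upload {index}")
    bucket[index] = payload


count = backup_via_write_space(
    read_blocks=[b"boot", b"data", b"logs"],
    write_space={},
    release_snapshot=lambda: events.append("snapshot released"),
    encrypt=lambda b: b[::-1],  # placeholder transform, NOT real encryption
    upload=fake_upload,
)
print(count, events)
# 3 ['snapshot released', 'upload 0', 'upload 1', 'upload 2']
```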
As long as you configure enough Write Space, the data it holds can also be used for instant recoveries (as I later found out). PHD recommend configuring Write Space of at least 2.5% of the total VM data you are protecting. I opted for a much higher figure because I intended to add VMs gradually over time, although I could just as easily have grown the space dynamically to meet my needs.
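PHD's 2.5% guideline is simple arithmetic, but a small helper (the function names are my own) makes it easy to size the Write Space and shows how far a more generous allocation goes. The 500 GB figure below is the cache size I ended up with.

```python
def min_write_space_gb(protected_gb: float, percent: float = 2.5) -> float:
    """Smallest Write Space that meets the 2.5%-of-protected-data guideline."""
    if protected_gb < 0:
        raise ValueError("protected_gb must be non-negative")
    return protected_gb * percent / 100


def max_protected_gb(write_space_gb: float, percent: float = 2.5) -> float:
    """Inverse: how much VM data a given Write Space covers under the guideline."""
    return write_space_gb * 100 / percent


print(min_write_space_gb(80))   # 2.0      (two 40 GB VMs)
print(max_protected_gb(500))    # 20000.0  (a 500 GB Write Space)
```

In other words, 500 GB comfortably exceeds the guideline; the surplus is what lets recent backups stay cached for fast restores.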
Write Space configuration is under a separate tab. A configuration window can be opened to add unused local disks:
The disk, when added, results in a reassuring tick back on the Write Space tab:
And now, it's possible to configure Amazon S3 as a backup storage location:
The appliance will ask to reboot after this configuration is made. If you've made a mistake somewhere, an error will be displayed in the client. Otherwise, you're good to configure a backup job!
The Backup panel is the place to do that. (Again, my screen size caused problems reaching the “Next” button; I had to hide my taskbar to see it.) To start with, I configured a job to back up two servers (a SQL database server and its associated application server) as follows:
- Select VMs.
- Select backup appliance (I only have one).
- Mode: here I used “Virtual Full Backup”. It's recommended over “Full / Incremental” for cloud backups because incremental backups would require the appliance to have access to previous backups, which isn't very bandwidth efficient.
- Scheduling: options are “Now”, “Once”, “Daily” or “Weekly”, with a few sub-options for each. I chose a daily backup starting at 00:30 GMT.
- Options: here the job name is specified along with some other options, including verification and making the backup an archive. Presumably verification takes place before the backup is sent to the cloud; I haven't had a chance to test that, so I left it off (the default).
- VM Processing Actions allow applications to be quiesced if PHD Guest Tools are installed. I haven't tried that out.
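PHD don't document their internals, but a common way to get “virtual full” backups is content-addressed deduplication: every run records a complete list of chunks, yet only chunks that haven't been seen before are sent. The appliance then checks a hash index rather than reading earlier backups back from the cloud. The sketch below illustrates that general technique, not PHD's actual format; it assumes fixed-size chunks and SHA-256 hashes, and a single dictionary stands in for both the cloud bucket and the local index of what's in it.

```python
import hashlib
from typing import Dict, List, Tuple

CHUNK_SIZE = 4  # tiny for illustration; real products use much larger chunks


def chunk(data: bytes, size: int = CHUNK_SIZE) -> List[bytes]:
    """Split data into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def virtual_full_backup(data: bytes, store: Dict[str, bytes]) -> Tuple[List[str], int]:
    """Record a full manifest of chunk hashes but store only unseen chunks.

    Returns (manifest, number_of_new_chunks)."""
    manifest = []
    new_chunks = 0
    for piece in chunk(data):
        digest = hashlib.sha256(piece).hexdigest()
        if digest not in store:
            store[digest] = piece  # the only data that would cross the WAN
            new_chunks += 1
        manifest.append(digest)
    return manifest, new_chunks


def restore(manifest: List[str], store: Dict[str, bytes]) -> bytes:
    """Rebuild a complete image from any single manifest."""
    return b"".join(store[d] for d in manifest)


day1 = b"AAAABBBBCCCCDDDD"
day2 = b"AAAABBBBXXXXDDDD"  # one chunk changed overnight
store: Dict[str, bytes] = {}
m1, new1 = virtual_full_backup(day1, store)
m2, new2 = virtual_full_backup(day2, store)
print(new1, new2)                   # 4 1
print(restore(m2, store) == day2)   # True: the second backup is still a full image
```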
A summary is then usefully displayed. The data size is 80 GB (2 × 40 GB VMs):
Job scheduled! Lazily, I started it a few minutes early and had a look at the job details to see what was happening:
At the time it was doing pretty much what I expected: progress was slightly slower at first as it tackled the most populated parts of the VM. So I left it to run, went to bed and checked on it in the morning…
My 80 GB of VMs backed up to Amazon S3 in a little over 4 hours, with a deduplication ratio of 6:1 for one VM and 3:1 for the other. The job completed with warnings, although they turned out to be related to the fact that no backups had previously been completed for those VMs.
Subsequent backups took only 2 to 19 minutes to complete, depending on how much data had changed, with dedupe ratios of about 1022:1. Looking in the Amazon S3 bucket I created, you can see hundreds of files now present:
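Some back-of-the-envelope arithmetic puts these numbers in context. It assumes each dedupe ratio is logical size divided by data actually stored, uses decimal units, and takes the first run as exactly 4 hours (it was a little over, and that time also covers reading and caching), so the rates are rough upper bounds rather than measured link speeds.

```python
def stored_gb(logical_gb: float, dedupe_ratio: float) -> float:
    """Data actually written for a given logical size and N:1 ratio."""
    return logical_gb / dedupe_ratio


def avg_mbit_per_s(gb: float, hours: float) -> float:
    """Average rate in Mbit/s, using decimal units (1 GB = 8,000 Mbit)."""
    return gb * 8000 / (hours * 3600)


first_upload_gb = stored_gb(40, 6) + stored_gb(40, 3)  # about 6.7 + 13.3 GB
print(f"first run stored ≈ {first_upload_gb:.1f} GB")                     # 20.0
print(f"logical rate ≈ {avg_mbit_per_s(80, 4):.1f} Mbit/s")               # 44.4
print(f"upload rate ≈ {avg_mbit_per_s(first_upload_gb, 4):.1f} Mbit/s")  # 11.1
print(f"nightly stored ≈ {stored_gb(80, 1022) * 1000:.0f} MB")           # 78
```

So a 1022:1 night means something like 78 MB of new data for 80 GB of VMs, which is why those later jobs finish in minutes.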
What about restores? Well, that works too. I just hit “Instant VM Recovery” in the client program, selected my VM and restore options and hit go:
It took only 2 minutes. Now that's not bad! Since I was backing up 80 GB of VMs with 500 GB of cache available, the VM was of course cached locally, and I doubt the backup appliance had to fetch any data from Amazon S3. Unfortunately I haven't had time to test a restore where the backup I needed wasn't cached locally. I'd expect performance in that case to be roughly proportional to how quickly the data was uploaded in the first place.
I've run several more jobs and restores and added more VMs over time. I can't say how well it scales, as I've only had a small infrastructure to test with. I'm happy with the performance so far; it has certainly met my expectations, and even exceeded them. Although PHD only require 2.5% of total VM size for the Write Space, I'd personally suggest throwing a fair bit of storage at it if you have it. Having your backups stored offsite is great in a crisis, but for everyday issues your recovery time objective (RTO) could be greatly reduced by keeping a local cache of your recent backups.
PHD recommend running a normal backup to disk in tandem with the backup to the cloud; if you do, it's best not to let the backup windows overlap.
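Checking that a disk job and a cloud job don't overlap is easy to get wrong when one window crosses midnight. Here's a small helper; the function name and the job durations are hypothetical (the cloud window roughly matches my “little over 4 hours” first run from 00:30).

```python
from datetime import time

MINUTES_PER_DAY = 24 * 60


def _minutes(t: time) -> int:
    """Minutes since midnight for a time of day."""
    return t.hour * 60 + t.minute


def windows_overlap(start_a: time, dur_a_min: int, start_b: time, dur_b_min: int) -> bool:
    """True if two daily windows overlap, including windows that cross midnight.

    Durations are in minutes and must be shorter than a full day."""
    for dur in (dur_a_min, dur_b_min):
        if not 0 < dur < MINUTES_PER_DAY:
            raise ValueError("durations must be between 1 and 1439 minutes")
    a0 = _minutes(start_a)
    b0 = _minutes(start_b)
    # Compare A against B shifted back a day, on the same day and forward a
    # day, so a window that wraps past midnight is still caught.
    for shift in (-MINUTES_PER_DAY, 0, MINUTES_PER_DAY):
        b_start = b0 + shift
        if a0 < b_start + dur_b_min and b_start < a0 + dur_a_min:
            return True
    return False


cloud_job = (time(0, 30), 255)  # 00:30 GMT start, about 4h15m (illustrative)
print(windows_overlap(*cloud_job, time(23, 0), 120))  # True: 23:00-01:00 runs into 00:30
print(windows_overlap(*cloud_job, time(6, 0), 120))   # False: 06:00 starts after 04:45
```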
Over time I'd expect PHD to offer more cloud storage options and perhaps the ability to use more than one at a time. Even so, I imagine this release will appeal to quite a number of businesses. I'll be very interested to see what enhancements arrive in subsequent releases.
More generally (I haven't used PHD's product for a while), it's come along nicely. The CloudHook integration is pretty clean. My only grumble is that I don't want to install client software on every machine I want to administer the backup appliance from. I'd like to see a full web interface at some point, although that would probably increase the appliance's resource requirements.
Kudos to PHD though, I've enjoyed kicking the tires! If you want to learn more about CloudHook then the best way is to download a trial version. You can also find out about it from one of their webinars.
Disclaimer: This review was sponsored by PHD Virtual. Words and opinions are my own.