We needed a way to provide VM level image backups purely for disaster recovery. Our KVM hosts have local disk, so offloading a qcow2 file seemed the best way. I found a script online [1] and started digging.
A few roadblocks at first. CentOS 6 ships with qemu-0.12, where ‘qemu-img create’ doesn’t accept a snapshot name and ‘qemu-img convert’ can’t pull a snapshot back out. A little bit of compilation later, I had a shiny 2.1.2 qemu-img, which was the only thing the script needed changed in order to run.
Looked great at first, ramped up with 3-4 VM guests. Nightly runs, blah blah.
Then a few days later the badness happened.
Mar 5 12:15:49 host.sdsc.edu kernel: EXT4-fs error (device vda2): ext4_lookup: deleted inode referenced: 1178554
Mar 5 12:15:49 host.sdsc.edu kernel: EXT4-fs error (device vda2): ext4_lookup: deleted inode referenced: 1177551
Mar 5 12:15:49 host.sdsc.edu sshd[31024]: pam_env(sshd:setcred): Unable to open config file: /etc/security/pam_env.conf: Input/output error
Mar 5 12:17:32 host.sdsc.edu kernel: EXT4-fs error (device vda2): ext4_lookup: deleted inode referenced: 1177463
Only one of the guests was having a problem, but on top of that the ‘qemu-img convert’ snapshot images were bad too. No amount of fsck’ing could fix those suckers.
The other guinea pig KVM guests were relatively idle, but this system was a postgres server. Firing up bonnie++ against the local disk on a “working” guest broke it in the same way.
I poked around for a solution that didn’t require installing non-standard packages on the host and found:
virsh snapshot-create-as --quiesce
Whipped up a script revolving around that and…
It needs a few extra tweaks. The QEMU Guest Agent (qemu-ga) must be installed and running on the guest, and the guest XML definition needs a new channel device for it:
<channel type='pty'>
  <target type='virtio' name='org.qemu.guest_agent.0'/>
  <address type='virtio-serial' controller='0' bus='0' port='1'/>
</channel>
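Before trusting --quiesce, it can help to check from the host that the channel and the agent are actually talking. A minimal sketch, assuming your virsh has ‘qemu-agent-command’; the function names and the DRY_RUN switch are made up for illustration and are not part of the script mentioned in this post:

```shell
#!/bin/sh
# Hypothetical helper: with DRY_RUN=1, print the command instead of running it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Ask qemu-ga inside the guest to answer a guest-ping.
# Returns non-zero if virsh cannot reach the agent over the virtio channel.
agent_ping() {
    run virsh qemu-agent-command "$1" '{"execute":"guest-ping"}'
}
```

Something like ‘agent_ping myguest && echo agent is up’ (with myguest standing in for a real domain name) is enough to tell whether a quiesced snapshot has a chance of working.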
So far I’ve given this a proper run with multiple bonnie++ instances going and have not had any problems with the running base, a reboot on a rebased image, or a copied-off “old” base. It even appears that leaving off --quiesce doesn’t cause a break either. This is handy when you haven’t installed ‘qemu-ga’ and/or can’t restart the guest with a modified XML definition.
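The snapshot step, with and without the agent, can be sketched roughly like this. It is not the actual script; the --disk-only and --atomic flags are my assumption about a sensible invocation (the post only names --quiesce), and the function names and DRY_RUN switch are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: with DRY_RUN=1, print the command instead of running it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Take an external, disk-only snapshot so the old base image stops
# receiving writes and can be copied off the host.
#   $1 = domain name, $2 = snapshot name
#   $3 = "yes" (default) to freeze guest filesystems through qemu-ga,
#        anything else to skip --quiesce
snapshot_guest() {
    dom=$1
    snap=$2
    if [ "${3:-yes}" = "yes" ]; then
        run virsh snapshot-create-as "$dom" "$snap" --disk-only --atomic --quiesce
    else
        run virsh snapshot-create-as "$dom" "$snap" --disk-only --atomic
    fi
}
```

Running it with DRY_RUN=1 first shows the exact virsh command without touching the guest.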
I find this a much better solution so far, but it’s not without limitations.
You must have sufficient disk space to hold two working copies of a guest (a limitation of libvirt-0.10): the new rebased running image and the old base. Depending on how you copy out the old one, that may be the only space you need. With later versions of libvirt you should be able to use ‘virsh blockcommit’ to shove the snapshot delta back into the old base. We’re stuck with ‘virsh blockcopy’, which makes a new base.
When we get our CentOS 7 KVM guest up and going, I will see how things are different with a ‘modern’ libvirt/qemu pair.
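The two ways of getting back to a single image compare roughly as follows. This is a sketch under assumptions, not the script itself: the flag choices are mine, the function names and DRY_RUN switch are hypothetical, ‘blockcommit --active’ needs a much newer libvirt than 0.10, and some libvirt versions may place extra restrictions on blockcopy that this does not handle.

```shell
#!/bin/sh
# Hypothetical helper: with DRY_RUN=1, print the command instead of running it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Collapse the snapshot chain of one disk back into a single image.
#   $1 = method: blockcopy (builds a new base) or blockcommit (reuses old base)
#   $2 = domain name, $3 = disk target such as vda
#   $4 = path for the new base image (blockcopy only)
merge_back() {
    method=$1
    dom=$2
    disk=$3
    newbase=$4
    case "$method" in
        blockcopy)
            [ -n "$newbase" ] || { echo "blockcopy needs a destination path" >&2; return 1; }
            # Copies the whole chain into a fresh image, then switches the guest to it.
            run virsh blockcopy "$dom" "$disk" "$newbase" --wait --verbose --pivot
            ;;
        blockcommit)
            # Pushes the active overlay down into its backing file, then switches back.
            run virsh blockcommit "$dom" "$disk" --active --pivot --wait --verbose
            ;;
        *)
            echo "unknown method: $method" >&2
            return 1
            ;;
    esac
}
```

The blockcopy branch is the one that needs the extra disk space, since the new base exists alongside the old one until you delete it.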
References
[1] : http://www.sleepdontexist.com/2014/03/28/kvm_manage-sh-a-script-to-manage-your-kvm-machines/