For quite some time I had live-migration on my mind but never really found the time to dig into it deeply enough to get it working. A couple of weeks ago the topic came up again: we were planning to update and upgrade almost our whole environment, including Ceph and OpenStack (a production environment running ‘Mitaka’).
I’ll spare you the details of the Ceph upgrade, that part was pretty straightforward; OpenStack, not so much. At least not in our cloud, because we had a mixed setup of control and compute nodes on different releases of ‘Mitaka’, and of course active instances in production use.
The most time-consuming part was migrating all instances to an already upgraded compute node so the next one could be upgraded, all without live-migration. So we decided to at least try to get live-migration working, and we did! There are countless websites and blog posts describing how to achieve that, and one of them actually helped us (here); in the end it came down to analyzing error messages and figuring out how to get past them.
This is what we did on the compute nodes:
We had to change the uri_default value in /etc/libvirt/libvirt.conf, because if you only configure live_migration_uri in nova.conf, you could end up with a working live-migration but without being able to run a simple virsh list command; hence the “localhost” string.
```
compute1:~ # egrep -ve "^#|^$" /etc/libvirt/libvirt.conf | grep uri_default
uri_default = "qemu+ssh://localhost/system"
```
Configure authentication: set auth_tcp to “none”, otherwise a passwordless login to the remote host won’t be possible. These are the only active config options we are using:
```
compute1:~ # egrep -ve "^#|^$" /etc/libvirt/libvirtd.conf
listen_tcp = 1
tcp_port = "16509"
auth_tcp = "none"
```
The live_migration_uri overrides the default libvirt URI; nova replaces the “%s” with the destination host. Setting disable_libvirt_livesnapshot to false enables live snapshots in nova, so instances are not paused during a snapshot. It’s not required for live-migration, just a side note.
```
compute1:~ # egrep -ve "^#|^$" /etc/nova/nova.conf | grep live
live_migration_uri = "qemu+ssh://%s/system"
disable_libvirt_livesnapshot = false
```
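As a quick illustration of that substitution (a sketch, not nova’s actual code): nova fills in the destination hostname for the “%s” before handing the URI to libvirt, which you can mimic in a shell. “compute2” here is a placeholder host name:

```shell
# Mimic nova's substitution of "%s" in live_migration_uri.
# "compute2" is a placeholder destination host, not a real node name.
uri_template='qemu+ssh://%s/system'
printf "${uri_template}\n" compute2   # prints qemu+ssh://compute2/system
```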
- Shared storage
For a successful live-migration it is mandatory that all hosts are allowed to access the same storage. Since we already use Ceph as storage backend, this was quite easy to accomplish with a CephFS directory mounted on the compute nodes. The next steps describe how to mount the CephFS root directory and create the desired directory layout to be mounted by the compute nodes.
```
# mount CephFS root directory (on any ceph client)
compute1:~ # mount -t ceph ceph-host1:/ /mnt/ -o name=admin,secretfile=admin.key,_netdev,noatime

# create directory structure for instances
compute1:~ # mkdir -p /mnt/cephfs/openstack/nova-instances

# unmount CephFS
compute1:~ # umount /mnt

# mount CephFS on the compute node
compute1:~ # grep ceph /etc/fstab
ceph-host1,ceph-host2,ceph-host3:/cephfs/openstack/nova-instances /var/lib/nova/instances ceph name=admin,secretfile=admin.key,_netdev,noatime 0 0
# note: ceph assumes the default monitor port 6789,
# which is why no port is specified in fstab

# make sure nova is allowed to write
compute1:~ # chown -R nova:nova /var/lib/nova/instances/
```
If you are converting an existing compute node from its local file system to shared storage, you should put that node into maintenance mode, evacuate all instances, stop the nova-compute service and remove everything under /var/lib/nova/instances before mounting the shared directory:
```
# maintenance mode
control1:~ # nova service-disable --reason maintenance compute1 nova-compute

# evacuate instances (or migrate them manually if evacuate doesn't work)
control1:~ # nova host-evacuate compute1

# stop nova-compute
compute1:~ # systemctl stop openstack-nova-compute.service

# clean out the instances directory
compute1:~ # rm -rf /var/lib/nova/instances/*

# mount the shared directory
compute1:~ # mount /var/lib/nova/instances

# restart nova-compute
compute1:~ # systemctl start openstack-nova-compute.service
```
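Any files left in /var/lib/nova/instances would only be shadowed by the mount, not deleted, so a small guard like this (my own sketch, not part of the original procedure) can verify the directory really is empty before mounting:

```shell
# Refuse to mount over a non-empty instances directory; leftover files
# would be hidden underneath the mount. INSTANCES_DIR is overridable
# so the check can be tried on any path.
dir="${INSTANCES_DIR:-/var/lib/nova/instances}"
if [ -n "$(ls -A "$dir" 2>/dev/null)" ]; then
    echo "$dir is not empty, refusing to mount" >&2
    exit 1
fi
echo "$dir is empty, safe to mount"
```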
To create the ceph key file just copy the string from admin.keyring:
```
compute1:~ # cat /etc/ceph/ceph.client.admin.keyring
[client.admin]
        key = AQCj2YpRiAe6CxAA7/ETt7Hcl9IyxyYciVs47w==
compute1:~ # cat admin.key
AQCj2YpRiAe6CxAA7/ETt7Hcl9IyxyYciVs47w==
```
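Instead of copying the string by hand, the key can also be extracted with sed. This is a self-contained sketch that works on a copy of the example keyring from above; the /tmp paths are only for the demonstration, on a real node you would read /etc/ceph/ceph.client.admin.keyring:

```shell
# Write the example keyring to a temp file so the snippet is self-contained.
cat > /tmp/example.keyring <<'EOF'
[client.admin]
        key = AQCj2YpRiAe6CxAA7/ETt7Hcl9IyxyYciVs47w==
EOF
# print only the value after "key = ", i.e. the bare secret
sed -n 's/^[[:space:]]*key = //p' /tmp/example.keyring > /tmp/admin.key
cat /tmp/admin.key   # prints AQCj2YpRiAe6CxAA7/ETt7Hcl9IyxyYciVs47w==
```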
More information on mounting CephFS can be found in the docs.
To summarize the most important things:
- Configure shared storage on all compute nodes, e.g. CephFS, NFS or other backends.
- Configure libvirt so it can connect to the other compute nodes without a password while still being able to run local commands like virsh list.
- Configure nova.conf with a valid live_migration_uri.
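With all of the above in place, a migration is started from a control node with nova live-migration <instance> <target-host>. As a sketch, here is a tiny dry-run wrapper that only prints the command it would run; the instance UUID and host name are placeholders, not values from our environment:

```shell
# Dry-run helper: print the nova command instead of executing it.
# Instance UUID and destination host below are placeholders.
live_migrate() {
    instance="$1"
    target="$2"
    echo "nova live-migration $instance $target"
}
live_migrate 0b7a9c1e-0000-0000-0000-000000000000 compute2
# prints: nova live-migration 0b7a9c1e-0000-0000-0000-000000000000 compute2
```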
This was basically it. If I missed a step or something is unclear, please comment and I’ll try to fix it.