Obstacles for OpenStack: cinder-volume tears down control node

I’d like to share another finding from my work with OpenStack.
I was asked for assistance in a small private cloud based on Ocata with a single control node and a handful of compute nodes, and Ceph as storage backend.

During tests with a Heat template (containing 6 instances, 7 volumes and a small network infrastructure) the control node became unresponsive due to the load cinder-volume caused. The reason was that some of the volumes had to be created from (large) images, in which case Cinder has to convert the Glance images and upload them back to Ceph as volumes.

The conversion happens on the local disk of the control node, which was known and therefore the directory /var/lib/cinder was a separate logical volume with enough disk space. This was a suitable setup for the creation of single volumes, but this was the first real performance test for this environment, and it failed! While the stack was created – which took ages! – the control node was almost inoperable.

So we decided to put the conversion directory on a SSD. Not only lead this to a much faster stack creation but it also kept the control node “alive” and responsive.

This should be considered while planning a cloud infrastructure, although it can be fixed quickly if you have an empty slot available in your server. Until there’s a way for qemu-convert to avoid this workaround it’s quite a good idea to source out the conversion directory onto a faster device.

This entry was posted in OpenStack and tagged , , , . Bookmark the permalink.

Leave a Reply