I have another experience to share with Stackers that use Ceph as their storage backend. In this environment Ceph is used for Glance, Nova and Cinder. I won’t get into configuration details, I assume you’re familiar with this kind of setup. So let’s dive into the issue.
This is another article about OpenStack with Ceph as the storage backend. Like my other posts about this topic this is not about how to install and configure your private cloud but it’s more a collection of obstacles you could be facing. For me it’s also an online documentation in case I forgot what I did weeks, months or even years ago.
Now let’s get to it. We’ve been working with Ceph for quite a while now, it’s really comfortable launching instances within seconds. But from time to time we noticed that some instances took several minutes to boot, and there was nothing obvious happening on the compute nodes or in the Ceph cluster. So we didn’t really bother to debug it further, it’s not too bad if you have to wait one minute or so for a 6 GB instance to start.
Working with Ceph and OpenStack can make your life as a cloud administrator really easy, but sometimes you discover its downsides. From time to time I share some findings in this blog, it’s a nice documentation for me and hopefully it helps you preventing the same mistakes I did.
I discovered an orphaned instance in a user’s project, fortunately it was not an important one. The instance’s disk was not a volume but a clone from the Glance image (<INSTANCE_ID>_disk), so it depended on that base image. Only there was no base image in the backend anymore, somehow it must have been deleted even though there were existing clones. I assume it had to do with a cache tier incident a couple of months earlier, something must have destroyed the relationship between the image and its clones.