Migrating BlueStore’s block.db

Ceph’s BlueStore storage engine is still rather new, so the big wave of migrations caused by failing block devices is yet to come. On the other hand, suboptimal device choices, made for lack of experience or inherited from “heritage environments”, may have left you with a setup you’d like to change.

One such issue is the location of the OSDs’ RocksDB devices. As a recap: BlueStore allows you to put the write-ahead log (WAL), its metadata store (RocksDB) and the actual content on separate block devices. When using spinning disks for content, the most common case is probably to split off RocksDB onto an SSD. If you have the money, you may have put the WAL onto NVMe storage; if not, it automatically ends up on the SSD (if you have one) or, failing that, on the main block device.
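For illustration, this is roughly what creating such an OSD with ceph-volume looks like; the device names are examples only, not taken from any particular setup:

```
# HDD for the data, SSD partition for RocksDB (block.db).
# Without --block.wal, the WAL is co-located with block.db;
# without --block.db, everything lives on the data device.
ceph-volume lvm create --bluestore \
    --data /dev/sdb \
    --block.db /dev/sdc1
```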

So when setting up that “HDD, plus RocksDB on SSD” OSD, you had to decide how to set up the RocksDB block device. Since roughly 10 GB of RocksDB per terabyte of main storage is recommended, dedicating a full SSD to a single OSD is a waste of resources. You end up with basically two options: partition the SSD, or turn it into a PV and create an LVM volume group from it. But whatever you decide: once set up, there is no documented way to move the RocksDB to a different block device; you’d need to recreate the OSD.
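The LVM variant might look like the following sketch; the volume group name, sizes and devices are assumptions for the example:

```
# Turn the shared SSD into a PV and create a volume group on it
pvcreate /dev/sdg
vgcreate ceph-db /dev/sdg

# About 10 GB of RocksDB per TB of data: 40 GB for a 4 TB HDD
lvcreate -L 40G -n db-osd0 ceph-db

# Create the OSD with its block.db on the new logical volume
ceph-volume lvm create --bluestore --data /dev/sdb --block.db ceph-db/db-osd0
```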


Resetting an existing BlueStore OSD

During an attempt to migrate some OSDs’ BlueStore RocksDB to a different block device, we noticed (previously undetected) fatal read errors on the existing RocksDB. The only way to recover from this situation is to remove the OSD and rebuild its content from the other copies.
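As a hedged sketch of how such errors may be detected and handled (the OSD id 7 is an example), an offline OSD can be checked with ceph-bluestore-tool:

```
# Consistency check of a stopped OSD, including its RocksDB
systemctl stop ceph-osd@7
ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-7

# If the DB is unreadable, mark the OSD out so its placement
# groups are backfilled from the remaining replicas
ceph osd out 7
```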

There are standard procedures for deleting and creating OSDs, both BlueStore and FileStore. But during our transition from FileStore to BlueStore we came across the problem that we could not specify the new OSD’s id, and we hit other minor difficulties. This time we also wanted to cause the least data movement possible, all while replacing the RocksDB block device.

To make a long story short: we were looking for an “mkfs”-style approach.
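One possible shape of such an approach (a sketch under assumptions, not necessarily the article’s exact procedure; the OSD id and devices are examples) is to destroy the OSD instead of purging it, which keeps its id and CRUSH position, and then recreate it under the same id:

```
# "destroy" wipes the OSD's data and cephx key, but keeps its
# entry in the OSD and CRUSH maps, so data placement is unchanged
ceph osd destroy 7 --yes-i-really-mean-it

# Recreate the OSD under the same id, now with a new RocksDB device
ceph-volume lvm create --bluestore --osd-id 7 \
    --data /dev/sdb --block.db ceph-db/db-osd7
```

Since the id and CRUSH weight stay the same, only the placement groups that lived on this OSD have to be backfilled; the rest of the cluster is untouched.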


Highly available database: MariaDB with Galera in a two-node cluster

This article originates from our work with OpenStack, but of course Galera can also be used as a standalone solution for highly available databases. Reading the OpenStack HA guide, the setup doesn’t seem that difficult, but there are some things to be aware of to get it up and running.
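As a rough sketch of the moving parts (paths, addresses and the cluster name are placeholders), the Galera-specific piece of the MariaDB configuration looks something like this on both nodes:

```
# Galera settings, identical on both nodes
cat > /etc/mysql/conf.d/galera.cnf <<'EOF'
[mysqld]
wsrep_on                 = ON
wsrep_provider           = /usr/lib/galera/libgalera_smm.so
wsrep_cluster_name       = example_cluster
wsrep_cluster_address    = gcomm://192.168.0.11,192.168.0.12
wsrep_sst_method         = rsync
binlog_format            = ROW
default_storage_engine   = InnoDB
innodb_autoinc_lock_mode = 2
EOF

# The first node bootstraps the cluster, the second one joins it
galera_new_cluster        # node 1 only (MariaDB 10.1 and later)
systemctl start mariadb   # node 2
```

Keep in mind that two nodes cannot form a quorum on their own once one of them fails uncleanly; a common remedy is to run the Galera arbitrator (garbd) on a third machine.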
