Note: There have been many improvements to ceph-volume and ceph-bluestore-tool that simplify DB/WAL migration, and a newer article describes the process for a cephadm-managed cluster. This post can therefore be considered outdated, but it may still be useful for understanding what happens under the hood.
Ceph’s BlueStore storage engine is rather new, so the big wave of migrations caused by failing block devices is still ahead of us. On the other hand, non-optimal device selection – whether due to missing experience or a “heritage environment” – may have left you with a setup you’d rather change.
Such an issue can be the location of the OSDs’ RocksDB devices. As a recap: BlueStore allows you to separate the storage for the write-ahead log (WAL), for its metadata store (RocksDB), and for the actual content. When using spinning disks for content, the most common setup is probably to split off RocksDB onto an SSD. If you have the money, you may have put the WAL onto NVMe storage; if not, it will automatically end up on the SSD (if you have one), or on the main block device if that is all you have.
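To make that layout concrete, here is a minimal sketch of how such an OSD is typically created with ceph-volume. The device names are placeholders for illustration, not taken from this article – adjust them to your hardware.

```bash
# A typical "HDD for data, SSD for RocksDB, NVMe for WAL" OSD
# (all device names are placeholders):
ceph-volume lvm create --bluestore \
    --data /dev/sdb \
    --block.db /dev/sdc1 \
    --block.wal /dev/nvme0n1p1

# Leaving out --block.wal puts the WAL on the block.db device;
# leaving out both puts everything on the data device.
```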
So when setting up that “HDD, plus RocksDB on SSD” OSD, you had to decide how to set up the RocksDB block device. Since about 10 GB of RocksDB per terabyte of main storage is recommended, dedicating a full SSD is a waste of resources. That leaves you with essentially two options: partition the SSD, or turn it into a PV and create an LVM volume group on it. But whatever you decide: once set up, there is no documented way to move the RocksDB to a different block device – you’d need to recreate the OSD.
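For concreteness, the two setup options could look roughly like the sketch below. All device names, the volume group and logical volume names, and the 100G size are invented for illustration; size the DB device to match your data disk (e.g. the 10 GB per terabyte mentioned above).

```bash
# Option 1: partition the SSD and hand a partition to ceph-volume.
sgdisk --new=1:0:+100G /dev/sdc
ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/sdc1

# Option 2: turn the SSD into a PV and carve one logical volume
# per OSD out of a volume group.
pvcreate /dev/sdc
vgcreate ceph-db /dev/sdc
lvcreate -L 100G -n db-osd0 ceph-db
ceph-volume lvm create --bluestore --data /dev/sdb --block.db ceph-db/db-osd0
```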