Cephadm: change public network

One question that comes up regularly on the ceph-users mailing list is how to change the Ceph (public) network in a cluster deployed by cephadm. I wrote an article a few years ago, when cephadm was first introduced, that focused on changing the monitors’ IP addresses, but I didn’t address the entire network. So now I will.

To keep the article brief I will not paste all of the terminal output here; this is supposed to be a guide, not step-by-step instructions. And for the sake of simplicity, I’ll cover only the Ceph “public_network”, not the “cluster_network”. There are several possible scenarios involving the “public_network”, but I will cover just one: moving the entire cluster to a different data center, which involves completely shutting down the cluster. Parts of this procedure can also be used in disaster recovery situations, for example when two out of three monitors are broken and the surviving one needs to be started with a modified monmap to be able to form a quorum.

Disclaimer: The following steps have been executed in a lab environment. They worked for me, but they might not work for you. If anything goes wrong while you try to reproduce the following procedure, it’s not my fault, but yours. And it’s yours to fix!

The Ceph version used in these tests was Reef 18.2.1.

Set the noout flag and create backups of all relevant information such as keyrings, config files and a current monmap. Then stop the cluster and prevent the daemons from starting again by disabling the ceph.target.
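
A minimal sketch of these preparation steps, run from a node with an admin keyring (the backup paths are just examples; the systemctl commands have to be run on every host):

# Prevent unnecessary rebalancing while the cluster is down
ceph osd set noout

# Back up keyrings, config files and the current monmap
mkdir -p /root/backup
ceph auth export > /root/backup/keyrings.txt
cp -r /etc/ceph /root/backup/etc-ceph
ceph mon getmap -o /root/backup/monmap.bin

# Stop all Ceph daemons and keep them from starting at boot
systemctl stop ceph.target
systemctl disable ceph.target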

Perform the maintenance procedure (e.g. move the servers to a different location) and power them on again. Then change the network setup (IP addresses, NTP, etc.) according to your requirements.
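
The network reconfiguration itself is OS-specific and not covered here. As a quick sanity check afterwards, something like the following (assuming chrony is used as the NTP client) confirms the new addresses and the time synchronization on each host:

# Verify the new IP addresses and time sync
ip -4 addr show
chronyc tracking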

Now it’s getting serious… The next steps contain a bit more detail to provide some context and working examples. In this procedure, the “old network” is 10.10.10.0/24 and the “new network” is 192.168.160.0/24.

# Enter shell of first MON
reef1:~ # cephadm shell --name mon.reef1

# Extract current monmap
[ceph: root@reef1 /]# ceph-mon -i reef1 --extract-monmap monmap

# Print content
[ceph: root@reef1 /]# monmaptool --print monmap
monmaptool: monmap file monmap
epoch 5
fsid 2851404a-d09a-11ee-9aaa-fa163e2de51a
last_changed 2024-02-21T09:32:18.292040+0000
created 2024-02-21T09:18:27.136371+0000
min_mon_release 18 (reef)
election_strategy: 1
0: [v2:10.10.10.11:3300/0,v1:10.10.10.11:6789/0] mon.reef1
1: [v2:10.10.10.12:3300/0,v1:10.10.10.12:6789/0] mon.reef2
2: [v2:10.10.10.13:3300/0,v1:10.10.10.13:6789/0] mon.reef3

# Remove MONs with old address
[ceph: root@reef1 /]# monmaptool --rm reef1 --rm reef2 --rm reef3 monmap

# Add MONs with new address
[ceph: root@reef1 /]# monmaptool --addv reef1 [v2:192.168.160.11:3300/0,v1:192.168.160.11:6789/0] --addv reef2 [v2:192.168.160.12:3300/0,v1:192.168.160.12:6789/0] --addv reef3 [v2:192.168.160.13:3300/0,v1:192.168.160.13:6789/0] monmap

# Verify changes
[ceph: root@reef1 /]# monmaptool --print monmap
monmaptool: monmap file monmap
epoch 5
fsid 2851404a-d09a-11ee-9aaa-fa163e2de51a
last_changed 2024-02-21T09:32:18.292040+0000
created 2024-02-21T09:18:27.136371+0000
min_mon_release 18 (reef)
election_strategy: 1
0: [v2:192.168.160.11:3300/0,v1:192.168.160.11:6789/0] mon.reef1
1: [v2:192.168.160.12:3300/0,v1:192.168.160.12:6789/0] mon.reef2
2: [v2:192.168.160.13:3300/0,v1:192.168.160.13:6789/0] mon.reef3

# Inject new monmap
[ceph: root@reef1 /]# ceph-mon -i reef1 --inject-monmap monmap

Repeat this procedure for the remaining monitors. Keep in mind that their ceph.conf (/var/lib/ceph/{FSID}/mon.{MON}/config) still refers to the old network. Update those files accordingly, then start the monitors. If everything went well they should connect to each other and form a quorum.
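
For mon.reef1 this could look like the following (using the fsid from the monmap output above; adjust hostnames and networks to your environment):

# Point the monitor's config file at the new network
sed -i 's/10\.10\.10\./192.168.160./g' /var/lib/ceph/2851404a-d09a-11ee-9aaa-fa163e2de51a/mon.reef1/config

# Start the monitor
systemctl start ceph-2851404a-d09a-11ee-9aaa-fa163e2de51a@mon.reef1.service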

Now the ceph public_network needs to be updated:

ceph config set mon public_network 192.168.160.0/24

Update the config files of the MGRs in the same way (/var/lib/ceph/{FSID}/mgr.{mgr}/config) and start them. Now the orchestrator should be available again to deal with the OSDs (and other daemons), but it will still try to connect to the old network since the host list still contains the old addresses. Update the host addresses with:

ceph orch host set-addr reef1 192.168.160.11
ceph orch host set-addr reef2 192.168.160.12
ceph orch host set-addr reef3 192.168.160.13
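
Once the orchestrator has caught up, a quick way to confirm the new addresses is to list the hosts:

# The ADDR column should now show the 192.168.160.0/24 addresses
ceph orch host ls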

It can take a few minutes for the orchestrator to connect to each host. Once it has, reconfigure the OSDs so their config files are updated automatically:

ceph orch reconfig osd

To verify, check the config files of one or more OSDs (/var/lib/ceph/{FSID}/osd.{OSD_ID}/config). If for some reason they are not updated automatically, you can update them manually.
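
A quick grep (again using the fsid from above) reveals whether any OSD config still points to the old network:

# Any output here means the OSD config has not been updated yet
grep "10\.10\.10\." /var/lib/ceph/2851404a-d09a-11ee-9aaa-fa163e2de51a/osd.*/config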

Now the OSDs should be able to start successfully and eventually recover. Monitor the ceph status carefully; if you didn’t catch all OSDs and some of them still have the old address configured, it will show up in the osd dump output:

ceph osd dump | grep "10\.10\.10"

If that is the case, modify their config files if necessary and restart the affected OSDs. Unset the noout flag and test whether the cluster works as expected. And don’t forget to enable the ceph.target again so the daemons start automatically after the next reboot.
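
A minimal sketch of these final steps:

# Allow recovery/rebalancing again
ceph osd unset noout

# Re-enable automatic daemon startup (on every host)
systemctl enable ceph.target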

I repeated this procedure a couple of times back and forth, and it worked for me every time. Of course, there’s no guarantee that it will work for you; you might encounter issues that don’t come up in a virtual environment. So plan carefully and, if possible, test the procedure in a test environment first.

If there’s anything important missing here or if I made a mistake, feel free to comment!
