The previous post in this little series was about preparing the environment: installing the latest operating system and OpenStack version while preserving the database, in order to migrate an existing Cloud environment to a newer platform. I mainly focused on the database as the most critical component.
This post continues the environment preparation and covers Galera, RabbitMQ and Memcached. The next article will describe the high availability setup of the OpenStack services, and I'll conclude with a final article covering the steps required for the actual migration.
Highly available database (Galera)
To provide a fail-safe database setup we decided to go with Galera: we already had a virtual SUSE OpenStack Cloud that served as a template for some of the necessary configuration, and it uses Galera as well. To keep this series readable I'll spare most details and try to focus on the main aspects.
Galera would also work with only two nodes, but to avoid split-brain scenarios it's recommended to have a third tiebreaker node that only runs the garbd service. We chose a different hardware machine in our environment for this task. During my work with OpenStack I already wrote an article about Galera and tiebreakers, so I'll skip most of those details here.
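For reference, here is a minimal sketch of what the arbitrator configuration on the tiebreaker node could look like; the hostname, the file path and the unit name depend on the distribution and the Galera package, so treat this as an assumption rather than a verbatim copy of our setup:

# Hypothetical tiebreaker host, sysconfig file as shipped with the Galera arbitrator package
tiebreaker:~ # cat /etc/sysconfig/garb
# At least one reachable cluster node and the cluster name from wsrep_cluster_name
GALERA_NODES="10.0.0.1:4567 10.0.0.2:4567"
GALERA_GROUP="NewCloudGalera"

# Enable and start the arbitrator (unit name may differ per package)
tiebreaker:~ # systemctl enable --now garb.service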
The rest is pretty straightforward; I'll describe the executed steps in one large code block. All these steps have to be performed on both controller nodes. Please note that in our setup all of this is done automatically via Salt, so we didn't actually need to run zypper in mariadb etc. manually; it's shown here just for visibility.
For the sake of simplicity the control nodes are called controller01 and controller02, with the IP addresses 10.0.0.1 and 10.0.0.2. The virtual control node is simply called controller with 10.0.0.100.
# Install packages
controller01:~ # zypper in mariadb-client mariadb mariadb-galera \
  python3-PyMySQL galera-3-wsrep-provider galera-python-clustercheck

# Galera configuration
# Some non-default values (especially max_connections)
controller01:~ # cat /etc/my.cnf.d/74-galera-tuning.cnf
[mysqld]
innodb_buffer_pool_size = 256M
innodb_log_file_size = 64M
innodb_buffer_pool_instances = 1
innodb_flush_method = O_DIRECT
innodb_flush_log_at_trx_commit = 1
max_connections = 2048
tmp_table_size = 64M
max_heap_table_size = 64M
skip_name_resolve = 1

# Regular galera.conf
controller01:~ # cat /etc/my.cnf.d/75-galera-custom.cnf
[mysqld]
wsrep_on = ON
wsrep_provider = /usr/lib64/galera-3/libgalera_smm.so
wsrep_cluster_name = "NewCloudGalera"
wsrep_cluster_address = "gcomm://controller01,controller02"
wsrep_provider_options = "gmcast.listen_addr = tcp://10.0.0.1:4567; \
  gcs.fc_limit = 5; gcs.fc_factor = 0.8;"
wsrep_slave_threads = 1
wsrep_max_ws_rows = 0
wsrep_max_ws_size = 2147483647
wsrep_debug = 0
binlog_format = ROW
default_storage_engine = InnoDB
innodb_autoinc_lock_mode = 2
innodb_doublewrite = 1
query_cache_size = 0
query_cache_type = 0
expire_logs_days = 10
user = mysql
datadir = /var/lib/mysql
tmpdir = /var/lib/mysqltmp
bind-address = 10.0.0.1
To allow pacemaker to monitor the highly available database, galera-python-clustercheck also needs to be configured. Make sure to grant database access for the monitoring user.
controller01:~ # cat /etc/galera-python-clustercheck/my.cnf
[client]
user=monitoring
password=****
host=10.0.0.1

# Additional options can be specified (defaults are commented)
controller01:~ # cat /etc/galera-python-clustercheck/galera-python-clustercheck.conf
[...]
# -p PORT, --port=PORT  Port to listen on [default: 8000]
# -6, --ipv6            Listen to ipv6 only (disabled ipv4) [default: False]
# -4 IPV4, --ipv4=IPV4  Listen to ipv4 on this address [default: 0.0.0.0]
GALERA_PYTHON_CLUSTERCHECK_OPTIONS="--conf=/etc/galera-python-clustercheck/my.cnf"

controller01:~ # cat /etc/haproxy/haproxy.cfg
# Galera haproxy configuration
# Port 8000 is for the clustercheck
listen galera
    bind 10.0.0.100:3306
    mode tcp
    stick-table type ip size 1000
    stick on dst
    option httpchk
    option clitcpka
    maxconn 2038
    default-server port 8000
    server controller01 10.0.0.1:3306 check inter 2000 fastinter 1000 rise 5 fall 2 backup on-marked-down shutdown-sessions
    server controller02 10.0.0.2:3306 check inter 2000 fastinter 1000 rise 5 fall 2 backup on-marked-down shutdown-sessions
These settings have to be made on all controller nodes.
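A quick way to check that the clustercheck endpoint behaves the way haproxy expects (the httpchk option above probes port 8000) is a plain HTTP request against it; the expected status codes are an assumption based on common clustercheck behaviour:

# Should return 200 while the local node is synced, 503 otherwise
controller01:~ # curl -s -o /dev/null -w "%{http_code}\n" http://10.0.0.1:8000/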
The respective pacemaker resource is configured as follows (FQDN is masked here):
# Galera (MariaDB cluster)
primitive galera galera \
    params check_user=monitoring check_passwd=**** datadir="/var/lib/mysql" \
    enable_creation=true log="/var/log/mysql/mysqld.log" socket="/var/run/mysql/mysql.sock" \
    wsrep_cluster_address="gcomm://controller01,controller02" \
    cluster_host_map="controller01:controller01.domain;controller02:controller02.domain" \
    op demote interval=0 timeout=600s \
    op monitor interval=23s \
    op monitor interval=20s role=Master \
    op promote interval=0 timeout=600s \
    op start interval=0 timeout=120s \
    op stop interval=0 timeout=120s

# Galera is a multi-state resource
ms ms-galera galera \
    meta clone-max=3 interleave=false master-max=3 notify=true ordered=false target-role=Started is-managed=true

# Clustercheck
primitive galera-python-clustercheck systemd:galera-python-clustercheck \
    op monitor interval=10s
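The full CIB is out of scope here, but conceptually the clustercheck service has to run wherever a Galera instance runs. Here is a hedged sketch of how that could be expressed in crm syntax, using the resource names from above; the clone wrapper and constraint names are assumptions, not our actual configuration:

# Run clustercheck on every node that runs a Galera instance (sketch)
clone cl-galera-python-clustercheck galera-python-clustercheck
colocation col-clustercheck-with-galera inf: cl-galera-python-clustercheck ms-galera
order ord-galera-before-clustercheck Mandatory: ms-galera cl-galera-python-clustercheck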
To successfully bootstrap and initialize Galera we came up with the following order of commands. Again, these steps were developed manually, but in the end they were all executed in a defined order by Salt. The reason for this somewhat "strange" procedure is that we needed to initialize MariaDB with our root password, create the databases, grant access and so on, and this would no longer have been possible once the pacemaker cluster had been built. At the same time the control nodes cannot be installed entirely simultaneously (one node initializes the cluster, the other(s) simply join), so a straight Galera bootstrap would fail. That's why we split these steps off and ran them before the actual Galera bootstrap.
# Workaround to initialize mysql without conflicting with galera
sed -i 's/wsrep_cluster_address = "gcomm:\/\/controller01,controller02"/wsrep_cluster_address = "gcomm:\/\/"/' \
    /etc/my.cnf.d/75-galera-custom.cnf

# Bootstrap galera
galera_new_cluster

# Set root password, drop test db, disable remote access
mysql -u root << EOF
UPDATE mysql.user SET Password=PASSWORD('$PASSWORD') WHERE User='root';
DELETE FROM mysql.user WHERE User='';
DELETE FROM mysql.user WHERE User='root' AND Host NOT IN ('localhost', '127.0.0.1', '::1');
DROP DATABASE IF EXISTS test;
DELETE FROM mysql.db WHERE Db='test' OR Db='test\\_%';
FLUSH PRIVILEGES;
EOF

# Create databases
mysql -u root -p$PASSWORD << EOF
CREATE DATABASE keystone;
CREATE DATABASE glance;
CREATE DATABASE placement;
CREATE DATABASE nova_api;
CREATE DATABASE nova;
CREATE DATABASE nova_cell0;
CREATE DATABASE cinder;
CREATE DATABASE neutron;
GRANT PROCESS, SELECT ON *.* TO 'monitoring'@'localhost' IDENTIFIED BY '****';
GRANT PROCESS, SELECT ON *.* TO 'monitoring'@'%' IDENTIFIED BY '****';
GRANT ALL PRIVILEGES ON keystone.* TO 'keystone'@'localhost' IDENTIFIED BY '****';
GRANT ALL PRIVILEGES ON keystone.* TO 'keystone'@'%' IDENTIFIED BY '****';
[...] # Repeat for all databases
FLUSH PRIVILEGES;
EOF

# Stop mysql and undo config changes
systemctl stop mariadb.service
sed -i 's/wsrep_cluster_address = "gcomm:\/\/"/wsrep_cluster_address = "gcomm:\/\/controller01,controller02"/' \
    /etc/my.cnf.d/75-galera-custom.cnf
After this step the pacemaker resource could be started (after the second node had joined the cluster) and Galera would bootstrap a new cluster.
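To confirm that the cluster actually formed, the usual Galera status variables can be queried on any node; with two database nodes plus the garbd arbitrator the cluster size should be reported as 3:

controller01:~ # mysql -u root -p -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';"
controller01:~ # mysql -u root -p -e "SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';"
# Expect wsrep_cluster_size = 3 and wsrep_local_state_comment = Synced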
RabbitMQ
The messaging service creates its own cluster, so luckily there was not much to configure. We just needed to make sure to provide a nodename and a working epmd configuration:
controller01:~ # cat /etc/rabbitmq/rabbitmq-env.conf
NODENAME=rabbit@controller01

controller01:~ # cat /etc/systemd/system/epmd.socket.d/ports.conf
[Socket]
ListenStream=10.0.0.1:4369
FreeBind=true
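Since the epmd listener is changed via a systemd socket drop-in, the units need to be reloaded before RabbitMQ is started; a simple way to verify that epmd is bound to the intended address (exact unit names may vary):

controller01:~ # systemctl daemon-reload
controller01:~ # systemctl restart epmd.socket epmd.service
controller01:~ # ss -tlnp | grep 4369
# epmd should now listen on 10.0.0.1:4369 instead of the wildcard address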
The pacemaker resource is also defined as a multi-state resource:
# RabbitMQ
primitive rabbitmq ocf:rabbitmq:rabbitmq-server-ha \
    params default_vhost=openstack erlang_cookie=XX... pid_file="/var/run/rabbitmq/pid" \
    policy_file="/etc/rabbitmq/ocf-promote" rmq_feature_health_check=true rmq_feature_local_list_queues=true \
    meta failure-timeout=30s migration-threshold=10 resource-stickiness=100 \
    op demote interval=0 timeout=120s \
    op monitor interval=30s \
    op monitor interval=27s role=Master \
    op notify interval=0 timeout=180s \
    op promote interval=0 timeout=120s \
    op start interval=0 timeout=360s \
    op stop interval=0 timeout=120s

primitive rabbitmq-port-blocker ocf:pacemaker:ClusterMon \
    params extra_options="-E /usr/bin/rabbitmq-alert-handler.sh --watch-fencing" \
    op monitor interval=10s \
    meta target-role=started

ms ms-rabbitmq rabbitmq \
    meta clone-max=3 interleave=false master-max=1 master-node-max=1 notify=true ordered=false target-role=started
Once the rabbit cluster started successfully, it could be configured for our OpenStack environment:
controller01:~ # rabbitmqctl set_cluster_name rabbit@TRAIN
controller01:~ # rabbitmqctl add_vhost openstack
controller01:~ # rabbitmqctl set_policy --vhost openstack --priority 0 --apply-to queues \
    ha-queues '^(?!amq.).*' '{"ha-mode": "exactly", "ha-params": 2}'
controller01:~ # rabbitmqctl add_user openstack PASSWORD && rabbitmqctl set_user_tags openstack management
controller01:~ # rabbitmqctl set_permissions --vhost openstack openstack ".*" ".*" ".*"

# Check status
controller01:~ # rabbitmqctl cluster_status
Cluster status of node rabbit@controller01 ...
[{nodes,[{disc,[rabbit@controller01,rabbit@controller02]}]},
 {running_nodes,[rabbit@controller02,rabbit@controller01]},
 {cluster_name,<<"rabbit@TRAIN">>},
 {partitions,[]},
 {alarms,[{rabbit@controller02,[]},{rabbit@controller01,[]}]}]
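The applied policy, users and permissions can be double-checked with the corresponding rabbitmqctl list commands:

controller01:~ # rabbitmqctl list_policies --vhost openstack
controller01:~ # rabbitmqctl list_users
controller01:~ # rabbitmqctl list_permissions --vhost openstack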
Now we had a rabbit cluster, and the OpenStack services were able to communicate with each other (we hoped), provided the transport_url was updated correctly in all the configuration files. Here's an excerpt from nova.conf:
controller01:~ # grep transport_url /etc/nova/nova.conf.d/100-custom.conf
transport_url = rabbit://openstack:****@controller01.domain:5672,openstack:****@controller02.domain:5672/openstack
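Whether the services actually pick up the new transport_url can be seen on the broker side; once they are restarted, their connections show up in rabbitmqctl (the columns chosen here are just an example):

controller01:~ # rabbitmqctl list_connections user peer_host state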
Memcached
The last part of this post is a rather short one. Memcached caches authentication tokens from the identity service (Keystone); without it, the other services won't be able to authenticate against Keystone.
We configured memcached to listen on both localhost and the primary IP address because during our tests we saw error messages pointing in that direction. We didn't put more effort into the investigation and simply continued with this configuration:
controller01:~ # grep -Ev "^$|^#" /etc/sysconfig/memcached
MEMCACHED_PARAMS="-U 0 -m 64 -l 127.0.0.1,10.0.0.1 -p 11211 -c 4096"
MEMCACHED_USER="memcached"
MEMCACHED_GROUP="memcached"
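Whether memcached really listens on both addresses can be verified with a simple stats request; nc is only used here for illustration:

controller01:~ # systemctl restart memcached.service
controller01:~ # printf 'stats\nquit\n' | nc 127.0.0.1 11211 | head -3
controller01:~ # printf 'stats\nquit\n' | nc 10.0.0.1 11211 | head -3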
Usually, the OpenStack services are configured with a list of memcached servers so we tried this as documented:
[keystone_authtoken]
memcached_servers = controller01:11211,controller02:11211
But in case controller01 goes down, the second control node does not have the cached token available, so the client will be unauthorized and a new token request has to be made. The result would be the same with the virtual IP. So we decided to only provide localhost as memcached server:
[keystone_authtoken]
memcached_servers = localhost:11211
This setting has worked quite well for us during several failover scenarios and it still does now that we are in production with the new Cloud environment.
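For completeness: besides keystone_authtoken, most services can also use memcached through oslo.cache, and following the same reasoning that section would point to the local instance as well. This is a hedged example of such a [cache] section, not a verbatim copy of our configuration:

[cache]
backend = oslo_cache.memcache_pool
enabled = true
memcache_servers = localhost:11211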
If you have any comment or questions about this setup please let me know!