This article originates from our work with OpenStack, but Galera can of course also be used as a standalone solution for highly available databases. Reading the OpenStack HA-Guide, the setup doesn't seem that difficult, but there are a few things to be aware of to get it up and running.
Galera without HAProxy
Before I explain the setup steps for the Galera configuration, I want to make OpenStack admins/users aware of a piece of probably misleading information in the HA-Guide. The Configuration tips section describes three different deployment strategies for Galera, one of which states:
Galera can be configured using one of the following strategies: Each instance has its own IP address: OpenStack services are configured with the list of these IP addresses so they can select one of the addresses from those available.
I know this is true for Memcached, but I have been searching for quite a while to find something that confirms this statement for database connections in the OpenStack config files. I filed a bug report for the docs to find out if the described scenario is even possible, but there has not been any reaction yet.
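For reference, the database connection in the OpenStack config files normally points to a single address (or a virtual IP in front of HAProxy), along these lines; the service and credentials below are just placeholders:

[database]
connection = mysql+pymysql://nova:NOVA_DBPASS@10.0.0.1/nova

I haven't found a documented syntax for listing several Galera node addresses here, which is exactly what the bug report is about.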
Anyway, let’s get back to the topic: Galera and MariaDB. Here are the steps I performed on an openSUSE Leap 42.3 system in a two-node cluster.
Tie-breaker
If you don’t have the resources for a third cluster node (or you simply decide to use only two), you will need an arbitrator as a tie-breaker in case one of the two database nodes fails; without quorum your database won’t be accessible. I decided to use an existing hardware server that has some unused resources and installed the required packages, basically galera-3 with its dependencies. This package contains what we need: garbd. It can be started either from the shell or as a systemd service; we use the latter. The configuration is really quite simple:
tiebreaker:~ # cat /etc/sysconfig/garb
# A space-separated list of node addresses (address[:port]) in the cluster:
GALERA_NODES="10.0.0.1:4567 10.0.0.2:4567"
# Galera cluster name, should be the same as on the rest of the node.
GALERA_GROUP="galera1"
# Log file for garbd. Optional, by default logs to syslog
LOG_FILE="/var/log/garbd.log"
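If you prefer starting garbd from the shell instead of via the systemd unit, an invocation roughly equivalent to the config above would look something like this (a sketch; check garbd --help for the exact options on your version):

tiebreaker:~ # garbd --address gcomm://10.0.0.1:4567,10.0.0.2:4567 --group galera1 --log /var/log/garbd.log --daemon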
Make sure this file exists and has appropriate ownership:
tiebreaker:~ # ll /var/log/garbd.log
-rw-r--r-- 1 garb root 18042 Mar 16 16:01 /var/log/garbd.log
That’s it, actually! We’ll start the service later when the cluster is alive; it doesn’t make too much sense now since there is no tie to break. ;-)
Galera configuration
Currently, the official openSUSE repositories don’t contain the required packages; I had to install them from the experimental stage of this repo, just in case you wonder where to get them.
I’ll just paste my Galera configuration without too much explanation, since it depends on your individual environment and your use case. From the SUSE repo you get a separate config file /etc/my.cnf.d/50-galera.cnf where you can configure the Galera-specific options:
controller01:~ # egrep -ve "^#|^$" /etc/my.cnf.d/50-galera.cnf
[mysqld]
binlog_format=ROW
default-storage-engine=innodb
innodb_autoinc_lock_mode=2
innodb_flush_log_at_trx_commit=0
innodb_buffer_pool_size=122M
bind-address=10.0.0.1
wsrep_on=ON
wsrep_provider=/usr/lib64/galera-3/libgalera_smm.so
wsrep_provider_options="pc.recovery=TRUE;gcache.size=300M"
wsrep_cluster_name="galera1"
wsrep_cluster_address="gcomm://"
wsrep_node_name="Galera01"
wsrep_node_address="10.0.0.1"
wsrep_sst_method=rsync
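The same file is used on the second node; only the node-specific values differ. For illustration (assuming controller02 has the address 10.0.0.2 and the node name "Galera02"):

bind-address=10.0.0.2
wsrep_node_name="Galera02"
wsrep_node_address="10.0.0.2"

Everything else, especially wsrep_cluster_name, stays identical on both nodes.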
At first, I tried this option
wsrep_cluster_address="gcomm://10.0.0.1,10.0.0.2,10.0.0.3"
as stated in the HA Guide, but that didn’t start my cluster; I’m not sure why, though. Searching for help I found a workaround where no cluster addresses are provided at the initial stage. But this was not the only thing preventing the cluster from starting: on my system the documented command didn’t work:
controller01:~ # systemctl start mariadb --wsrep-new-cluster
systemctl: Unbekannte Option »--wsrep-new-cluster«
Digging further brought this up: since MariaDB 10.1 you have to execute /usr/bin/galera_new_cluster instead of using the --wsrep-new-cluster option (which systemctl simply rejects as an unknown option, as seen above), and this command also starts mysql for you. After that you should see a lot of output in /var/log/mysql/mysqld.log while your Galera cluster is being initialized.
Now you can start mysql on the second node and watch the database sync if you already had data in it. I had copied a set of databases to controller01 before starting Galera to see if the replication works; turns out it does. :-)
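On the second node a plain service start is enough; the node then requests a state transfer (rsync in our case) from controller01, which you can follow in the log:

controller02:~ # systemctl start mariadb
controller02:~ # tail -f /var/log/mysql/mysqld.log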
Let’s check the current status; we should see a cluster_size of 2:
MariaDB [(none)]> SHOW STATUS LIKE 'wsrep_cluster_size';
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 2     |
+--------------------+-------+
1 row in set (0.00 sec)
Great!
The last part is the arbitrator, which we have already configured above. Now just fire it up and see if it joins the cluster:
tiebreaker:~ # systemctl start garb.service
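Whether garbd came up properly can be checked on the tie-breaker itself via the service status and the log file we configured above:

tiebreaker:~ # systemctl status garb.service
tiebreaker:~ # tail /var/log/garbd.log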
To see the current status of your cluster, log in to your database client and execute:
MariaDB [(none)]> SHOW STATUS LIKE 'wsrep_cluster_size';
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 3     |
+--------------------+-------+
1 row in set (0.00 sec)
So if we rely on this output (what other choice do we have? ;-) ), the arbitrator has also joined our cluster, and the replication works, too. Just as a quick check I compared the size of /var/lib/mysql:
controller01:~ # du -hs /var/lib/mysql
724M    /var/lib/mysql
controller02:~ # du -hs /var/lib/mysql
724M    /var/lib/mysql
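Of course, comparing directory sizes is only a rough indicator. A slightly better sanity check is to compare the last committed transaction sequence number on both nodes; with an idle cluster the values should match:

MariaDB [(none)]> SHOW STATUS LIKE 'wsrep_last_committed';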
If you are worried about running garbd on a third node that already has data stored in /var/lib/mysql (maybe because it already serves as a database for other purposes, as in our case), don’t be: it does not overwrite your existing mysql directory, it’s just
"a member of the cluster that participates in voting, but not in the actual replication."
If you are looking for monitoring options for Galera, take a look at the website.
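Besides dedicated monitoring tools, the most important health indicators can also be queried directly from the wsrep status variables:

MariaDB [(none)]> SHOW STATUS LIKE 'wsrep_%';

On a healthy node, wsrep_cluster_status should report Primary, wsrep_connected and wsrep_ready should be ON, and wsrep_local_state_comment should be Synced.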
I haven’t tested the different scenarios yet, but this will be necessary before using it in a production environment. So as soon as I have any test results I’ll write another article and report my findings.
Please comment if I have left anything unclear or made a mistake somewhere, I’ll try to clear that up.