We’re approaching the finale of this little blog series. In the previous post I described how to make Galera, RabbitMQ and Memcached highly available. This article covers the HA configuration of the OpenStack services; the series will then conclude with part V, covering the migration process itself and what to be aware of.
Although the OpenStack HA Guide is outdated and incomplete, it describes the key aspects of highly available services quite well, at least as a starting point. I won’t repeat in detail what is already written there; I’ll stick to our specific setup.
Basically, you have to decide between an active/passive (a/p) and an active/active (a/a) configuration. For a/p services this would mean that one of our two control nodes stands by idle and is only used in case of a failure, leaving many resources unused. We decided on an a/a setup, so both control nodes are actually in use.
Stateless services such as cinder-scheduler can be made highly available quite easily: just launch multiple instances and load-balance them.
Stateful services, on the other hand, have to be considered more closely because a single action typically involves more than one request. Examples of stateful services are the database and the message queue.
I already covered the database (Galera) and the messaging service (RabbitMQ) in the previous post; the following sections are only about the core OpenStack services. We decided to set up HAProxy as the load balancer for these services:
```
# Keystone
listen keystone-admin
    bind 0.0.0.0:35357
    mode http
    option tcpka
    option httplog
    option forwardfor
    server controller01 10.0.0.1:5501 check inter 2000 rise 5 fall 2
    server controller02 10.0.0.2:5501 check inter 2000 rise 5 fall 2

listen keystone-service
    bind 0.0.0.0:5000
    mode http
    [...]
    server controller01 10.0.0.1:5500 check inter 2000 rise 5 fall 2
    server controller02 10.0.0.2:5500 check inter 2000 rise 5 fall 2

# Glance
listen glance-api
    bind 0.0.0.0:9292
    mode http
    [...]

# Placement
listen placement
    bind 0.0.0.0:8778
    mode http
    [...]

# Nova
listen nova-api
    bind 0.0.0.0:8774
    mode http
    [...]
    server controller01 10.0.0.1:5503 check inter 2000 rise 5 fall 2
    server controller02 10.0.0.2:5503 check inter 2000 rise 5 fall 2

listen nova-metadata
    bind 10.0.0.100:8775
    mode http
    [...]
    server controller01 10.0.0.1:5504 check inter 2000 rise 5 fall 2
    server controller02 10.0.0.2:5504 check inter 2000 rise 5 fall 2

listen nova-novncproxy
    bind 0.0.0.0:6080
    mode tcp
    [...]

# Cinder
listen cinder-api
    bind 0.0.0.0:8776
    mode http
    [...]

# Horizon
listen horizon
    bind 0.0.0.0:5580
    mode http
    [...]

# Neutron
listen neutron-server
    bind 0.0.0.0:9696
    mode http
    [...]
```
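When rehearsing failovers it helps to watch HAProxy’s view of the backends. A built-in stats page can be enabled with a listener like the following sketch (the port, bind address and URI here are examples, not taken from our actual configuration):

```
# Optional: HAProxy stats page to watch backend health
# (port and bind address are examples)
listen stats
    bind 0.0.0.0:8404
    mode http
    stats enable
    stats uri /stats
    stats refresh 10s
```

The stats page then shows each `server` line from the `listen` sections above as UP or DOWN according to its health check.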
As you can see, we simply defined custom ports for Nova, Glance etc. and let HAProxy load-balance them. I left out all the repetitive lines where only the port changes. Those custom ports have to be configured in nova.conf, cinder.conf etc.; here is an excerpt from the nova.conf on controller01:
```
# nova.conf
[DEFAULT]
enabled_apis = osapi_compute,metadata
my_ip = 10.0.0.1
osapi_compute_listen = $my_ip
osapi_compute_listen_port = 5503
osapi_compute_workers = 4
metadata_listen = $my_ip
metadata_listen_port = 5504
```
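The analogous options exist for the other APIs; as an illustration, a cinder.conf following the same scheme could look like this (the port 5505 is a made-up example continuing our numbering, the option names are the standard Cinder ones):

```
# cinder.conf (port number is an example following the same scheme)
[DEFAULT]
my_ip = 10.0.0.1
osapi_volume_listen = $my_ip
osapi_volume_listen_port = 5505
osapi_volume_workers = 4
```

The matching `listen cinder-api` section in HAProxy would then point its `server` lines at this port on each controller.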
To avoid overloading this post I’ll only paste the Pacemaker resources for Nova; the same pattern applies to the other services:
```
# Nova
primitive nova-api systemd:openstack-nova-api \
    op start timeout=180 interval=0 \
    op stop timeout=180 interval=0 \
    op monitor timeout=100 interval=60
primitive nova-conductor systemd:openstack-nova-conductor \
    op start timeout=120 interval=0 \
    op stop timeout=120 interval=0 \
    op monitor timeout=100 interval=60
primitive nova-novncproxy systemd:openstack-nova-novncproxy \
    op start timeout=120 interval=0 \
    op stop timeout=120 interval=0 \
    op monitor timeout=100 interval=60
primitive nova-scheduler systemd:openstack-nova-scheduler \
    op start timeout=120 interval=0 \
    op stop timeout=120 interval=0 \
    op monitor timeout=100 interval=60
clone cl-nova-api nova-api \
    meta target-role=started
clone cl-nova-conductor nova-conductor \
    meta target-role=started
clone cl-nova-novncproxy nova-novncproxy \
    meta target-role=started
clone cl-nova-scheduler nova-scheduler \
    meta target-role=started
```
Since cinder-volume is a stateful service, it is not a cloned resource in Pacemaker but a plain primitive: it does not run on both control nodes, only on the one holding the virtual IP.
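For illustration, such a non-cloned primitive could look like the following sketch (the timeouts are examples in the style of the Nova resources above, and the resource name `vip` for the virtual IP is an assumption):

```
# cinder-volume as a plain (non-cloned) primitive
primitive cinder-volume systemd:openstack-cinder-volume \
    op start timeout=120 interval=0 \
    op stop timeout=120 interval=0 \
    op monitor timeout=100 interval=60
# keep it on the node holding the virtual IP
# (resource name "vip" is an example)
colocation col-cinder-vip inf: cinder-volume vip
```

The colocation constraint is what makes cinder-volume follow the virtual IP on failover instead of running on both nodes.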
The configured HAProxy port bindings have to be reflected in the output of openstack endpoint list, of course. Here are the Nova endpoints:
```
controller01:~ # openstack endpoint list | grep nova
| 8188ad61b99b4a968b2585c20757033f | RegionOne | nova | compute | True | public   | http://controller.domain:8774/v2/%(tenant_id)s |
| e6b8cd92304443cf9e2e2a60ea3bc496 | RegionOne | nova | compute | True | internal | http://controller.domain:8774/v2/%(tenant_id)s |
| ebd2e112dcc74ef480466d0a2d5e33f5 | RegionOne | nova | compute | True | admin    | http://controller.domain:8774/v2/%(tenant_id)s |
```
I’m aware that the Keystone admin port 35357 is deprecated and only port 5000 should be used, but I decided to keep it for now to reduce the number of potential issues during migration. I haven’t had time yet to change and test it after the migration.
I can’t stress enough how important it was to test the most important failure scenarios before migrating instances to the new environment. For example, we noticed that stopping some services, such as neutron-server, took longer than the default timeout in the resource definition. The result was that Pacemaker STONITH’ed a controller just because I wanted to stop a (stateless) service! That’s why we have custom timeout values for each service.
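For neutron-server this meant raising the stop timeout well above the default. A sketch of such a resource definition, in the same style as the Nova primitives above (the systemd unit name and the timeout values here are examples, not our exact configuration):

```
# give neutron-server more time to stop before Pacemaker
# considers the operation failed (values are examples)
primitive neutron-server systemd:neutron-server \
    op start timeout=180 interval=0 \
    op stop timeout=300 interval=0 \
    op monitor timeout=100 interval=60
```

The point is that the `op stop timeout` must comfortably exceed the service’s real worst-case shutdown time, otherwise a routine stop can escalate into fencing.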
As I already mentioned, I won’t paste our entire Pacemaker configuration; it is too specific and has to be considered independently for each setup. Just keep in mind that there are more things to configure, such as STONITH, location and colocation constraints, order constraints defining the startup and shutdown order of services, a virtual IP resource, and whatever else your setup requires.
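To give a rough idea, a virtual IP resource and an order constraint could look like the following crmsh sketch (the resource names and netmask are assumptions; 10.0.0.100 is the address the nova-metadata listener binds to above):

```
# virtual IP resource (netmask and resource name are examples)
primitive vip ocf:heartbeat:IPaddr2 \
    params ip=10.0.0.100 cidr_netmask=24 \
    op monitor interval=10s
# start the VIP before cinder-volume; stopping happens in reverse order
order ord-vip-cinder Mandatory: vip cinder-volume
```

Combined with a colocation constraint, this ensures the stateful service always comes up on the node that actually owns the address.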
These were the key aspects to consider in a Pacemaker configuration for the OpenStack services. There’s only one post left in this series, and it will be published soon. Feel free to leave a comment if you have any questions or remarks about the described procedure.