Overview

CenterGrid is a big proponent of the VMware suite of products, especially vCloud Director (vCD).  In fact, we use vCD to power CenterGrid Compass, our managed private cloud service.  vCD uses the RabbitMQ AMQP broker to pass messages for its extension services, object extensions, and notifications.

Although VMware provides installation documentation here to complement the RabbitMQ installation documentation here, and although there is enough documentation overall to choke a horse, we found that the installation and configuration process wasn’t very straightforward.  Furthermore, we encountered a couple of bugs that caused us no end of enjoyment.  By the end of this article, we hope to show you how our RabbitMQ cluster was installed, configured, and operationalized for enterprise use.

As with everything on the Internet, this article will age quickly.  We used CentOS 8, but now that the entire CentOS line is being discontinued, we recommend you use something different when you build your own.  Certain parts of this article will remain relevant, especially those dealing with Keepalived.

What To Expect From This Document

This document is a collection of knowledge, links, and instructions. Some of it will read like a chat with a friend, and some of it like a tutorial. Prior to taking on this project, the author knew nothing about RabbitMQ, so please keep that in mind if some of the steps seem redundant or unnecessary. The goal is to set you up for success and help you stand up a 3-node RabbitMQ cluster with minimal strings attached. Names and IP addresses have been changed to protect the innocent.

Fonts and Styles: Code and Console Output

In this document you will find the following styles applied to code and console output.

Linux Command Line

$ ls -l

total 176
-rw-r--r--. 1 root root   683 Aug 19 09:59 0001.pcap
-rw-------. 1 root root  1586 Jul 31 02:17 anaconda-ks.cfg
drwxr-xr-x. 2 root root  4096 Jul 31 02:48 Desktop
drwxr-xr-x. 2 root root  4096 Jul 31 02:48 Documents

Windows Command Prompt (DOS)

DOS> ipconfig /all

PowerShell

PS> $PSVersionTable

Name                           Value
----                           -----
PSVersion                      5.1.18362.628
PSEdition                      Desktop
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0...}
BuildVersion                   10.0.18362.628
CLRVersion                     4.0.30319.42000
WSManStackVersion              3.0
PSRemotingProtocolVersion      2.3
SerializationVersion           1.1.0.1

Installation

Prepare for the VMs

  1. Create DNS entries for these three VMs and one virtual IP (VIP)
  2. Log in to DNS01.domain.local
    • Open the DNS snap-in
    • Create four new static DNS entries in domain.local
      • rabbitvip = A.A.A.10
      • rabbit1 = A.A.A.11
      • rabbit2 = A.A.A.12
      • rabbit3 = A.A.A.13
  3. Generate a complex password for “root” OS account to be used on all three VMs
  4. Generate a complex password for “localadmin” OS account to be used on all three VMs
  5. Generate a complex password for “vCloudDirectorAdmin” RabbitMQ account
  6. Generate a complex password for the “monitoringuser” OS account, to be used by your monitoring software
  7. Download the CentOS 8 minimal installer ISO
  8. Upload the CentOS 8 minimal installer ISO to a datastore in your vCD environment
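Before building the VMs, it may be worth confirming that the four new records actually resolve. The following is a minimal sketch and not part of our original procedure. It assumes `getent` is available on the machine you run it from; the function names `resolves_to` and `check_rabbit_dns` are made up for illustration, and the A.A.A addresses are the same placeholders used above.

```shell
#!/bin/sh
# resolves_to NAME EXPECTED_IP
# Succeeds only if NAME resolves over IPv4 and its first address equals EXPECTED_IP.
resolves_to() {
    actual=$(getent ahostsv4 "$1" 2>/dev/null | awk 'NR == 1 { print $1 }')
    [ -n "$actual" ] && [ "$actual" = "$2" ]
}

# check_rabbit_dns: report every record that does not resolve as expected.
# Replace the placeholder addresses with your real ones before calling it.
check_rabbit_dns() {
    for pair in rabbitvip=A.A.A.10 rabbit1=A.A.A.11 rabbit2=A.A.A.12 rabbit3=A.A.A.13; do
        name=${pair%%=*}
        ip=${pair#*=}
        resolves_to "$name.domain.local" "$ip" || echo "DNS problem: $name.domain.local"
    done
}
```

Calling `check_rabbit_dns` prints nothing when all four names resolve to the expected addresses.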

Build The VMs

  1. Using vSphere (not vCD), create three VMs with these settings:
     Setting         Value
     Compatibility   ESXi 6.7 Update 2 and later
     Guest OS        Linux, CentOS 8 (64-bit)
     CPU             2
     RAM             4
     Disk            50 GiB, Thin-provisioned
     SCSI            VMware Paravirtual
     Network         Your choice
     CD/DVD          The CentOS 8 minimal installer ISO from steps #7 & 8
     Video           Auto-detect settings
     Tools Upgrades  Enabled
  1. The remaining steps in this section must be performed on all three VMs
  2. During the installation, you will create the root account. Also take the time to add the “localadmin” account and assign it to the “wheel” group.
  3. Wait for the installation to finish
  4. Log in as root
  5. Disable system protections: Firewall & Security-Enhanced Linux (SELinux).  These will be re-enabled at a later step.
$ systemctl stop firewalld
$ setenforce 0
  1. Update OS and install prerequisite software
$ yum install -y epel-release
$ yum update -y
$ yum install open-vm-tools keepalived tcpdump erlang wget htop -y

Install and Configure RabbitMQ

  1. The next four steps (through enabling and starting the rabbitmq-server service) must be performed on all three VMs
  2. Download and install the RabbitMQ package
    • Instructions taken from here
    • Get the contents of /etc/yum.repos.d/rabbitmq_erlang.repo here
$ nano /etc/yum.repos.d/rabbitmq_erlang.repo

$ wget https://github.com/rabbitmq/rabbitmq-server/releases/download/v3.8.5/rabbitmq-server-3.8.5-1.el8.noarch.rpm
$ rpm --import https://www.rabbitmq.com/rabbitmq-signing-key-public.asc
$ yum install -y ./rabbitmq-server-3.8.5-1.el8.noarch.rpm

  1. Edit /etc/rabbitmq/rabbitmq.conf
    • Get the contents of /etc/rabbitmq/rabbitmq.conf here
    • Official rabbitmq.conf documentation is here
    • Example file is here
  2. Edit /etc/rabbitmq/rabbitmq-env.conf
    • Get the contents of /etc/rabbitmq/rabbitmq-env.conf here
  3. Enable and start the rabbitmq-server service
$ chkconfig rabbitmq-server on
$ systemctl start rabbitmq-server
$ rabbitmq-plugins enable rabbitmq_management
  1. Do this on VM rabbit1:
    • The last command will output a string (the Erlang cookie). Copy that string and keep it for the following step, where rabbit2 and rabbit3 join the cluster.
$ rabbitmqctl set_cluster_name [email protected]
$ rabbitmqctl set_policy ha-all ".*" '{"ha-mode":"all","ha-sync-mode":"automatic","ha-promote-on-failure":"when-synced","ha-promote-on-shutdown":"when-synced","queue-mode":"lazy"}'
$ cat /var/lib/rabbitmq/.erlang.cookie
  1. Do this on VMs rabbit2 and 3:
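    • Take the cookie value from the previous step (the output of the cat command on rabbit1) and insert it between the quotes below
    $ echo '<erlang_cookie_from_rabbit1>' > /var/lib/rabbitmq/.erlang.cookie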
    $ rabbitmqctl stop_app
    $ rabbitmqctl reset
    $ rabbitmqctl start_app
    
    • After the last command executes, VMs 2 and 3 should join node 1 and form a cluster
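If the nodes do not find each other on their own (for example, because your rabbitmq.conf carries no peer discovery settings), they can be joined by hand. Here is a minimal sketch under a few assumptions: rabbit1 uses the default node name rabbit@rabbit1, the Erlang cookie is already identical on every node, and the service is restarted first so the running node picks up the new cookie. The `run` and `join_rabbit_cluster` helpers are hypothetical. With DRY_RUN left at its default of 1, the commands are only printed for review.

```shell
#!/bin/sh
# Print each command when DRY_RUN=1 (the default here); execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# join_rabbit_cluster SEED_NODE
# Run on rabbit2 and rabbit3. SEED_NODE is the node to join, e.g. rabbit@rabbit1.
join_rabbit_cluster() {
    seed="$1"
    run systemctl restart rabbitmq-server    # reload the Erlang cookie
    run rabbitmqctl stop_app                 # stop RabbitMQ, keep the Erlang VM running
    run rabbitmqctl reset                    # wipe this node's local state
    run rabbitmqctl join_cluster "$seed"     # join the cluster that SEED_NODE belongs to
    run rabbitmqctl start_app
    run rabbitmqctl cluster_status           # should list all joined nodes
}

join_rabbit_cluster "rabbit@rabbit1"
```

Run it once with the default dry run to check the sequence, then again as root with DRY_RUN=0 on rabbit2 and rabbit3.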

Install and Configure Keepalived

  1. The remaining steps in this section must be performed on all three VMs
  2. Find your NIC’s identifier. It might look like “ens19” or “eno1” or “enp3s0f0”.  Remember this value for the next step.
    • Perform these steps on rabbit1
$ ip addr show
  1. Edit /etc/keepalived/keepalived.conf
    • Perform these steps on rabbit1
    • In this file, change the current NIC name from “ens18” to the correct NIC name
    • Get the contents of /etc/keepalived/keepalived.conf here
$ nano /etc/keepalived/keepalived.conf
  1. Place the “keepalived-rabbitmq-healthcheck” scripts into /usr/libexec/keepalived/
    • Execute each nano command below, then paste in the contents
    • Get the content of each file in the Appendix
    • Save the file
    • Go to the next file in the list
$ nano /usr/libexec/keepalived/keepalived-rabbitmq-health-check-cluster.sh
$ nano /usr/libexec/keepalived/keepalived-rabbitmq-health-check-port-5671.sh
$ nano /usr/libexec/keepalived/keepalived-rabbitmq-health-check-port-5672.sh
$ nano /usr/libexec/keepalived/keepalived-rabbitmq-health-check-port-15672.sh
$ nano /usr/libexec/keepalived/keepalived-rabbitmq-health-check-port-25672.sh
$ nano /usr/libexec/keepalived/keepalived-rabbitmq-health-check-queue-master.sh
    • NOTES:
      • Keepalived will run the scripts as root because the OS user ‘keepalived_script’ does not exist. All rabbitmqctl and rabbitmq-diagnostics commands must be run as root, so there is no need for user ‘keepalived_script’.
      • For more information on why “/usr/libexec/keepalived” was chosen to host the scripts, please read this article
      • Each script only checks one port. This was by design to make the logs easier to read; now, when a health check script fails, you know exactly which port is to blame.
  1. Make these scripts executable by root
    • Perform these steps on each RabbitMQ VM
$ chmod 0744 /usr/libexec/keepalived/keepalived-rabbitmq-health-check-cluster.sh
$ chmod 0744 /usr/libexec/keepalived/keepalived-rabbitmq-health-check-port-5671.sh
$ chmod 0744 /usr/libexec/keepalived/keepalived-rabbitmq-health-check-port-5672.sh
$ chmod 0744 /usr/libexec/keepalived/keepalived-rabbitmq-health-check-port-15672.sh
$ chmod 0744 /usr/libexec/keepalived/keepalived-rabbitmq-health-check-port-25672.sh
$ chmod 0744 /usr/libexec/keepalived/keepalived-rabbitmq-health-check-queue-master.sh
  1. Enable and start the keepalived service
    • Perform these steps on each RabbitMQ VM
$ chkconfig keepalived on
$ systemctl start keepalived
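To give a feel for the "one port per script" design mentioned in the notes above, here is a hypothetical sketch of a single-port check. It is not the script from the Appendix. It assumes bash is installed (for its /dev/tcp pseudo-device) along with the coreutils `timeout` command, and the function name `check_port` is our own invention.

```shell
#!/bin/sh
# check_port HOST PORT
# Succeeds if a TCP connection to HOST:PORT opens within 2 seconds.
# Keepalived treats a non-zero exit status from a check script as a failure.
check_port() {
    timeout 2 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# A one-port health check script would then end with a single line, e.g. for AMQP:
#   check_port 127.0.0.1 5672 || exit 1
```

Keeping each script down to one such call is what makes the log entries easy to attribute to a specific port.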

Create User Accounts

  1. The Monitoring User
    • Perform these steps on each RabbitMQ VM
    • All of our servers are monitored. Skip this step if it does not apply.
    • Exists at the OS level only. Has zero rights inside RabbitMQ.
$ useradd monitoringuser
$ passwd monitoringuser
$ usermod -aG wheel monitoringuser
  1. The vCloud Director User
    • Perform these steps on rabbit1
    • Exists inside RabbitMQ only. Has zero rights at the OS level.
$ rabbitmqctl add_user vCloudDirectorAdmin '<password from step 5>'
$ rabbitmqctl set_user_tags vCloudDirectorAdmin administrator
$ rabbitmqctl set_permissions -p / vCloudDirectorAdmin ".*" ".*" ".*"
  1. The Test User
    • Perform these steps on rabbit1
    • Exists inside RabbitMQ only. Has zero rights at the OS level.
    • This account has a weak password (12345678), but its simplicity makes it easy to construct the command line string when stress testing the RabbitMQ cluster
    • This account will be removed by the end of this document
$ rabbitmqctl add_user testuser 12345678
$ rabbitmqctl set_user_tags testuser administrator
$ rabbitmqctl set_permissions -p / testuser ".*" ".*" ".*"

Finish Installation and Configuration

  1. Enable system protections: Firewall & Security-Enhanced Linux (SELinux)
    • Perform these steps on each RabbitMQ VM
$ systemctl start firewalld
$ setenforce 1
  1. Add RabbitMQ firewall rules
    • Perform these steps on each RabbitMQ VM
    • This ruleset allows the RabbitMQ VMs to talk unrestricted to each other, but permits only AMQP and Management UI communications with outside clients
$ firewall-cmd --permanent --zone=public --add-rich-rule='rule family="ipv4" source address="A.A.A.10/32" port protocol="tcp" port="4000-65535" accept'
$ firewall-cmd --permanent --zone=public --add-rich-rule='rule family="ipv4" source address="A.A.A.11/32" port protocol="tcp" port="4000-65535" accept'
$ firewall-cmd --permanent --zone=public --add-rich-rule='rule family="ipv4" source address="A.A.A.12/32" port protocol="tcp" port="4000-65535" accept'
$ firewall-cmd --permanent --zone=public --add-rich-rule='rule family="ipv4" source address="A.A.A.13/32" port protocol="tcp" port="4000-65535" accept'
$ firewall-cmd --permanent --zone=public --add-rich-rule='rule family="ipv4" source address="0.0.0.0/0" port protocol="tcp" port="15672" accept'
$ firewall-cmd --permanent --zone=public --add-rich-rule='rule family="ipv4" source address="0.0.0.0/0" port protocol="tcp" port="5672" accept'
$ firewall-cmd --reload
  1. Add Keepalived firewall rules
    • Perform these steps on each RabbitMQ VM
    • Create a new firewall service definition file
    • Get the contents of /etc/firewalld/services/vrrp.xml here
$ nano /etc/firewalld/services/vrrp.xml
    • Add that service definition to the list of active rules
$ firewall-cmd --reload
$ firewall-cmd --permanent --zone=public --add-service=vrrp
$ firewall-cmd --reload
  1. Install your antivirus client
  2. Install your monitoring agent
  3. At this point, keepalived should have placed the virtual IP on a single RabbitMQ VM, and the three VMs should have formed a cluster. However, if you are super paranoid, now would be a good time to reboot all three VMs.
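The RabbitMQ firewall rules in this section differ only in the source address or the port, so they can also be generated in a loop. This is a sketch, assuming the same placeholder addresses and the same rule text as the commands above; the helper names `run`, `rich_rule`, and `add_rabbitmq_rules` are hypothetical. It defaults to a dry run that only prints what would be executed.

```shell
#!/bin/sh
# Print each command when DRY_RUN=1 (the default here); execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# Placeholder addresses from the article: the VIP plus the three nodes.
CLUSTER_IPS="A.A.A.10 A.A.A.11 A.A.A.12 A.A.A.13"
# Ports reachable by outside clients: Management UI and AMQP.
CLIENT_PORTS="15672 5672"

# rich_rule SOURCE_CIDR PORT_OR_RANGE: print one firewalld rich rule.
rich_rule() {
    printf 'rule family="ipv4" source address="%s" port protocol="tcp" port="%s" accept' "$1" "$2"
}

add_rabbitmq_rules() {
    for ip in $CLUSTER_IPS; do
        run firewall-cmd --permanent --zone=public --add-rich-rule="$(rich_rule "$ip/32" 4000-65535)"
    done
    for port in $CLIENT_PORTS; do
        run firewall-cmd --permanent --zone=public --add-rich-rule="$(rich_rule 0.0.0.0/0 "$port")"
    done
    run firewall-cmd --reload
}

add_rabbitmq_rules
```

Replace the placeholder addresses before running it with DRY_RUN=0, since firewalld will reject "A.A.A.10" as an address.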

Testing: Round One

  1. Test 1: Does the Keepalived Virtual IP address exist on only one RabbitMQ VM?
    • Open three SSH windows, one to each RabbitMQ VM
      • A.A.A.11
      • A.A.A.12
      • A.A.A.13
    • In each SSH window, type and run this:
$ watch 'ip addr show | grep "A.A.A.10"'
    • Arrange all three SSH windows so you can see them simultaneously
    • Only one of the three RabbitMQ VMs should give a response that looks like this:

“inet A.A.A.10/24 scope global secondary ens192”

    • If more than one window shows the virtual IP address, then your keepalived configuration is incorrect. Solve this problem before moving on.
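A plain grep for the VIP treats the dots as wildcards and will also match longer addresses that merely start with the same digits (a VIP ending in .10 also matches .100). If you want an exact answer, the check can be written as a small function. This is a sketch that assumes the one-line-per-address output format of `ip -o -4 addr show`; the function name `has_vip` is hypothetical.

```shell
#!/bin/sh
# has_vip VIP
# Reads `ip -o -4 addr show` output on stdin and succeeds only if one of the
# addresses (field 4, without its /prefix) is exactly VIP.
has_vip() {
    awk -v vip="$1" '
        { split($4, parts, "/"); if (parts[1] == vip) found = 1 }
        END { exit !found }
    '
}

# Usage on a RabbitMQ VM (replace the placeholder with your real VIP):
#   ip -o -4 addr show | has_vip A.A.A.10 && echo "this node holds the VIP"
```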
  1. Test 2: Does the Keepalived virtual IP address move from VM to VM? (keepalived service restarted)
    • Open three additional SSH windows, one to each RabbitMQ VM
      • A.A.A.11
      • A.A.A.12
      • A.A.A.13
    • Find the RabbitMQ VM that is currently hosting the VIP (A.A.A.10)
    • Run this command in the new SSH window of the RabbitMQ VM that is hosting the VIP:
$ systemctl restart keepalived
    • Watch the monitoring SSH windows to ensure that the VIP moves from server to server as the keepalived service is restarted
    • During the transition period, you might see two servers hosting the VIP, but after a few seconds only one of the three RabbitMQ VMs should give a response that looks like this:

“inet A.A.A.10/24 scope global secondary ens192”

  1. Test 3: Does the Keepalived Virtual IP address move from VM to VM? (rabbitmq-server restarted)
    • Find the RabbitMQ VM that is currently hosting the VIP (A.A.A.10)
    • Run this command in the new SSH window of the RabbitMQ VM that is hosting the VIP:
$ systemctl restart rabbitmq-server
    • Watch the monitoring SSH windows to ensure that the VIP moves from server to server as the rabbitmq-server service is restarted
    • During the transition period, you might see two servers hosting the VIP, but after a few seconds only one of the three RabbitMQ VMs should give a response that looks like this:

“inet A.A.A.10/24 scope global secondary ens192”

  1. Test 4: Does the RabbitMQ VM rejoin the cluster? (rabbitmq-server restarted)
    • Perform these steps on your laptop
    • Open the RabbitMQ Management UI

http://A.A.A.10:15672

    • Navigate to Overview –> Nodes
    • Run this command in the new SSH window of the RabbitMQ VM that is hosting the VIP:
$ systemctl restart rabbitmq-server
    • Watch the web page for a node to go RED –> YELLOW –> GREEN
    • Wait for all three nodes to appear green before continuing
    • What you have just witnessed is a node being kicked out of the cluster and rejoining the cluster
    • Also, .10 is the VIP, so the keepalived services should be moving that IP address around, too
    • There should not be any lengthy service interruptions from the web page
  1. Test 5: Does the RabbitMQ VM rejoin the cluster? (reboots)
    • Perform these steps on your laptop
    • Open the RabbitMQ Management UI

http://A.A.A.10:15672

    • Navigate to Overview –> Nodes
    • Reboot a node
    • Watch the web page for a node to go RED –> YELLOW –> GREEN
    • Wait for all three nodes to appear green before continuing
    • What you have just witnessed is a node being kicked out of the cluster and rejoining the cluster
    • Also, .10 is the VIP, so the keepalived services should be moving that IP address around, too
    • There should not be any lengthy service interruptions from the web page
  1. Test 6: Load Testing
    • Perform these steps on your laptop
    • Download and install the Java Runtime Environment (JRE) on your laptop
    • Download and install the RabbitMQ Performance Test application on your laptop:

https://github.com/rabbitmq/rabbitmq-perf-test/releases

https://rabbitmq.github.io/rabbitmq-perf-test/stable/htmlsingle/#installation

    • Open a DOS window
    • In the DOS window, run the following command:
      • This will produce 9 producers, each sending at a rate of 20 messages/second
      • This will produce 9 consumers, each extracting at a rate of 20 messages/second
      • The command will run for 50 minutes
      • The carets (^) allow you to break a DOS command across multiple lines
      • If you decide to put this command into a batch file (.bat), remember to double up on the percent signs: % –> %%
DOS> cd \bin
DOS> .\runjava com.rabbitmq.perf.PerfTest ^
--uris                        amqp://testuser:12345678@A.A.A.10:5672 ^
--time                        3000 ^
--flag                        persistent ^
--auto-delete                 false ^
--qos                         1000 ^
--confirm                     1000 ^
--confirm-timeout             -1 ^
--rate                        20 ^
--size                        1024 ^
--queue-pattern               'perf-test-%d' ^
--queue-pattern-from          1 ^
--queue-pattern-to            9 ^
--producers                   9 ^
--consumers                   9 ^
--consumer-latency            10000
    • While this command is running, revisit Tests 1-5 to explore the HA/Failover capabilities under load
    • While this command is running, start an SSH window to each of the three RabbitMQ VMs and run ‘htop’
      • Observe the CPU and memory utilization
      • Maxed out CPU or RAM is bad
      • To try a different rate of message delivery and consumption…
        1. Stop the DOS command
        2. Choose a new rate, or ‘producers’ count, or ‘consumers’ count
        3. Select and copy the whole command
        4. Paste it into the DOS window as one whole string
        5. Press ‘enter’ if necessary
  1. Minimize your testing windows. You’re going to use them again at the end of this document.

Configure HA in vSphere

  1. Create an anti-affinity rule
    • Perform these steps on your laptop
    • Log in to vSphere
    • Navigate to ‘Hosts and Clusters’ –>
    • Navigate to Configure tab –> Configuration –> ‘VM/Host Rules’ –> +Add
      • Name: anti-affinity-rule-RabbitMQ_Cluster
      • Type: Separate Virtual Machines
      • VMs:
        • rabbit1
        • rabbit2
        • rabbit3
    • Click OK to close and save
    • In the next step, we will completely shut down the VM to take a snapshot. This will allow vSphere to place the VM on a different host to meet the anti-affinity rules.
  1. Take VM Snapshots
    • Perform these steps on each RabbitMQ VM
    • Log in via SSH (root)
    • Completely shut down the VM
$ shutdown --poweroff 0
    • From your laptop, navigate to vSphere –> VMs and Templates –> –> One of the RabbitMQ VMs
    • Actions –> Snapshots –> Take Snapshot
    • Name: <Change_Request_#_Here>
    • Description: Snapshot before <Change_Request_#_Here>.  Rollback point.
    • Repeat this for the other two VMs
    • Start the VMs

Create an SSL/TLS Certificate for the Cluster

NOTE:  We found generating the certificate to be rather tricky and made a mistake or two early on.  As a result, some of the inter-node authentications/verifications did not work as described in the RabbitMQ documentation.  Compromises were therefore made inside the inter_node_tls.config file.  If we were to go back and redeploy a RabbitMQ cluster using the certificate instructions contained here, then the compromises in the inter_node_tls.config file would not be needed.

  1. Generate a Certificate Signing Request (CSR)
    • Perform these steps on rabbit1
    • Log in to rabbit1 via SSH (root)
    • Edit OpenSSL.cnf configuration file before generating CSR
$ nano /etc/pki/tls/openssl.cnf
    • Find the following lines in openssl.cnf and change them

FROM: # req_extensions = v3_req # The extensions to add to a certificate request
TO:       req_extensions = v3_req # The extensions to add to a certificate request
FROM: countryName_default                  = XX
TO:       countryName_default                  = US

FROM: #stateOrProvinceName_default = Default Province
TO:       stateOrProvinceName_default  =

FROM: localityName_default         = Default City
TO:       localityName_default         =

FROM: 0.organizationName_default    = Default Company Ltd
TO:       0.organizationName_default    =

FIND:   commonName_max                  = 64
TO:       commonName_max                  = 64
TO:       commonName_default            = rabbitvip.domain.local

FIND:   keyUsage = nonRepudiation, digitalSignature, keyEncipherment
TO:       keyUsage = nonRepudiation, digitalSignature, keyEncipherment
TO:       subjectAltName = @alt_names
TO:
TO:       [ alt_names ]
TO:       DNS.1  = rabbitvip.domain.local
TO:       DNS.2  = rabbitvip
TO:       DNS.3  = rabbit1.domain.local
TO:       DNS.4  = rabbit1
TO:       DNS.5  = rabbit2.domain.local
TO:       DNS.6  = rabbit2
TO:       DNS.7  = rabbit3.domain.local
TO:       DNS.8  = rabbit3
TO:       IP.1  = A.A.A.10
TO:       IP.2  = A.A.A.11
TO:       IP.3  = A.A.A.12
TO:       IP.4  = A.A.A.13

      • Save and close OpenSSL.cnf
      • Generate a CSR
        • Do not provide a password when requested
$ mkdir ~/csr
$ cd ~/csr
$ openssl req -new -newkey rsa:2048 -keyout rabbitmq.key -out rabbitmq.csr -nodes -config /etc/pki/tls/openssl.cnf -extensions v3_req
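Since a missing Subject Alternative Name is an easy mistake to make at this point, it can help to inspect the CSR before sending it to the CA. The following is a sketch, assuming the `openssl` command line tool is installed; the function name `csr_has_san` is hypothetical, and it only looks at DNS entries, not IP entries.

```shell
#!/bin/sh
# csr_has_san CSR_FILE DNS_NAME
# Succeeds if the CSR's Subject Alternative Name extension lists exactly DNS_NAME.
csr_has_san() {
    # Split the text dump on commas and spaces so each "DNS:name" entry sits on
    # its own line, then look for an exact whole-line match.
    openssl req -in "$1" -noout -text 2>/dev/null |
        tr ', ' '\n\n' |
        grep -Fxq "DNS:$2"
}

# Usage, run from the directory that holds your CSR:
#   for n in rabbitvip rabbit1 rabbit2 rabbit3; do
#       csr_has_san ./your.csr "$n.domain.local" || echo "missing SAN: $n.domain.local"
#   done
```

The exact-match comparison is deliberate: a check for rabbit1 should not be satisfied by rabbit1.domain.local alone.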
  1. Prepare RabbitMQ Certificate Template
    • Log in to your Windows domain.local Certificate Authority
    • Open MMC.exe –> Add ‘Certification Authority’ snap-in
    • Expand the tree until you see ‘Certificate Templates’ –> Right-click –> Manage
    • Find template ‘Computer’ –> Right-click –> Duplicate Template –> New window appears
    • Edit tabs:
      • Tab: General
        • Template Display Name: “RabbitMQ Cluster”
        • Validity Period: “3 years”
      • Tab: Cryptography
        • Minimum Key Size: 2048
        • Providers: Microsoft RSA SChannel Cryptographic Provider
      • Tab: Extensions
        • Application Policies –> Edit –> ‘Client Authentication’ and ‘Server Authentication’
        • Key Usage –> Edit –> ‘Digital Signature’ and ‘Key Encipherment’ and ‘Allow Encryption of User Data’
      • Tab: Subject Name
        • Select ‘Supply in the request’
      • Click ‘OK’
    • Close ‘Certificate Templates Console’
    • Go back to the ‘Certification Authority’ snap-in
    • ‘Certificate Templates’ –> Right-click –> New –> ‘Certificate Templates to Issue’
    • Scroll down and find ‘RabbitMQ Cluster’ –> OK
    • Open PowerShell window
PS> Get-CATemplate
    • Verify one of the entries is ‘RabbitMQCluster’
  1. Upload the CSR to the Windows Certificate Authority
    • Connect to rabbit1 using WinSCP (root)
    • Download the CSR from /root/csr/rabbitmq.csr
    • Log in to the Certificate Authority (RDP)
    • Copy the CSR to the CA via RDP drag ‘n drop copy
  1. Create a Certificate from the CSR
    • On the Windows CA, open a DOS window
DOS> certreq -submit -attrib "CertificateTemplate:RabbitMQCluster"
    • The app will ask you to find and select the CSR file
    • The app will generate a certificate called ‘RabbitMQCluster.cer’
    • Shift-delete the CSR file
  1. Export the Root CA Public Certificate
    • On the Windows CA, open a DOS window
DOS> certutil -ca.cert domain.local.CA.cer
  1. Copy Certificates from Windows CA to Rabbit1
    • Copy the following certificates from the CA to your laptop
      • RabbitMQCluster.cer
      • domain.local.CA.cer
    • Copy the following certificates from your laptop to rabbit1 using WinSCP (/root/csr)
      • RabbitMQCluster.cer
      • domain.local.CA.cer
    • On rabbit1, move certificate related files
$ mv /root/csr/* /etc/pki/tls/private/
$ cd /etc/pki/tls/private
  1. Translate the Certificates to PEM Format
    • Perform these steps on rabbit1
    • Convert RabbitMQCluster.cer to PEM format
    • Convert Certificate Authority chain to PEM format
    • Copy PEM certificates to new location
$ openssl x509 -in RabbitMQCluster.cer  -outform PEM -out rabbitmqcluster.pem
$ openssl x509 -inform DER -outform PEM -in domain.local.CA.cer -out domain.local.CA.cer.pem
$ cp ./rabbitmqcluster.pem             /etc/pki/tls/certs/
$ cp ./rabbitmq.key                    /etc/pki/tls/certs/
$ cp ./domain.local.CA.cer.pem     /etc/pki/tls/certs/
$ chmod 444 /etc/pki/tls/certs/*
  1. Copy the Certificate Files to the Other RabbitMQ VMs
    • Copy the following files from rabbit1 to Laptop using WinSCP
      • Source: /etc/pki/tls/certs/rabbitmqcluster.pem
      • Source: /etc/pki/tls/certs/rabbitmq.key
      • Source: /etc/pki/tls/certs/domain.local.CA.cer.pem
    • Copy the following files from Laptop to rabbit2 using WinSCP
      • Destination: /etc/pki/tls/certs/rabbitmqcluster.pem
      • Destination: /etc/pki/tls/certs/rabbitmq.key
      • Destination: /etc/pki/tls/certs/domain.local.CA.cer.pem
    • Copy the following files from Laptop to rabbit3 using WinSCP
      • Destination: /etc/pki/tls/certs/rabbitmqcluster.pem
      • Destination: /etc/pki/tls/certs/rabbitmq.key
      • Destination: /etc/pki/tls/certs/domain.local.CA.cer.pem
    • Set permissions on the files on rabbit2 and rabbit3
$ chmod 444 /etc/pki/tls/certs/*

Enable Encrypted Communications

  1. Update RabbitMQ Configuration File to Only Use Encrypted Communications
    • Perform these steps on each RabbitMQ VM
    • Replace the contents of the RabbitMQ Configuration File with these
$ nano /etc/rabbitmq/rabbitmq.conf
    • Save and close the file
  1. Update Erlang VM Configuration to Use Encrypted Communications
    • Perform these steps on each RabbitMQ VM
    • Obtain an updated ‘ERL_SSL_PATH’
$ erl -noinput -eval 'io:format("ERL_SSL_PATH=~s~n", [filename:dirname(code:which(inet_tls_dist))])' -s init stop
    • Open the Erlang VM Environment Variables configuration file
$ nano /etc/rabbitmq/rabbitmq-env.conf
    • Replace all existing contents of rabbitmq-env.conf with these, taking care to inject the correct ERL_SSL_PATH string
    • Create the Inter-node TLS configuration file
    • Get the contents of /etc/rabbitmq/inter_node_tls.config here
$ nano /etc/rabbitmq/inter_node_tls.config

Firewall Ports

  1. Open Firewall Ports for Encrypted Communications
    • Perform these steps on each RabbitMQ VM
    • The trailing # comments only explain what each port is for
$ systemctl start firewalld
$ firewall-cmd --permanent --zone=public --add-port=5671/tcp     # AMQPS
$ firewall-cmd --permanent --zone=public --add-port=15671/tcp    # HTTP API, Management Web UI
  1. Remove Firewall Rules for Unencrypted Communications
    • Perform these steps on each RabbitMQ VM
    • Some might fail if those ports are not already open
$ firewall-cmd --permanent --zone=public --remove-service=amqp
$ firewall-cmd --permanent --zone=public --remove-port=5672/tcp
$ firewall-cmd --permanent --zone=public --remove-port=15672/tcp
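One thing to watch for: earlier in this document the plaintext ports 5672 and 15672 were opened with rich rules, not with --add-port, so the --remove-port commands above may leave those rich rules in place. Here is a sketch of removing them, assuming the rules were added exactly as shown in "Add RabbitMQ firewall rules" (firewalld needs the same rule text to find them). The `run` and `remove_plaintext_rules` helpers are hypothetical, and the default is a dry run that only prints the commands.

```shell
#!/bin/sh
# Print each command when DRY_RUN=1 (the default here); execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

remove_plaintext_rules() {
    # 15672 = plaintext Management UI, 5672 = plaintext AMQP
    for port in 15672 5672; do
        run firewall-cmd --permanent --zone=public \
            --remove-rich-rule="rule family=\"ipv4\" source address=\"0.0.0.0/0\" port protocol=\"tcp\" port=\"$port\" accept"
    done
    run firewall-cmd --reload
    # List what is left so you can confirm only the node-to-node rules remain.
    run firewall-cmd --zone=public --list-rich-rules
}

remove_plaintext_rules
```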

Run OS Update

  1. Run OS Update
$ yum clean packages
$ yum update
$ reboot

Testing: Round Two

  1. Verify RabbitMQ Listeners
    • Perform these steps on each RabbitMQ VM
    • Two lines should mention TLS
$ rabbitmq-diagnostics listeners
  1. Test 7: Verify AMQPS Service TLS Versions
    • Perform these steps on each RabbitMQ VM
$ openssl s_client -connect 127.0.0.1:5671 -tls1        # Should Fail
$ openssl s_client -connect 127.0.0.1:5671 -tls1_1      # Should Fail
$ openssl s_client -connect 127.0.0.1:5671 -tls1_2      # Must Succeed
$ openssl s_client -connect 127.0.0.1:5671 -tls1_3      # Might Fail, bonus if it works
  1. Test 8: Verify Cluster Inter-node Communication TLS Versions
    • Perform these steps on each RabbitMQ VM
$ openssl s_client -connect 127.0.0.1:25672 -tls1        # Should Fail
$ openssl s_client -connect 127.0.0.1:25672 -tls1_1      # Should Fail
$ openssl s_client -connect 127.0.0.1:25672 -tls1_2      # Must Succeed
$ openssl s_client -connect 127.0.0.1:25672 -tls1_3      # Might Fail, bonus if it works
  1. Test 9: Verify Management Web UI TLS Versions
    • Perform these steps on each RabbitMQ VM
$ openssl s_client -connect 127.0.0.1:15671 -tls1        # Should Fail
$ openssl s_client -connect 127.0.0.1:15671 -tls1_1      # Should Fail
$ openssl s_client -connect 127.0.0.1:15671 -tls1_2      # Must Succeed
$ openssl s_client -connect 127.0.0.1:15671 -tls1_3      # Might Fail, bonus if it works
  1. Test 10: Verify AMQPS Service TLS Versions on VIP
    • Perform these steps on any one of the three RabbitMQ VMs
$ openssl s_client -connect A.A.A.10:5671 -tls1        # Should Fail
$ openssl s_client -connect A.A.A.10:5671 -tls1_1      # Should Fail
$ openssl s_client -connect A.A.A.10:5671 -tls1_2      # Must Succeed
$ openssl s_client -connect A.A.A.10:5671 -tls1_3      # Might Fail, bonus if it works
  1. Test 11: Verify Cluster Inter-node Communication TLS Versions on VIP
    • Perform these steps on any one of the three RabbitMQ VMs
$ openssl s_client -connect A.A.A.10:25672 -tls1        # Should Fail
$ openssl s_client -connect A.A.A.10:25672 -tls1_1      # Should Fail
$ openssl s_client -connect A.A.A.10:25672 -tls1_2      # Must Succeed
$ openssl s_client -connect A.A.A.10:25672 -tls1_3      # Might Fail, bonus if it works
  1. Test 12: Verify Management Web UI TLS Versions on VIP
    • Perform these steps on any one of the three RabbitMQ VMs
$ openssl s_client -connect A.A.A.10:15671 -tls1        # Should Fail
$ openssl s_client -connect A.A.A.10:15671 -tls1_1      # Should Fail
$ openssl s_client -connect A.A.A.10:15671 -tls1_2      # Must Succeed
$ openssl s_client -connect A.A.A.10:15671 -tls1_3      # Might Fail, bonus if it works
  1. Test 13: Verify Management Web UI Access on VIP
    • From your laptop, open a web browser and navigate to https://A.A.A.10:15671
    • Log in using the localadmin account
  2. Test 14: Load Testing
    • Perform these steps on your laptop
    • Open a DOS window
    • In the DOS window, run the following command:
      • This will produce 9 producers, each sending at a rate of 20 messages/second
      • This will produce 9 consumers, each extracting at a rate of 20 messages/second
      • The command will run for 50 minutes
      • The carets (^) allow you to break a DOS command across multiple lines
      • If you decide to put this command into a batch file (.bat), remember to double up on the percent signs: % –> %%
DOS> cd \bin
DOS> .\runjava com.rabbitmq.perf.PerfTest ^
--uris                        amqp://testuser:12345678@A.A.A.10:5672 ^
--time                        3000 ^
--flag                        persistent ^
--auto-delete                 false ^
--qos                         1000 ^
--confirm                     1000 ^
--confirm-timeout             -1 ^
--rate                        20 ^
--size                        1024 ^
--queue-pattern               'perf-test-%d' ^
--queue-pattern-from          1 ^
--queue-pattern-to            9 ^
--producers                   9 ^
--consumers                   9 ^
--consumer-latency            10000
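Tests 7 through 12 repeat the same four openssl probes against different hosts and ports, and each probe waits for keyboard input once the handshake succeeds. The probes can be wrapped in a loop that feeds openssl an empty stdin so it exits by itself. This is a sketch, assuming `openssl` and the coreutils `timeout` command are installed; the function names `tls_handshake_ok` and `probe_tls` are hypothetical.

```shell
#!/bin/sh
# tls_handshake_ok HOST PORT FLAG
# FLAG is one of openssl's protocol switches: -tls1, -tls1_1, -tls1_2, -tls1_3.
# stdin comes from /dev/null so s_client exits after the handshake.
tls_handshake_ok() {
    timeout 10 openssl s_client -connect "$1:$2" "$3" </dev/null >/dev/null 2>&1
}

# probe_tls HOST PORT: print one line per protocol version.
probe_tls() {
    for flag in -tls1 -tls1_1 -tls1_2 -tls1_3; do
        if tls_handshake_ok "$1" "$2" "$flag"; then
            echo "$1:$2 $flag accepted"
        else
            echo "$1:$2 $flag rejected"
        fi
    done
}

# Usage, e.g. for the AMQPS listener on the local node:
#   probe_tls 127.0.0.1 5671
```

Keep in mind that "rejected" only means the handshake did not complete. The sketch cannot tell a refused protocol version apart from an unreachable port or an openssl build that lacks that switch, so check by hand when a result surprises you.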

Secure the Accounts

  1. Test the localadmin account
    • Perform these steps on each RabbitMQ VM
    • Connect via SSH as ‘localadmin’
    • Verify that you can sudo to root
      • If it works, continue
      • If it does not, add localadmin to group ‘wheel’
  1. Add a ‘localadmin’ user to RabbitMQ
    • Perform these steps on rabbit1
$ rabbitmqctl add_user localadmin '<password>'
$ rabbitmqctl set_user_tags localadmin administrator
$ rabbitmqctl set_permissions -p / localadmin ".*" ".*" ".*"
  1. Remove Extra RabbitMQ Users
    • Perform these steps on rabbit1
$ rabbitmqctl delete_user testuser
$ rabbitmqctl delete_user guest
  1. Disable Direct Root Login
    • Perform these steps on each RabbitMQ VM
$ nano /etc/ssh/sshd_config
$ echo > /etc/securetty
$ nano /etc/pam.d/login
  • Make this the first non-commented line:

auth [user_unknown=ignore success=ok ignore=ignore default=bad] pam_securetty.so

  1. Verify Root Cannot Log In
    • Because of the changes in the previous step, root should no longer be able to log in directly via SSH or at the VM consoles
    • Log off all SSH consoles
    • Attempt to log in as root via SSH. This should fail.
    • Attempt to log in as root at the VM consoles. This should fail.
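The "Disable Direct Root Login" step opens /etc/ssh/sshd_config without spelling out the change. We assume here that the intended change is setting PermitRootLogin to no; treat that as an assumption rather than a record of what was actually edited. Under that assumption, the edit could be scripted as below. The function name is hypothetical, and `sed -i` is used in its GNU form.

```shell
#!/bin/sh
# set_permit_root_login_no FILE
# Rewrites any existing PermitRootLogin line (commented out or not) to
# "PermitRootLogin no", or appends the directive if the file has none.
# A backup copy is kept as FILE.bak.
set_permit_root_login_no() {
    file="$1"
    cp "$file" "$file.bak"
    if grep -Eq '^[[:space:]]*#?[[:space:]]*PermitRootLogin[[:space:]]' "$file"; then
        sed -E -i 's/^[[:space:]]*#?[[:space:]]*PermitRootLogin[[:space:]].*/PermitRootLogin no/' "$file"
    else
        printf '\nPermitRootLogin no\n' >> "$file"
    fi
}

# Usage (as root): validate the file before reloading the daemon.
#   set_permit_root_login_no /etc/ssh/sshd_config && sshd -t && systemctl reload sshd
```

Keep an existing SSH session open while you test, so a mistake does not lock you out.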

Clean Up

  1. Remove the VM Snapshots
    • Navigate to vSphere –> VMs and Templates –> Find your RabbitMQ VMs
    • Actions –> Snapshots –> Delete All Snapshots
    • Repeat this for the other two VMs

Maintenance

Keepalived

Keepalived was installed using yum, so we are at the mercy of our upstream package provider to provide newer versions.  We are stuck on version 2.0.10.  Sadly, this version has a few bugs that require constant monitoring.

Bug #1: 

For no apparent reason, the keepalived service will start to consume 100% of a single CPU core.  All logging to the keepalived log file will stop.  The keepalived service will stop sending/receiving network traffic.

Bug #2:

Again, for no apparent reason, the keepalived Virtual IP (VIP) will stop responding. 

In both cases, the solution is to restart the keepalived service on the RabbitMQ cluster member that is experiencing the problem.  We use a script to monitor both the CPU utilization of the keepalived service and whether the VIP is responding.  This monitor script runs every minute as a cron job.  Both the cron command line and the script can be found in the Appendix.
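
To make the restart decision concrete, here is a minimal sketch of the logic such a monitor might use. It is not the script from the Appendix. The 90 percent threshold, the use of ping as the "is the VIP responding" test, and the function names are all assumptions made for illustration. Note also that `ps` reports CPU time averaged over the life of the process, so a real monitor would probably sample over a short interval instead.

```shell
#!/bin/sh
# Minimal sketch of the restart decision; NOT the Appendix script.
# CPU_LIMIT is an assumed threshold, in percent of one core.
CPU_LIMIT=90

# Reads "pcpu command" lines on stdin (for example from
# `ps -C keepalived -o pcpu=,comm=`) and succeeds if any keepalived
# process is at or above the limit.
keepalived_cpu_high() {
    awk -v limit="$CPU_LIMIT" '
        $2 == "keepalived" && $1 + 0 >= limit + 0 { high = 1 }
        END { exit (high ? 0 : 1) }'
}

# Succeeds if the service should be restarted.
# $1 = "yes" if CPU is high, $2 = "yes" if the VIP answered.
needs_restart() {
    [ "$1" = "yes" ] || [ "$2" != "yes" ]
}

# Example wiring (assumed, run as root from cron):
#   cpu=no; vip=no
#   ps -C keepalived -o pcpu=,comm= | keepalived_cpu_high && cpu=yes
#   ping -c 1 -W 2 A.A.A.10 >/dev/null 2>&1 && vip=yes
#   needs_restart "$cpu" "$vip" && systemctl restart keepalived
```

Keeping the decision in small functions that read plain text makes it possible to test the logic without taking keepalived down.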

Applying OS Patches

Applying OS patches is not hard, but it does require that you check the health of the environment before applying anything.

  1. NOTE: The actual VIP has been changed out for “A.A.A.10”.  If you are a CenterGrid employee, you can look up the IP addresses in the CMDB.  If you are not, then you don’t get to see our internal IP addresses.
  2. Log on to the RabbitMQ management web UI: https://A.A.A.10:15671/#/
  3. You will be taken to the Overview page
  4. Verify all three nodes are present and green

  1. If you see any yellow or red, there is a problem. Fix the problem before continuing.
  2. Click on the ‘Queues’ tab
  3. Verify all the queues have a “+2”

  1. Our cluster has three nodes; one master plus two replicas. If the “+2” is missing or different, you have a problem.  Fix the problem before continuing.
  2. Find a node that is not a master. In this example, rabbit1 is not a master.  This is where we will start the patching process.
  3. SSH into Rabbit1 and verify it is not advertising the VIP. Pretend the VIP is “A.A.A.10”.
$ ip addr show | grep "A\.A\.A\.10"

If this returns a line, it will contain the IP address of the VIP.  This means rabbit1 is hosting the VIP.  It shouldn’t be hosting the VIP because the Keepalived scripts have been sculpted to make the VIP follow the RabbitMQ cluster member that is the master for the most queues.  Restart the keepalived service on rabbit1.  Verify the VIP has moved to one of the other members before continuing.
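
If you prefer the command line to the ‘Queues’ tab, the number of queue masters per node can be tallied from `rabbitmqctl list_queues name pid`. The sketch below is only an illustration: it assumes each pid is printed in the form `<rabbit@host.1.234.0>`, and the function name is hypothetical.

```shell
#!/bin/sh
# Counts queue masters per node from `rabbitmqctl list_queues name pid`
# output read on stdin. Assumes each pid looks like <rabbit@host.1.234.0>;
# lines without such a pid (banners, column headers) are skipped.
masters_per_node() {
    sed -n 's/^.*<\(.*\)\.[0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*>[[:space:]]*$/\1/p' |
        sort | uniq -c | awk '{ print $2, $1 }'
}

# Example use on any cluster member (run as root):
#   rabbitmqctl list_queues name pid | masters_per_node
```

A node that does not appear in the output is not the master of any queue in that virtual host, which makes it a good candidate to patch first.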

  1. Stop the RabbitMQ service and apply patches
$ systemctl stop rabbitmq-server
$ yum update
  1. Reboot this node
  2. While waiting for the reboot, go back to the RabbitMQ management web UI (https://A.A.A.10:15671/#/) and watch the cluster node go red, then yellow, then green. In our example, we worked on Rabbit1 first, but this image shows Rabbit3 in red.  Pretend it’s Rabbit1.  Hahaha…

If you were to flip over to the ‘Queues’ tab, you would see how the “+2” has changed to a “+1”.  This means that there is currently only one replica.

  1. Wait for all three nodes to be green before continuing

  1. Now it is time to pick another node to be patched. In our example, both Rabbit2 and Rabbit3 are masters of at least one queue.  RabbitMQ does not provide a method (GUI or command-line) to transfer the master role to another cluster member.  Therefore, your next step is to find which cluster member is currently hosting the VIP and pick the other one.  For example, we know Rabbit1 was already patched, so ignore it.  This leaves Rabbit2 and Rabbit3.  If Rabbit3 is hosting the VIP, then pick Rabbit2.
  2. SSH into Rabbit2 and verify it is not advertising the VIP. Pretend the VIP is “A.A.A.10”.
$ ip addr show | grep "A\.A\.A\.10"
  1. Stop the RabbitMQ service and apply patches
$ systemctl stop rabbitmq-server
$ yum update

  1. Reboot this node

  1. While waiting for the reboot, go back to the RabbitMQ management web UI (https://A.A.A.10:15671/#/) and watch the cluster node go red, then yellow, then green.
  2. Wait for all three nodes to be green before continuing

  1. SSH into the last, unpatched RabbitMQ node. The VIP will move shortly after the RabbitMQ service is stopped.
  2. Stop the RabbitMQ service, apply patches, and reboot this node, using the same commands as on the previous two nodes
  3. While waiting for the reboot, go back to the RabbitMQ management web UI (https://A.A.A.10:15671/#/) and watch the cluster node go red, then yellow, then green.
  4. Wait for all three nodes to be green before continuing
  5. Done!
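
The "wait for green" steps can also be done from a shell. The helper below is a generic polling loop, offered as a sketch rather than as part of our procedure; the function name, the attempt count, and the delay are hypothetical. It pairs naturally with `rabbitmq-diagnostics -q check_running`, which reports on the local node only, so it complements the management web UI rather than replacing it.

```shell
#!/bin/sh
# Polls a check command until it succeeds or the attempts run out.
# Usage: wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_until() {
    attempts=$1
    delay=$2
    shift 2
    while [ "$attempts" -gt 0 ]; do
        # Run the check quietly; success ends the wait.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        attempts=$((attempts - 1))
        # Sleep only if another attempt is coming.
        if [ "$attempts" -gt 0 ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Example use on the node that was just rebooted (run as root):
#   wait_until 60 10 rabbitmq-diagnostics -q check_running \
#       && echo "RabbitMQ is running again" \
#       || echo "RabbitMQ did not come back in time"
```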

Diagnostics

The Virtual IP (VIP)

The VIP should exist on only one of the cluster members.  Use this process to look for the VIP:

  1. SSH into each node
  2. Run the following command to check for the VIP
    • NOTE: The actual VIP has been changed out for “A.A.A.10”.  If you are a CenterGrid employee, you can look up the IP addresses in the CMDB.  If you are not, then you don’t get to see our internal IP addresses.
    • This node hosts the VIP:
$ ip addr show | grep "A\.A\.A\.10"
    inet A.A.A.10/24 scope global secondary ens192
$
    • This node does not host the VIP:
$ ip addr show | grep "A\.A\.A\.10"
$

If the VIP exists on zero nodes, or two or more, then you have a problem.  Restart the keepalived service on all nodes, one at a time.
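
One caveat with a plain grep is that a pattern for A.A.A.10 also matches A.A.A.100 and similar addresses. A slightly stricter check is sketched below; it assumes `ip addr show` prints the address as `inet ADDRESS/PREFIX`, and the function name is hypothetical.

```shell
#!/bin/sh
# Succeeds if `ip addr show` output on stdin lists the given IPv4 address.
# The fixed-string match on "inet ADDR/" keeps A.A.A.10 from matching
# A.A.A.100 and keeps the dots from acting as regex wildcards.
hosts_vip() {
    grep -qF "inet $1/"
}

# Example use on each node:
#   ip addr show | hosts_vip A.A.A.10 && echo "this node hosts the VIP"
```

Run it on all three nodes; exactly one of them should report that it hosts the VIP.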

Keepalived

Is the process running at 100% CPU?

  1. SSH into each node
  2. Run the following command to look at CPU usage:
$ htop
  1. Look for any line where CPU is at or near 100.0, then look at the Command column to see if it is keepalived

  1. In this example, keepalived is using 0% of a CPU core, so no need to restart the service

 

What is the history of the ‘Restart Keepalived’ script?

  1. SSH into each node
  2. Navigate into the root directory and get a file listing
$ sudo su
$ cd ~
$ ls -al

  1. Inspect the historical log file
$ cat ./RestartKeepalivedIf100PercentCPU_Historical.log

  1. Watch the RecentRun log file. This file is overwritten every minute.
$ tail -f ./RestartKeepalivedIf100PercentCPU_RecentRun.log

RabbitMQ

Are the RabbitMQ cluster members healthy?

  1. Log on to the RabbitMQ management web UI: https://A.A.A.10:15671/#/
  2. You will be taken to the Overview page
  3. Verify all three nodes are present and green

  1. If you see any yellow or red, there is a problem. Fix the problem before continuing.
  2. Click on the ‘Queues’ tab
  3. Verify all the queues have a “+2”

  1. Our cluster has three nodes; one master plus two replicas. If the “+2” is missing or different, you have a problem.

More information is available at the command line and with health checks, but CenterGrid hasn’t run into any problems that require the additional detail these commands provide.
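
As one example of what the command line offers, the “+2” check can be approximated with `rabbitmqctl list_queues name slave_pids`. This is a sketch under assumptions we have not verified against every RabbitMQ version: that the columns are tab-separated and that each mirror shows up as one `<...>` pid in the second column. The function name is hypothetical.

```shell
#!/bin/sh
# Prints the name of every queue with fewer than WANT mirrors (default 2),
# reading `rabbitmqctl list_queues name slave_pids` output on stdin.
# Assumes tab-separated columns and one <...> pid per mirror.
# Banner lines ending in "..." and the "name" header row are skipped.
under_mirrored_queues() {
    awk -F '\t' -v want="${1:-2}" '
        NF >= 1 && $1 != "name" && $0 !~ /\.\.\.$/ {
            mirrors = gsub(/</, "<", $2)   # count pids in column 2
            if (mirrors < want + 0) print $1
        }'
}

# Example use on any cluster member (run as root):
#   rabbitmqctl list_queues name slave_pids | under_mirrored_queues 2
```

No output means every queue has at least two mirrors, which corresponds to the “+2” in the web UI.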

Appendix

All of the scripts and configuration files that were discussed above are included here as a plain text file.

RabbitMQ_Appendix

The Grand Finale

  1. Ferris:
    • You’re still here?
    • It’s over.
    • Go home.
    • Go.
  2. Yello:
    • -= chick chicka chickaah =-