RabbitMQ For VMware Cloud Director
Overview
CenterGrid is a big proponent of the VMware suite of products, especially vCloud Director (vCD). In fact, we use vCD to power CenterGrid Compass, our managed private cloud service. vCD uses the RabbitMQ AMQP broker to pass messages among vCD’s extension services and object extensions, and to deliver notifications.
Despite VMware providing installation documentation here to complement the RabbitMQ installation documentation here, coupled with enough documentation to choke a horse, we found the installation and configuration process wasn’t very straightforward. Furthermore, we encountered a couple of bugs that caused us no end of enjoyment. By the end of this article, we hope to show you how our RabbitMQ cluster was installed, configured, and operationalized for enterprise use.
As with everything on the Internet, this article will age quickly. We used CentOS 8, but now that the entire CentOS line is being discontinued, we recommend you use something different when you build your own cluster. Certain parts of this article will remain relevant, especially those dealing with Keepalived.
What To Expect From This Document
Fonts and Styles: Code and Console Output
In this document you will find the following styles applied to code and console output.
Linux Command Line
$ ls -l
total 176
-rw-r--r--. 1 root root  683 Aug 19 09:59 0001.pcap
-rw-------. 1 root root 1586 Jul 31 02:17 anaconda-ks.cfg
drwxr-xr-x. 2 root root 4096 Jul 31 02:48 Desktop
drwxr-xr-x. 2 root root 4096 Jul 31 02:48 Documents
Windows Command Prompt (DOS)
DOS> ipconfig /all
PowerShell
PS> $PSVersionTable

Name                       Value
----                       -----
PSVersion                  5.1.18362.628
PSEdition                  Desktop
PSCompatibleVersions       {1.0, 2.0, 3.0, 4.0...}
BuildVersion               10.0.18362.628
CLRVersion                 4.0.30319.42000
WSManStackVersion          3.0
PSRemotingProtocolVersion  2.3
SerializationVersion       1.1.0.1
Installation
Prepare for the VMs
- Create DNS entries for these three VMs and one virtual IP (VIP)
- Log in to DNS01.domain.local
- Open the DNS snap-in
- Create four new static DNS entries in domain.local
- rabbitvip = A.A.A.10
- rabbit1 = A.A.A.11
- rabbit2 = A.A.A.12
- rabbit3 = A.A.A.13
- Generate a complex password for “root” OS account to be used on all three VMs
- Generate a complex password for “localadmin” OS account to be used on all three VMs
- Generate a complex password for “vCloudDirectorAdmin” RabbitMQ account
- Generate a complex password for “monitoringuser”; to be used by your monitoring software
- Download the CentOS 8 minimal installer ISO
- Upload the CentOS 8 minimal installer ISO to a datastore in your vCD environment
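The complex passwords above can come from any generator you trust; as one self-contained sketch using OpenSSL (any equivalent tool works just as well):

```shell
# Generate one 24-byte random password, base64-encoded (32 characters).
# Run once per account and store each result in your password vault.
pw=$(openssl rand -base64 24)
echo "$pw"
```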
Build The VMs
- Using vSphere (not vCD), create three VMs with these settings:
Setting | Value |
Compatibility | ESXi 6.7 Update 2 and later |
Guest OS | Linux, CentOS 8 (64-bit) |
CPU | 2 |
RAM | 4 GiB |
Disk | 50 GiB, Thin-provisioned |
SCSI | VMware Paravirtual |
Network | Your choice |
CD/DVD | The CentOS 8 minimal installer ISO from steps #7 & 8 |
Video | Auto-detect settings |
Tools Upgrades | Enabled |
- Steps 11 to 15 must be performed on all three VMs
- During the installation, you will create the root account. Also take the time to add the “localadmin” account and assign it to the “wheel” group.
- Wait for the installation to finish
- Log in as root
- Disable system protections: Firewall & Security-Enhanced Linux (SELinux). These will be re-enabled at a later step.
$ systemctl stop firewalld
$ setenforce 0
- Update OS and install prerequisite software
$ yum install -y epel-release
$ yum update -y
$ yum install -y open-vm-tools keepalived tcpdump erlang wget htop
Install and Configure RabbitMQ
- Steps 17 to 20 must be performed on all three VMs
- Download and install the RabbitMQ package
$ nano /etc/yum.repos.d/rabbitmq_erlang.repo
$ wget https://github.com/rabbitmq/rabbitmq-server/releases/download/v3.8.5/rabbitmq-server-3.8.5-1.el8.noarch.rpm
$ rpm --import https://www.rabbitmq.com/rabbitmq-signing-key-public.asc
$ yum install -y ./rabbitmq-server-3.8.5-1.el8.noarch.rpm
- Edit /etc/rabbitmq/rabbitmq.conf
- Edit /etc/rabbitmq/rabbitmq-env.conf
- Get the contents of /etc/rabbitmq/rabbitmq-env.conf here
- Enable and start the rabbitmq-server service
$ systemctl enable rabbitmq-server
$ systemctl start rabbitmq-server
$ rabbitmq-plugins enable rabbitmq_management
- Do this on VM rabbit1:
- The last command will output a string. Copy that string and keep it for step 22.
$ rabbitmqctl set_cluster_name [email protected]
$ rabbitmqctl set_policy ha-all ".*" '{"ha-mode":"all","ha-sync-mode":"automatic","ha-promote-on-failure":"when-synced","ha-promote-on-shutdown":"when-synced","queue-mode":"lazy"}'
$ cat /var/lib/rabbitmq/.erlang.cookie
- Do this on VMs rabbit2 and rabbit3:
- Take the cookie string from Step 21 and place it between the single quotes in the echo command below
- The service restart is needed because the Erlang VM only reads the cookie file at startup
$ echo '' > /var/lib/rabbitmq/.erlang.cookie
$ systemctl restart rabbitmq-server
$ rabbitmqctl stop_app
$ rabbitmqctl reset
$ rabbitmqctl start_app
- After the last command executes, rabbit2 and rabbit3 should join rabbit1 and form a cluster
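A quick way to confirm all three nodes actually formed a cluster is to count the distinct node names in the cluster status output. A small sketch (the helper is ours, not a rabbitmqctl feature):

```shell
# Helper: count the distinct rabbit@<host> node names found on stdin.
count_nodes() {
    grep -o 'rabbit@[A-Za-z0-9._-]*' | sort -u | wc -l
}

# On a live node, feed it the real status output and expect 3:
#   rabbitmqctl cluster_status | count_nodes
```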
Install and Configure Keepalived
- Steps 24 to 28 must be performed on all three VMs
- Find your NIC’s identifier. It might look like “ens19” or “eno1” or “enp3s0f0”. Remember this value for the next step.
- Perform this step on each RabbitMQ VM
$ ip addr show
- Edit /etc/keepalived/keepalived.conf
- Perform these steps on each RabbitMQ VM
- In this file, change the NIC name from “ens18” to the NIC name you found in the previous step
- Get the contents of /etc/keepalived/keepalived.conf here
$ nano /etc/keepalived/keepalived.conf
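Our full keepalived.conf is linked above; as a rough, hedged sketch of its shape (the virtual_router_id, priority, and secret below are illustrative placeholders, the interface must match your NIC, and only one of the six health-check scripts is shown):

```
vrrp_script chk_rabbitmq_cluster {
    script "/usr/libexec/keepalived/keepalived-rabbitmq-health-check-cluster.sh"
    interval 5
    fall 2
    rise 2
}

vrrp_instance RABBITMQ_VIP {
    state BACKUP
    interface ens18              # change to your NIC name
    virtual_router_id 51         # illustrative; pick one value and use it on all three nodes
    priority 100                 # illustrative; give each node a different priority
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass <shared_secret>
    }
    virtual_ipaddress {
        A.A.A.10/24
    }
    track_script {
        chk_rabbitmq_cluster
    }
}
```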
- Place the “keepalived-rabbitmq-health-check” scripts into /usr/libexec/keepalived/
- Execute each nano command below, then paste in the contents
- Get the content of each file in the Appendix
- Save the file
- Go to the next file in the list
$ nano /usr/libexec/keepalived/keepalived-rabbitmq-health-check-cluster.sh
$ nano /usr/libexec/keepalived/keepalived-rabbitmq-health-check-port-5671.sh
$ nano /usr/libexec/keepalived/keepalived-rabbitmq-health-check-port-5672.sh
$ nano /usr/libexec/keepalived/keepalived-rabbitmq-health-check-port-15672.sh
$ nano /usr/libexec/keepalived/keepalived-rabbitmq-health-check-port-25672.sh
$ nano /usr/libexec/keepalived/keepalived-rabbitmq-health-check-queue-master.sh
- NOTES:
- Keepalived will run the scripts as root because the OS user ‘keepalived_script’ does not exist. All rabbitmqctl and rabbitmq-diagnostic commands must be run as root, so there is no need for user ‘keepalived_script’.
- For more information on why “/usr/libexec/keepalived” was chosen to host the scripts, please read this article
- Each script only checks one port. This was by design to make the logs easier to read; now, when a health check script fails, you know exactly which port is to blame.
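The real per-port scripts are in the Appendix; as a hedged sketch of what a single port check boils down to (the function name is ours, and the /dev/tcp probe is a bash-ism; keepalived only cares about the exit status, 0 for healthy and non-zero for failed):

```shell
#!/bin/bash
# Minimal sketch of one per-port health check.
check_port() {
    # Try to open a TCP connection to localhost on the given port.
    # Connection refused (or any failure) returns a non-zero status.
    (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

check_port 5672 && echo "port 5672 is listening" || echo "port 5672 is not listening"
```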
- Make these scripts executable by root
- Perform these steps on each RabbitMQ VM
$ chmod 0744 /usr/libexec/keepalived/keepalived-rabbitmq-health-check-cluster.sh
$ chmod 0744 /usr/libexec/keepalived/keepalived-rabbitmq-health-check-port-5671.sh
$ chmod 0744 /usr/libexec/keepalived/keepalived-rabbitmq-health-check-port-5672.sh
$ chmod 0744 /usr/libexec/keepalived/keepalived-rabbitmq-health-check-port-15672.sh
$ chmod 0744 /usr/libexec/keepalived/keepalived-rabbitmq-health-check-port-25672.sh
$ chmod 0744 /usr/libexec/keepalived/keepalived-rabbitmq-health-check-queue-master.sh
- Enable and start the keepalived service
- Perform these steps on each RabbitMQ VM
$ systemctl enable keepalived
$ systemctl start keepalived
Create User Accounts
- The Monitoring User
- Perform these steps on each RabbitMQ VM
- All of our servers are monitored. Skip this step if it does not apply.
- Exists at the OS level only. Has zero rights inside RabbitMQ.
$ useradd monitoringuser
$ passwd monitoringuser
$ usermod -aG wheel monitoringuser
- The vCloud Director User
- Perform these steps on rabbit1
- Exists inside RabbitMQ only. Has zero rights at the OS level.
- Use the complex “vCloudDirectorAdmin” password generated earlier in place of <generated_password>
$ rabbitmqctl add_user vCloudDirectorAdmin '<generated_password>'
$ rabbitmqctl set_user_tags vCloudDirectorAdmin administrator
$ rabbitmqctl set_permissions -p / vCloudDirectorAdmin ".*" ".*" ".*"
- The Test User
- Perform these steps on rabbit1
- Exists inside RabbitMQ only. Has zero rights at the OS level.
- This account has a weak password (12345678), but its simplicity makes it easy to construct the command line string when stress testing the RabbitMQ cluster
- This account will be removed by the end of this document
$ rabbitmqctl add_user testuser 12345678
$ rabbitmqctl set_user_tags testuser administrator
$ rabbitmqctl set_permissions -p / testuser ".*" ".*" ".*"
Finish Installation and Configuration
- Enable system protections: Firewall & Security-Enhanced Linux (SELinux)
- Perform these steps on each RabbitMQ VM
$ systemctl start firewalld
$ setenforce 1
- Add RabbitMQ firewall rules
- Perform these steps on each RabbitMQ VM
- This ruleset allows the RabbitMQ VMs to talk unrestricted to each other, but permits only AMQP and Management UI communications with outside clients
$ firewall-cmd --permanent --zone=public --add-rich-rule='rule family="ipv4" source address="A.A.A.10/32" port protocol="tcp" port="4000-65535" accept'
$ firewall-cmd --permanent --zone=public --add-rich-rule='rule family="ipv4" source address="A.A.A.11/32" port protocol="tcp" port="4000-65535" accept'
$ firewall-cmd --permanent --zone=public --add-rich-rule='rule family="ipv4" source address="A.A.A.12/32" port protocol="tcp" port="4000-65535" accept'
$ firewall-cmd --permanent --zone=public --add-rich-rule='rule family="ipv4" source address="A.A.A.13/32" port protocol="tcp" port="4000-65535" accept'
$ firewall-cmd --permanent --zone=public --add-rich-rule='rule family="ipv4" source address="0.0.0.0/0" port protocol="tcp" port="15672" accept'
$ firewall-cmd --permanent --zone=public --add-rich-rule='rule family="ipv4" source address="0.0.0.0/0" port protocol="tcp" port="5672" accept'
$ firewall-cmd --reload
- Add Keepalived firewall rules
- Perform these steps on each RabbitMQ VM
- Create a new firewall service definition file
- Get the contents of /etc/firewalld/services/vrrp.xml here
$ nano /etc/firewalld/services/vrrp.xml
- Add that service definition to the list of active rules
$ firewall-cmd --reload
$ firewall-cmd --permanent --zone=public --add-service=vrrp
$ firewall-cmd --reload
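The vrrp.xml service definition linked above is short; a hedged sketch of what a firewalld service file for VRRP typically contains (VRRP is IP protocol 112, not a TCP/UDP port):

```xml
<?xml version="1.0" encoding="utf-8"?>
<service>
  <short>vrrp</short>
  <description>Virtual Router Redundancy Protocol (IP protocol 112), used by keepalived to advertise the VIP</description>
  <protocol value="vrrp"/>
</service>
```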
- Install your antivirus client
- Install your monitoring agent
- At this point, keepalived should have placed the virtual IP on a single RabbitMQ VM, and the three VMs should have formed a cluster. However, if you are super paranoid, now would be a good time to reboot all three VMs.
Testing: Round One
- Test 1: Does the Keepalived Virtual IP address exist on only one RabbitMQ VM?
- Open three SSH windows, one to each RabbitMQ VM
- A.A.A.11
- A.A.A.12
- A.A.A.13
- In each SSH window, type and run this:
$ watch 'ip addr show | grep "A.A.A.10"'
- Arrange all three SSH windows so you can see them simultaneously
- Only one of the three RabbitMQ VMs should give a response that looks like this:
“inet A.A.A.10/24 scope global secondary ens192”
- If more than one window shows the virtual IP address, then your keepalived configuration is incorrect. Solve this problem before moving on.
- Test 2: Does the Keepalived virtual IP address move from VM to VM? (keepalived service restarted)
- Open three additional SSH windows, one to each RabbitMQ VM
- A.A.11
- A.A.12
- A.A.13
- Find the RabbitMQ VM that is currently hosting the VIP (A.A.A.10)
- Run this command in the new SSH window of the RabbitMQ VM that is hosting the VIP:
$ systemctl restart keepalived
- Watch the monitoring SSH windows to ensure that the VIP moves from server to server as the keepalived service is restarted
- During the transition period, you might see two servers hosting the VIP, but after a few seconds only one of the three RabbitMQ VMs should give a response that looks like this:
“inet A.A.A.10/24 scope global secondary ens192”
- Test 3: Does the Keepalived Virtual IP address move from VM to VM? (rabbitmq-server restarted)
- Find the RabbitMQ VM that is currently hosting the VIP (A.A.A.10)
- Run this command in the new SSH window of the RabbitMQ VM that is hosting the VIP:
$ systemctl restart rabbitmq-server
- Watch the monitoring SSH windows to ensure that the VIP moves from server to server as the rabbitmq-server service is restarted
- During the transition period, you might see two servers hosting the VIP, but after a few seconds only one of the three RabbitMQ VMs should give a response that looks like this:
“inet A.A.A.10/24 scope global secondary ens192”
- Test 4: Does the RabbitMQ VM rejoin the cluster? (rabbitmq-server restarted)
- Perform these steps on your laptop
- Open the RabbitMQ Management UI
http://A.A.A.10:15672
- Navigate to Overview –> Nodes
- Run this command in the new SSH window of the RabbitMQ VM that is hosting the VIP:
$ systemctl restart rabbitmq-server
- Watch the web page for a node to go RED –> YELLOW –> GREEN
- Wait for all three nodes to appear green before continuing
- What you have just witnessed is a node being kicked out of the cluster and rejoining the cluster
- Also, .10 is the VIP, so the keepalived services should be moving that IP address around, too
- There should not be any lengthy service interruptions from the web page
- Test 5: Does the RabbitMQ VM rejoin the cluster? (reboots)
- Perform these steps on your laptop
- Open the RabbitMQ Management UI
http://A.A.A.10:15672
- Navigate to Overview –> Nodes
- Reboot a node
- Watch the web page for a node to go RED –> YELLOW –> GREEN
- Wait for all three nodes to appear green before continuing
- What you have just witnessed is a node being kicked out of the cluster and rejoining the cluster
- Also, .10 is the VIP, so the keepalived services should be moving that IP address around, too
- There should not be any lengthy service interruptions from the web page
- Test 6: Load Testing
- Perform these steps on your laptop
- Download and install the Java Runtime Environment (JRE) on your laptop
- Download and install the RabbitMQ Performance Test application on your laptop:
https://github.com/rabbitmq/rabbitmq-perf-test/releases
https://rabbitmq.github.io/rabbitmq-perf-test/stable/htmlsingle/#installation
- Open a DOS window
- In the DOS window, run the following command:
- This will start 9 producers, each publishing at a rate of 20 messages/second
- This will start 9 consumers, each consuming at a rate of 20 messages/second
- The command will run for 50 minutes
- The carets (^) allow you to break a DOS command across multiple lines
- If you decide to put this command into a batch file (.bat), remember to double up on the percent signs: % –> %%
DOS> cd \bin
DOS> .\runjava com.rabbitmq.perf.PerfTest ^
  --uris amqp://testuser:[email protected]:5672 ^
  --time 3000 ^
  --flag persistent ^
  --auto-delete false ^
  --qos 1000 ^
  --confirm 1000 ^
  --confirm-timeout -1 ^
  --rate 20 ^
  --size 1024 ^
  --queue-pattern 'perf-test-%d' ^
  --queue-pattern-from 1 ^
  --queue-pattern-to 9 ^
  --producers 9 ^
  --consumers 9 ^
  --consumer-latency 10000
- While this command is running, revisit Tests 1-5 to explore the HA/Failover capabilities under load
- While this command is running, start an SSH window to each of the three RabbitMQ VMs and run ‘htop’
- Observe the CPU and memory utilization
- Maxed out CPU or RAM is bad
- To try a different rate of message delivery and consumption…
- Stop the DOS command
- Choose a new rate, or ‘producers’ count, or ‘consumers’ count
- Select and copy the whole command
- Paste it into the DOS window as one whole string
- Press ‘enter’ if necessary
- Minimize your testing windows. You’re going to use them again at the end of this document.
Configure HA in vSphere
- Create an anti-affinity rule
- Perform these steps on your laptop
- Log in to vSphere
- Navigate to ‘Hosts and Clusters’ –>
- Navigate to Configure tab –> Configuration –> ‘VM/Host Rules’ –> +Add
- Name: anti-affinity-rule-RabbitMQ_Cluster
- Type: Separate Virtual Machines
- VMs:
- rabbit1
- rabbit2
- rabbit3
- Click OK to close and save
- In the next step, we will completely shut down each VM to take a snapshot. This allows vSphere to place each VM on a different host to satisfy the anti-affinity rule.
- Take VM Snapshots
- Perform these steps on each RabbitMQ VM
- Log in via SSH (root)
- Completely shut down the VM
$ shutdown --poweroff 0
- From your laptop, navigate to vSphere –> VMs and Templates –> –> One of the RabbitMQ VMs
- Actions –> Snapshots –> Take Snapshot
- Name: <Change_Request_#_Here>
- Description: Snapshot before <Change_Request_#_Here>. Rollback point.
- Repeat this for the other two VMs
- Start the VMs
Create an SSL/TLS Certificate for the Cluster
NOTE: We found generating the certificate to be rather tricky and made a mistake or two early on. As a result, some of the inter-node authentications/verifications did not work as described in the RabbitMQ documentation. Compromises were therefore made inside the inter_node_tls.config file. If we were to go back and redeploy a RabbitMQ cluster using the certificate instructions contained here, then the compromises in the inter_node_tls.config file would not be needed.
- Generate a Certificate Signing Request (CSR)
- Perform these steps on rabbit1
- Log in to rabbit1 via SSH (root)
- Edit OpenSSL.cnf configuration file before generating CSR
$ nano /etc/pki/tls/openssl.cnf
- Find the following lines in openssl.cnf and change them
FROM: # req_extensions = v3_req # The extensions to add to a certificate request
TO: req_extensions = v3_req # The extensions to add to a certificate request
FROM: countryName_default = XX
TO: countryName_default = US
FROM: #stateOrProvinceName_default = Default Province
TO: stateOrProvinceName_default =
FROM: localityName_default = Default City
TO: localityName_default =
FROM: 0.organizationName_default = Default Company Ltd
TO: 0.organizationName_default =
FIND: commonName_max = 64
ADD BELOW: commonName_default = rabbitvip.domain.local

FIND: keyUsage = nonRepudiation, digitalSignature, keyEncipherment
ADD BELOW:
subjectAltName = @alt_names

[ alt_names ]
DNS.1 = rabbitvip.domain.local
DNS.2 = rabbitvip
DNS.3 = rabbit1.domain.local
DNS.4 = rabbit1
DNS.5 = rabbit2.domain.local
DNS.6 = rabbit2
DNS.7 = rabbit3.domain.local
DNS.8 = rabbit3
IP.1 = A.A.A.10
IP.2 = A.A.A.11
IP.3 = A.A.A.12
IP.4 = A.A.A.13
- Save and close OpenSSL.cnf
- Generate a CSR
- Do not provide a password when requested
$ mkdir ~/csr
$ cd ~/csr
$ openssl req -new -newkey rsa:2048 -keyout rabbitmq.key -out rabbitmq.csr -nodes -config /etc/pki/tls/openssl.cnf -extensions v3_req
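Before uploading the CSR, it is worth confirming the SANs actually made it in; this is exactly the mistake that forced the inter_node_tls.config compromises mentioned above. A self-contained sketch using a throwaway key and an inline subjectAltName (on rabbit1 you would point the final command at ~/csr/rabbitmq.csr instead):

```shell
tmp=$(mktemp -d)
# Generate a throwaway CSR with SANs, standing in for the real one
openssl req -new -newkey rsa:2048 -nodes \
    -keyout "$tmp/test.key" -out "$tmp/test.csr" \
    -subj "/CN=rabbitvip.domain.local" \
    -addext "subjectAltName=DNS:rabbitvip.domain.local,DNS:rabbit1.domain.local" \
    2>/dev/null
# Print the Subject Alternative Name section; every DNS/IP entry should appear
openssl req -in "$tmp/test.csr" -noout -text | grep -A1 "Subject Alternative Name"
```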
- Prepare RabbitMQ Certificate Template
- Log in to your Windows domain.local Certificate Authority
- Open MMC.exe –> Add ‘Certification Authority’ snap-in
- Expand the tree until you see ‘Certificate Templates’ –> Right-click –> Manage
- Find template ‘Computer’ –> Right-click –> Duplicate Template –> New window appears
- Edit tabs:
- Tab: General
- Template Display Name: “RabbitMQ Cluster”
- Validity Period: “3 years”
- Tab: Cryptography
- Minimum Key Size: 2048
- Providers: Microsoft RSA SChannel Cryptographic Provider
- Tab: Extensions
- Application Policies –> Edit –> ‘Client Authentication’ and ‘Server Authentication’
- Key Usage –> Edit –> ‘Digital Signature’ and ‘Key Encipherment’ and ‘Allow Encryption of User Data’
- Tab: Subject Name
- Select ‘Supply in the request’
- Click ‘OK’
- Tab: General
- Close ‘Certificate Templates Console’
- Go back to the ‘Certification Authority’ snap-in
- ‘Certificate Templates’ –> Right-click –> New –> ‘Certificate Templates to Issue’
- Scroll down and find ‘RabbitMQ Cluster’ –> OK
- Open PowerShell window
PS> Get-CATemplate
- Verify one of the entries is ‘RabbitMQCluster’
- Upload the CSR to the Windows Certificate Authority
- Connect to rabbit1 using WinSCP (root)
- Download the CSR from /root/csr/rabbitmq.csr
- Log in to the Certificate Authority (RDP)
- Copy the CSR to the CA via RDP drag ‘n drop copy
- Create a Certificate from the CSR
- On the Windows CA, open a DOS window
DOS> certreq -submit -attrib "CertificateTemplate:RabbitMQCluster"
- The app will ask you to find and select the CSR file
- The app will generate a certificate called ‘RabbitMQCluster.cer’
- Shift-delete the CSR file
- Export the Root CA Public Certificate
- On the Windows CA, open a DOS window
DOS> certutil -ca.cert domain.local.CA.cer
- Copy Certificates from Windows CA to Rabbit1
- Copy the following certificates from the CA to your laptop
- RabbitMQCluster.cer
- domain.local.CA.cer
- Copy the following certificates from your laptop to rabbit1 using WinSCP (/root/csr)
- RabbitMQCluster.cer
- domain.local.CA.cer
- On rabbit1, move certificate related files
$ mv /root/csr/* /etc/pki/tls/private/
$ cd /etc/pki/tls/private
- Translate the Certificates to PEM Format
- Perform these steps on rabbit1
- Convert RabbitMQCluster.cer to PEM format
- Convert Certificate Authority chain to PEM format
- Copy PEM certificates to new location
$ openssl x509 -in RabbitMQCluster.cer -outform PEM -out rabbitmqcluster.pem
$ openssl x509 -inform DER -outform PEM -in domain.local.CA.cer -out domain.local.CA.cer.pem
$ cp ./rabbitmqcluster.pem /etc/pki/tls/certs/
$ cp ./rabbitmq.key /etc/pki/tls/certs/
$ cp ./domain.local.CA.cer.pem /etc/pki/tls/certs/
$ chmod 444 /etc/pki/tls/certs/*
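Before wiring these files into RabbitMQ, it is worth verifying that the leaf certificate actually chains to the CA. A self-contained sketch using a throwaway CA and leaf (on rabbit1, you would instead run the final `openssl verify` against the real files in /etc/pki/tls/certs):

```shell
tmp=$(mktemp -d)
# Throwaway CA, standing in for domain.local.CA.cer.pem
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
    -subj "/CN=Test CA" -keyout "$tmp/ca.key" -out "$tmp/ca.pem" 2>/dev/null
# Throwaway leaf signed by it, standing in for rabbitmqcluster.pem
openssl req -new -newkey rsa:2048 -nodes -subj "/CN=rabbitvip.domain.local" \
    -keyout "$tmp/leaf.key" -out "$tmp/leaf.csr" 2>/dev/null
openssl x509 -req -in "$tmp/leaf.csr" -CA "$tmp/ca.pem" -CAkey "$tmp/ca.key" \
    -CAcreateserial -days 1 -out "$tmp/leaf.pem" 2>/dev/null
# The actual check; prints "<file>: OK" when the chain is valid
openssl verify -CAfile "$tmp/ca.pem" "$tmp/leaf.pem"
```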
- Copy the Certificate Files to the Other RabbitMQ VMs
- Copy the following files from rabbit1 to your laptop using WinSCP
- Source: /etc/pki/tls/certs/rabbitmqcluster.pem
- Source: /etc/pki/tls/certs/rabbitmq.key
- Source: /etc/pki/tls/certs/domain.local.CA.cer.pem
- Copy the following files from your laptop to rabbit2 using WinSCP
- Destination: /etc/pki/tls/certs/rabbitmqcluster.pem
- Destination: /etc/pki/tls/certs/rabbitmq.key
- Destination: /etc/pki/tls/certs/domain.local.CA.cer.pem
- Copy the following files from your laptop to rabbit3 using WinSCP
- Destination: /etc/pki/tls/certs/rabbitmqcluster.pem
- Destination: /etc/pki/tls/certs/rabbitmq.key
- Destination: /etc/pki/tls/certs/domain.local.CA.cer.pem
- Set permissions on the files on rabbit2 and rabbit3
$ chmod 444 /etc/pki/tls/certs/*
Enable Encrypted Communications
- Update RabbitMQ Configuration File to Only Use Encrypted Communications
- Perform these steps on each RabbitMQ VM
- Replace the contents of the RabbitMQ Configuration File with these
$ nano /etc/rabbitmq/rabbitmq.conf
- Save and close the file
- Update Erlang VM Configuration to Use Encrypted Communications
- Perform these steps on each RabbitMQ VM
- Obtain an updated ‘ERL_SSL_PATH’
$ erl -noinput -eval 'io:format("ERL_SSL_PATH=~s~n", [filename:dirname(code:which(inet_tls_dist))])' -s init stop
- Open the Erlang VM Environment Variables configuration file
$ nano /etc/rabbitmq/rabbitmq-env.conf
$ nano /etc/rabbitmq/inter_node_tls.config
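Get the contents of both files from the links referenced above. As a hedged sketch of the rabbitmq-env.conf shape (the variable names follow the RabbitMQ inter-node TLS guide; the ssl path shown is only an example and must be replaced with the value printed by the erl command above):

```
# ERL_SSL_PATH must be the directory printed by the erl command in the previous step
ERL_SSL_PATH="/usr/lib64/erlang/lib/ssl-9.7/ebin"
SERVER_ADDITIONAL_ERL_ARGS="-pa $ERL_SSL_PATH
  -proto_dist inet_tls
  -ssl_dist_optfile /etc/rabbitmq/inter_node_tls.config"
RABBITMQ_CTL_ERL_ARGS="-pa $ERL_SSL_PATH
  -proto_dist inet_tls
  -ssl_dist_optfile /etc/rabbitmq/inter_node_tls.config"
```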
Firewall Ports
- Open Firewall Ports for Encrypted Communications
- Perform these steps on each RabbitMQ VM
- The trailing “# …” comments indicate what each port is for; include or omit them as you like
$ systemctl start firewalld
$ firewall-cmd --permanent --zone=public --add-port=5671/tcp    # AMQPS
$ firewall-cmd --permanent --zone=public --add-port=15671/tcp   # HTTPS API, Management Web UI
- Remove Firewall Rules for Unencrypted Communications
- Perform these steps on each RabbitMQ VM
- Some might fail if those ports are not already open
$ firewall-cmd --permanent --zone=public --remove-service=amqp
$ firewall-cmd --permanent --zone=public --remove-port=5672/tcp
$ firewall-cmd --permanent --zone=public --remove-port=15672/tcp
$ firewall-cmd --reload
Run OS Update
- Run OS Update
$ yum clean packages
$ yum update
$ reboot
Testing: Round Two
- Verify RabbitMQ Listeners
- Perform these steps on each RabbitMQ VM
- Two lines should mention TLS
$ rabbitmq-diagnostics listeners
- Test 7: Verify AMQPS Service TLS Versions
- Perform these steps on each RabbitMQ VM
$ openssl s_client -connect 127.0.0.1:5671 -tls1     # Should Fail
$ openssl s_client -connect 127.0.0.1:5671 -tls1_1   # Should Fail
$ openssl s_client -connect 127.0.0.1:5671 -tls1_2   # Must Succeed
$ openssl s_client -connect 127.0.0.1:5671 -tls1_3   # Might Fail, bonus if it works
- Test 8: Verify Cluster Inter-node Communication TLS Versions
- Perform these steps on each RabbitMQ VM
$ openssl s_client -connect 127.0.0.1:25672 -tls1     # Should Fail
$ openssl s_client -connect 127.0.0.1:25672 -tls1_1   # Should Fail
$ openssl s_client -connect 127.0.0.1:25672 -tls1_2   # Must Succeed
$ openssl s_client -connect 127.0.0.1:25672 -tls1_3   # Might Fail, bonus if it works
- Test 9: Verify Management Web UI TLS Versions
- Perform these steps on each RabbitMQ VM
$ openssl s_client -connect 127.0.0.1:15671 -tls1     # Should Fail
$ openssl s_client -connect 127.0.0.1:15671 -tls1_1   # Should Fail
$ openssl s_client -connect 127.0.0.1:15671 -tls1_2   # Must Succeed
$ openssl s_client -connect 127.0.0.1:15671 -tls1_3   # Might Fail, bonus if it works
- Test 10: Verify AMQPS Service TLS Versions on VIP
- Perform these steps on any one of the three RabbitMQ VMs
$ openssl s_client -connect A.A.A.10:5671 -tls1     # Should Fail
$ openssl s_client -connect A.A.A.10:5671 -tls1_1   # Should Fail
$ openssl s_client -connect A.A.A.10:5671 -tls1_2   # Must Succeed
$ openssl s_client -connect A.A.A.10:5671 -tls1_3   # Might Fail, bonus if it works
- Test 11: Verify Cluster Inter-node Communication TLS Versions on VIP
- Perform these steps on any one of the three RabbitMQ VMs
$ openssl s_client -connect A.A.A.10:25672 -tls1     # Should Fail
$ openssl s_client -connect A.A.A.10:25672 -tls1_1   # Should Fail
$ openssl s_client -connect A.A.A.10:25672 -tls1_2   # Must Succeed
$ openssl s_client -connect A.A.A.10:25672 -tls1_3   # Might Fail, bonus if it works
- Test 12: Verify Management Web UI TLS Versions on VIP
- Perform these steps on any one of the three RabbitMQ VMs
$ openssl s_client -connect A.A.A.10:15671 -tls1     # Should Fail
$ openssl s_client -connect A.A.A.10:15671 -tls1_1   # Should Fail
$ openssl s_client -connect A.A.A.10:15671 -tls1_2   # Must Succeed
$ openssl s_client -connect A.A.A.10:15671 -tls1_3   # Might Fail, bonus if it works
- Test 13: Verify Management Web UI Access on VIP
- From your laptop, open a web browser and navigate to https://A.A.A.10:15671
- Log in using the testuser account (the localadmin RabbitMQ account is not created until later in this document)
- Test 14: Load Testing
- Perform these steps on your laptop
- Open a DOS window
- In the DOS window, run the following command:
- This will start 9 producers, each publishing at a rate of 20 messages/second
- This will start 9 consumers, each consuming at a rate of 20 messages/second
- The command will run for 50 minutes
- The carets (^) allow you to break a DOS command across multiple lines
- If you decide to put this command into a batch file (.bat), remember to double up on the percent signs: % –> %%
- Note the URI now uses AMQPS on port 5671, since unencrypted port 5672 was closed above
DOS> cd \bin
DOS> .\runjava com.rabbitmq.perf.PerfTest ^
  --uris amqps://testuser:[email protected]:5671 ^
  --time 3000 ^
  --flag persistent ^
  --auto-delete false ^
  --qos 1000 ^
  --confirm 1000 ^
  --confirm-timeout -1 ^
  --rate 20 ^
  --size 1024 ^
  --queue-pattern 'perf-test-%d' ^
  --queue-pattern-from 1 ^
  --queue-pattern-to 9 ^
  --producers 9 ^
  --consumers 9 ^
  --consumer-latency 10000
Secure the Accounts
- Test the localadmin account
- Perform these steps on each RabbitMQ VM
- Connect via SSH as ‘localadmin’
- sudo to root
- If it succeeds, continue
- If it fails, add localadmin to group ‘wheel’ and try again
- Add a ‘localadmin’ user to RabbitMQ
- Perform these steps on rabbit1
- Supply a complex password in place of <generated_password>
$ rabbitmqctl add_user localadmin '<generated_password>'
$ rabbitmqctl set_user_tags localadmin administrator
$ rabbitmqctl set_permissions -p / localadmin ".*" ".*" ".*"
- Remove Extra RabbitMQ Users
- Perform these steps on rabbit1
$ rabbitmqctl delete_user testuser
$ rabbitmqctl delete_user guest
- Disable Direct Root Login
- Perform these steps on each RabbitMQ VM
$ nano /etc/ssh/sshd_config
- Find the line containing “PermitRootLogin” and set it to “PermitRootLogin no”
- Close and save the file
- Disable root login via TTY
$ echo > /etc/securetty
$ nano /etc/pam.d/login
- Make this the first non-commented line:
auth [user_unknown=ignore success=ok ignore=ignore default=bad] pam_securetty.so
- Verify Root Cannot Log In
- Because of the changes in step 72, root should no longer be able to log in directly via SSH or at the VM consoles
- Log off all SSH consoles
- Attempt to log in as root via SSH. This should fail.
- Attempt to log in as root at the VM consoles. This should fail.
Clean Up
- Remove the VM Snapshots
- Navigate to vSphere –> VMs and Templates –> Find your RabbitMQ VMs
- Actions –> Snapshots –> Delete All Snapshots
- Repeat this for the other two VMs
Maintenance
Keepalived
Keepalived was installed using yum, so we are at the mercy of the upstream package maintainers for newer versions. We are stuck on version 2.0.10. Sadly, this version has a few bugs that require constant monitoring.
Bug #1:
For no apparent reason, the keepalived service will start to consume 100% of a single CPU core. All logging to the keepalived log file will stop. The keepalived service will stop sending/receiving network traffic.
Bug #2:
Again, for no apparent reason, the keepalived Virtual IP (VIP) will stop responding.
In both cases, the solution is to cycle the keepalived service on the RabbitMQ cluster member that is experiencing the problem. We use a script to monitor both the CPU utilization of the keepalived service and whether the VIP is responding. This monitor script runs every minute as a cron job. Both the cron command line and the script can be found in the Appendix.
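The real monitor script is in the Appendix; the decision at its heart can be sketched as a small helper (the function name and the 99% threshold are ours, chosen to match the "pins a full core" symptom described above):

```shell
# Decide whether keepalived needs a restart, given its CPU usage as reported
# by, e.g.:  ps -C keepalived -o %cpu=
needs_restart() {
    cpu=${1:-0}
    # Strip the decimal part and treat >= 99% of one core as "pinned"
    [ "${cpu%.*}" -ge 99 ]
}

# In the cron job, a true result would drive:  systemctl restart keepalived
```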
Applying OS Patches
Applying OS patches is not a hard process, but it does require that you look at the environment before applying any patches.
- NOTE: The actual VIP has been changed out for “A.A.A.10”. If you are a CenterGrid employee, you can look up the IP addresses in the CMDB. If you are not, then you don’t get to see our internal IP addresses.
- Log on to the RabbitMQ management web UI: https://A.A.A.10:15671/#/
- You will be taken to the Overview page
- Verify all three nodes are present and green
- If you see any yellow or red, there is a problem. Fix the problem before continuing.
- Click on the ‘Queues’ tab
- Verify all the queues have a “+2”
- Our cluster has three nodes; one master plus two replicas. If the “+2” is missing or different, you have a problem. Fix the problem before continuing.
- Find a node that is not a master. In this example, rabbit1 is not a master. This is where we will start the patching process.
- SSH into Rabbit1 and verify it is not advertising the VIP. Pretend the VIP is “A.A.A.10”.
$ ip addr show | grep "A\.A\.A\.10"
If this returns a line, it will contain the IP address of the VIP. This means rabbit1 is hosting the VIP. It shouldn’t be hosting the VIP because the Keepalived scripts have been sculpted to make the VIP follow the RabbitMQ cluster member that is the master for the most queues. Restart the keepalived service on rabbit1. Verify the VIP has moved to one of the other members before continuing.
- Stop the RabbitMQ service and apply patches
$ systemctl stop rabbitmq-server
$ yum update
- Reboot this node
- While waiting for the reboot, go back to the RabbitMQ management web UI (https://A.A.A.10:15671/#/) and watch the cluster node go red, then yellow, then green. In our example, we worked on Rabbit1 first, but this image shows Rabbit3 in red. Pretend it’s Rabbit1. Hahaha…
If you were to flip over to the ‘Queues’ tab, you would see how the “+2” has changed to a “+1”. This means that there is currently only one replica.
- Wait for all three nodes to be green before continuing
- Now it is time to pick another node to be patched. In our example, both Rabbit2 and Rabbit3 are masters of at least one queue. RabbitMQ does not provide a method (GUI or command-line) to transfer the master role to another cluster member. Therefore, your next step is to find which cluster member is currently hosting the VIP and pick the other one. For example, we know Rabbit1 was already patched, so ignore it. This leaves Rabbit2 and Rabbit3. If Rabbit3 is hosting the VIP, then pick Rabbit2.
- SSH into Rabbit2 and verify it is not advertising the VIP. Pretend the VIP is “A.A.A.10”.
$ ip addr show | grep "A\.A\.A\.10"
- Stop the RabbitMQ service and apply patches
$ systemctl stop rabbitmq-server
$ yum update
- Reboot this node
- While waiting for the reboot, go back to the RabbitMQ management web UI (https://A.A.A.10:15671/#/) and watch the cluster node go red, then yellow, then green.
- Wait for all three nodes to be green before continuing
- SSH into the last, unpatched RabbitMQ node. The VIP will move shortly after the RabbitMQ service is stopped.
- Stop the RabbitMQ service and apply patches
- While waiting for the reboot, go back to the RabbitMQ management web UI (https://A.A.A.10:15671/#/) and watch the cluster node go red, then yellow, then green.
- Wait for all three nodes to be green before continuing
- Done!
Diagnostics
The Virtual IP (VIP)
The VIP should exist on only one of the cluster members. Use this process to look for the VIP:
- SSH into each node
- Run the following command to check for the VIP
- NOTE: The actual VIP has been changed out for “A.A.A.10”. If you are a CenterGrid employee, you can look up the IP addresses in the CMDB. If you are not, then you don’t get to see our internal IP addresses.
- This node hosts the VIP:
$ ip addr show | grep "A\.A\.A\.10"
inet A.A.A.10/24 scope global secondary ens192
$
- This node does not host the VIP:
$ ip addr show | grep "A\.A\.A\.10"
$
If the VIP exists on zero nodes, or two or more, then you have a problem. Restart the keepalived service on all nodes, one at a time.
Keepalived
Is the process running at 100% CPU?
- SSH into each node
- Run the following command to look at CPU usage:
$ htop
- Look for any line where CPU is at or near 100.0, then look at the Command column to see if it is keepalived
- In this example, keepalived is using 0% of a CPU core, so no need to restart the service
What is the history of the ‘Restart Keepalived’ script?
- SSH into each node
- Navigate into the root directory and get a file listing
$ sudo su
$ cd ~
$ ls -al
- Inspect the historical log file
$ cat ./RestartKeepalivedIf100PercentCPU_Historical.log
- Watch the RecentRun log file. This file is overwritten every minute.
$ tail -f ./RestartKeepalivedIf100PercentCPU_RecentRun.log
RabbitMQ
Are the RabbitMQ cluster members healthy?
- Log on to the RabbitMQ management web UI: https://A.A.A.10:15671/#/
- You will be taken to the Overview page
- Verify all three nodes are present and green
- If you see any yellow or red, there is a problem. Fix the problem before continuing.
- Click on the ‘Queues’ tab
- Verify all the queues have a “+2”
- Our cluster has three nodes; one master plus two replicas. If the “+2” is missing or different, you have a problem.
More information is available at the command line and with health checks, but CenterGrid hasn’t run into any problems that require the additional detail these commands provide.
Appendix
All of the scripts and configuration files that were discussed above are included here as a plain text file.
The Grand Finale
- Ferris:
- You’re still here?
- It’s over.
- Go home.
- Go.
- Yello:
- -= chick chicka chickaah =-