MySQL High Availability with ProxySQL and MySQL Group Replication

As you may know, I started working with high load MySQL servers in 2010 (and then there are several entries that were noted in 2012 after I purchased back this domain :-D) and I mostly go with multi-slaves replications. These might be true in many practical cases where most of the high-traffic loads are for reading operators, not writing ones. There are several cases that I worked with multi-masters replications / MySQL cluster but I then found that my DB cluster often ran unstable, so I rarely suggest to use MySQL cluster previously. In addition, to deal with multi-servers connections, I gradually move from the application layer to the network-based layer (e.g. HA Proxy in my case). The preceding approach works, however that approach did not satisfy me since I still need much effort to monitor and manual intervention.

Today I note a new approach for MySQL HA requirement: High Availability with ProxySQL and MySQL Group Replication (which was introduced in v. 5.7.12). In case you want to check some other popular MySQL proxy solutions, come to read the Comparision at ProxySQL website first.

To start this test, I use 4 servers as follows:

SERVER_1: Install ProxySQL.
SERVER_2, SERVER_3, SERVER_4: play 3 nodes in MySQL Group Replication. Remember that we should better to NOT having an even number of nodes to avoid the split-brain issue.

Install latest MySQL Softwares to 3 DB nodes

Add MySQL APT Repository: Be noted that this step is necessary as the community version will be selected when installing MySQL instead of the default one. Otherwise, you will not see the group_replication.so file in the plugin_dir () directory.
```
wget https://dev.mysql.com/get/mysql-apt-config_0.8.10-1_all.deb | dpkg -i mysql-apt-config_*.deb
apt update -y
```
Install mysql-server package (be noted that we will install mysql-community edition, not the default one shipped with Ubuntu).
```
apt-get install -y mysql-server
```
Check to be sure the extension group_replication.so exists at /usr/lib/mysql/plugin/. As mentioned, we need this extension for the group replication functionality.

Setting up MySQL Group Replication on EACH DB SERVER

Before starting configuring MySQL configs at all DB nodes, I would like to suggest to have a clear way to organize your configuration files so that you can be sure what will happen when you edit one parameter. I personally like to organize params as follows:

By default, in Ubuntu 16.04, most configurations relating to [mysqld] section will go to /etc/mysql/mysql.conf.d/mysqld.cnf and this file is included into the main /etc/mysql/my.cnf.
I will keep the local configuration in /etc/mysql/mysql.conf.d/mysqld.cnf. These settings are normally run on 1 machine to tunning the system based on the current server configuration, and they can be used without replication.
I will add a new [mysqld] section at the end of /etc/mysql/my.cnf for MySQL replication settings. This is to be sure that settings there will be always the final one since they will rewrite any params defined in any previously included files.
I like to keep the port number for group replication is sth easy to remember, so 33060 (relates to the 3306 port of MySQL service) is the selected port.

So, let’s start the configuration work.

Generate UUID that will be used later as the group name:
```
# uuidgen
2777fc39-ef8e-458f-b8fc-af74f398331f
```
Remember this value since we will use it later.

Add the following section at the end of /etc/mysql/my.cnf for group replication settings:

[mysqld]

# General Group replication settings
gtid_mode = ON
enforce_gtid_consistency = ON
master_info_repository = TABLE
relay_log_info_repository = TABLE
binlog_checksum = NONE
log_slave_updates = ON
log_bin = binlog
binlog_format = ROW
transaction_write_set_extraction = XXHASH64
loose-group_replication_bootstrap_group = OFF
loose-group_replication_start_on_boot = OFF
loose-group_replication_ssl_mode = REQUIRED
loose-group_replication_recovery_use_ssl = 1

# TODO: Shared replication group configuration
loose-group_replication_group_name = "2777fc39-ef8e-458f-b8fc-af74f398331f"
loose-group_replication_ip_whitelist = "SERVER_2_IP, SERVER_3_IP, SERVER_4_IP"
loose-group_replication_group_seeds = "SERVER_2_IP:33060, SERVER_3_IP:33060, SERVER_4_IP:33060"

# TODO: Configurations for Single-primary (DEFAULT) or Multi-primary / multi-master mode? 
#loose-group_replication_single_primary_mode = OFF
#loose-group_replication_enforce_update_everywhere_checks = ON

# TODO: Host-specific configurations
server_id = YOUR_SERVER_NUMBER
bind-address = "YOUR_SERVER_IP"
report_host = "YOUR_SERVER_IP"
loose-group_replication_local_address = "YOUR_SERVER_IP:33060"

. Remember that you need to fill in your own information for SERVER_X_IP, YOUR_SERVER_IP, and YOUR_SERVER_NUMBER. Also, you will need to manually REJOIN the group after the service is restarted with the above setting (loose-group_replication_start_on_boot = OFF).

Restart MySQL Service:
```
systemctl restart mysql
```
Remember to open ports for MySQL service and MySQL Group Replication service.

# mysql -u root -p

mysql> SET SQL_LOG_BIN=0;
mysql> CREATE USER 'REP_USER'@'%' IDENTIFIED BY 'REP_PASSWORD' REQUIRE SSL;
mysql> GRANT REPLICATION SLAVE ON *.* TO 'rep'@'%';
mysql> FLUSH PRIVILEGES;
mysql> SET SQL_LOG_BIN=1;
mysql> CHANGE MASTER TO MASTER_USER='REP_USER', MASTER_PASSWORD='REP_PASSWORD' FOR CHANNEL 'group_replication_recovery';
mysql> INSTALL PLUGIN group_replication SONAME 'group_replication.so';

. Remember to choose your user (e.g. rep) and password (e.g. Repl@1234) in the above configuration.

Verify that the group replication plugin is on with the command SHOW PLUGINS; in MySQL console.

Start MySQL Group Replication service

To start replication, we will need to start a bootstrapped node (the first node), and then start replication services on remaining nodes.

One thing worth noting is that if you have an existing database, be sure to (i) Stop the replication service (STOP GROUP_REPLICATION), (ii) Backup and restore it on ALL nodes in the replication group before (iii) Starting the replication service.

Bootstrap the first node on SERVER_2:

Trigger the group_replication_bootstrap_group variable in MySQL console and start the replication service:

mysql> SET GLOBAL group_replication_bootstrap_group=ON;
mysql> START GROUP_REPLICATION;
mysql> SET GLOBAL group_replication_bootstrap_group=OFF;

To verify, query online nodes on MySQL console (we must see the current node with the ONLINE state there):

mysql> SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+---------------+-------------+--------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST   | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+---------------+-------------+--------------+
| group_replication_applier | b5828dd8-7476-11e8-9097-782bcb23615e | SERVER_2_IP   |        3306 | ONLINE       |
+---------------------------+--------------------------------------+---------------+-------------+--------------+
1 row in set (0.00 sec)

Start replication service on the remaining nodes (SERVER_3 and SERVER_4) in MySQL console:
```
mysql> START GROUP_REPLICATION;
```
To verify, we can also query the state of group replication in MySQL console of each node.
- We should have the ONLINE state in several seconds for all nodes.
- Otherwise, having the RECOVERING for a long time (several seconds) indicating that there is something wrong when starting the group replication service for that node. Check the log at /var/log/mysql/error.log to see what was wrong. The failed node will retry, however after several attempts (normally 10), the node will then LEAVE the group.

Modification on MySQL Group Replication

This part is an optional one. I just note here some popular cases that we often deal with MySQL Group Replication.

If you have existing data and forgot to replicate it to other nodes before first starting DB, poor you! You will need to follow these steps:
- STOP GROUP_REPLICATION; on all nodes.
- Backup database, transfer to all remaining nodes.
- On each node, in order to create a new database and import data to it, we need to
  - run SET GLOBAL super_read_only = OFF; SET SQL_LOG_BIN=0;
  - importing data
  - and then set it back to ON (SET GLOBAL super_read_only = ON; SET SQL_LOG_BIN=1;)
  - issue RESET MASTER; to reset all binary logs in current slave (in case you forgot to disable bin log before importing data) if necessary.
- Start group replication on the first node, and then start on all remaining nodes.

Some MySQL queries to check sth relating to group replication:

-- Check member status
SELECT * FROM performance_schema.replication_group_members\G

-- Check if the single primary mode is activated
SHOW VARIABLES LIKE 'group_replication_single_primary_mode';

-- find the primary node in the single-primary mode?
SELECT MEMBER_ID, MEMBER_HOST, MEMBER_PORT, MEMBER_STATE
FROM performance_schema.replication_group_members
INNER JOIN performance_schema.global_status ON (MEMBER_ID = VARIABLE_VALUE)
WHERE VARIABLE_NAME='group_replication_primary_member'\G

To add a new member to the current group, we can follow the following simple steps:
- Install and configure group replication as normally in the new node. Remember to have all node IPs in loose-group_replication_ip_whitelist and loose-group_replication_group_seeds variables in the /etc/mysql/my.cnf file.
- Start group replication in the new node, and check to be sure the new node successfully joins the cluster.
- Edit the /etc/mysql/my.cnf file in the existing nodes (SERVER_2, SERVER_3, SERVER_4) to include the new IP in loose-group_replication_ip_whitelist and loose-group_replication_group_seeds variables. We DO NOT need to restart Mysql services on these nodes.

In order for the group replication works, we also need to care about DB Schema design: explicit PK or at least 1 Not-null Unique Key (so it will be selected as PK). We can run the following query to find which table is not satisfied with this constraint:

SELECT tables.table_schema , tables.table_name , tables.engine 
FROM information_schema.tables 
LEFT JOIN ( 
   SELECT table_schema , table_name 
   FROM information_schema.statistics 
   GROUP BY table_schema, table_name, index_name HAVING 
     SUM( case when non_unique = 0 and nullable != 'YES' then 1 else 0 end ) = count(*) 
) puks 
 ON tables.table_schema = puks.table_schema and tables.table_name = puks.table_name 
 WHERE puks.table_name is null 
   AND tables.table_type = 'BASE TABLE' AND Engine="InnoDB";

Configuring ProxySQL as the DB Connection proxy

Install and start ProxySQL on SERVER_1

apt-get install -y lsb-release
wget -O - 'http://repo.proxysql.com/ProxySQL/repo_pub_key' | apt-key add -
echo deb http://repo.proxysql.com/ProxySQL/proxysql-1.4.x/$(lsb_release -sc)/ ./ | tee /etc/apt/sources.list.d/proxysql.list
apt-get update -y
apt-get install proxysql
service proxysql start

Login to the ProxySQL Admin Interface with MySQL client via the port 6032 and change the admin password (if you want):
```
mysql -u admin -padmin -h 127.0.0.1 -P6032 --prompt='Admin> '
Admin> SHOW DATABASES;
Admin> UPDATE global_variables SET variable_value='admin:password' WHERE variable_name='admin-admin_credentials';
Admin> LOAD ADMIN VARIABLES TO RUNTIME; 
Admin> SAVE ADMIN VARIABLES TO DISK;
```
As you might see, I need to copy the memory settings to the runtime realm, then save them to disk to make them persist. The reason is that updating operations here only affect the memory layer.

Add 3 MySQL nodes to the ProxySQL backends:

Admin> INSERT INTO mysql_servers(hostgroup_id, hostname, port) VALUES 
(2, 'SERVER_2_IP', 3306), (2, 'SERVER_3_IP', 3306), (2, 'SERVER_4_IP', 3306);
>Admin> SAVE MYSQL SERVERS TO DISK;
>Admin> LOAD MYSQL SERVERS TO RUNTIME;

Admin> SELECT * FROM mysql_servers;
Admin> SELECT hostgroup_id, hostname, status FROM runtime_mysql_servers;

Check the above result to be sure that all 3 nodes are in ONLINE status.

Define and insert ProxySQL HostGroup functionality, as defined at the Lefred’s blog entry:

Admin> INSERT INTO mysql_group_replication_hostgroups (writer_hostgroup,backup_writer_hostgroup,reader_hostgroup,offline_hostgroup,active,max_writers,writer_is_also_reader,max_transactions_behind) VALUES (2,4,3,1,1,1,1,0);
Admin> LOAD ADMIN VARIABLES TO RUNTIME; 
Admin> SAVE ADMIN VARIABLES TO DISK;

Create a new user for monitoring purpose in ONE of the nodes in the cluster (e.g. we log in to SERVER_2) and run the following commands in MySQL console:
```
mysql> CREATE USER 'monitor'@'%' IDENTIFIED BY 'MyMon_123';
mysql> GRANT SELECT on sys.* to 'monitor'@'%';
mysql> FLUSH PRIVILEGES;
```

Update the monitor information in ProxySQL Admin console (I also updated the MySQL server version that the ProxySQL responds to the client app):

Admin> SET mysql-monitor_username='monitor';
Admin> SET mysql-monitor_password='MyMon_123';
Admin> SET mysql-server_version='5.7.22';

Admin> SAVE MYSQL VARIABLES TO DISK;
Admin> LOAD MYSQL VARIABLES TO RUNTIME;

Create the user and password to access the app database in ONE (SERVER_2) of DB server nodes (if we do not have). E.g.
```
GRANT ALL PRIVILEGES ON app.* TO 'app_user'@'%' WITH GRANT OPTION;
SHOW GRANTS FOR 'app_user'@'%';
```
We need to extend the MySQL 5.7 sys schema with the lefred’s addition_to_sys.sql script to allow ProxySQL to monitor states of the cluster nodes. We only need to run it on the primary DB node (SERVER_2) since we are running the system in group replication:
```
wget -O addition_to_sys.sql https://gist.githubusercontent.com/lefred/77ddbde301c72535381ae7af9f968322/raw/5e40b03333a3c148b78aa348fd2cd5b5dbb36e4d/addition_to_sys.sql
mysql -u root -p < ./addition_to_sys.sql
```

Create the SAME user and password in the ProxySQL Admin console so that our application can connect to the ProxySQL with this user and password:

Admin> INSERT INTO mysql_users (username, password, active, default_hostgroup, max_connections) VALUES ('APP_USER', 'APP_PASSWORD', 1, 2, 200);
Admin> SAVE MYSQL USERS TO DISK;
Admin> LOAD MYSQL USERS TO RUNTIME;

Change the ProxySQL Listening port: By default, the application will connect to the cluster through ProxySQL using port 6032 but for most PHP applications usually, they are familiar with connecting the standard to MySQL port on 3306. So we are going to change the listening port to 3306:
```
Admin> SET mysql-interfaces='0.0.0.0:3306;/tmp/proxysql.sock';

Admin> SAVE MYSQL VARIABLES TO DISK;
```
, then we need to restart the ProxySQL service:
```
service proxysql restart
```
, then check if ProxySQL is listening on that port:
```
lsof -Pnl +M -i4 | grep proxysql
```
Nice, test to verify all is working properly by connecting the app to ProxySQL listening port.

ProxySQL Query Rules

Till now you have ProxySQL with Group Replication work properly. However, if you take a little monitoring effort on server load, you might see that all traffic comes to the writer group (2 this case). Even when you setup single-primary for MySQL Group Replication, ProxySQL does not “automatically” route read traffics to reader nodes. Routing and balancing load are the DBA job, so it needs to be done with ProxySQL query rules.

To split read and write traffic, you can check the ProxySQL Wiki entry. For example, to simply route read traffic to reader group (3) and write traffic to writer group (2), we can use the following simple regex:
```
# UPDATE mysql_users SET default_hostgroup=2; # by default with the above setup, all traffic goes to the writer group
# LOAD MYSQL USERS TO RUNTIME;
# SAVE MYSQL USERS TO DISK;

DELETE FROM mysql_query_rules;
INSERT INTO mysql_query_rules (rule_id,active,match_digest,destination_hostgroup,apply)
VALUES
(1, 1, '^SELECT.*FOR UPDATE$', 2, 1),
(2, 1, '^SELECT', 3, 1);
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;

SELECT * FROM stats_mysql_connection_pool; # Monitor Queries fields
```
Try put traffic to the ProxySQL listening port to see the traffic routing. Note that the above approach is “blindly” route read and write traffic to reader and writer groups. We should take more effort on monitoring and on routing traffic to achieve better load distribution among DB nodes.
Next thing to consider is Query Cache. As you may know, the MySQL query cache deprecated as of MySQL 5.7.20 and removed in MySQL 8.0. In addition, the MySQL Query cache is not really helpful in term of performance due to the way it handles locking and cache invalidation. Caching should also be done at the proxy level so that it can be reused for the system-wide. I would recommend checking the good article about ProxySQL query cache at Percona Blog.
- Guess what? I am lazy now 🙂

Some useful ProxySQL Monitoring Queries

The log file is at /var/lib/proxysql/proxysql.log

Queries relating to monitoring connection pools

#-- View current (in memory) servers
select * from mysql_servers;
#-- View current (in memory) group replication hostgroup
select * from mysql_group_replication_hostgroups;
#-- View connection pool stats
select * from stats_mysql_connection_pool;
#-- View runtime MySQL servers
select hostgroup_id, hostname, status  from runtime_mysql_servers;
#-- View query digest sort by executing time
SELECT count_star,sum_time,hostgroup,digest,digest_text FROM stats_mysql_query_digest_reset WHERE ORDER BY sum_time DESC;