Use Yandex ClickHouse for Analytics with Data from MySQL

Today I will discuss a newborn star on DB for analytics: ClickHouse by Yandex. ClickHouse is quite new when comparing to other solutions, however, there are lots of popular giants using it in production such as Yandex and Cloudflare. In this blog entry, I will note necessary steps to setup ClickHouse for Analytics with Data streaming from MySQL.

The base system comprises Ubuntu 16.04 with installed NginX, Percona Server for MySQL 5.7 with a dataset of a table of ~160M records (~ 40GB in InnoDB storage).

One point worth noting is that ClickHouse does not support DELETE / UPDATE operators. In order to overcome this, several approaches will be noted in another future blog entry.

1. Install ClickHouse

  • Add repo to source.list (I choose to use a separated file):
  • Install:
  • The config file is at /etc/clickhouse-server/config.xml. You need to check this file before starting the service. Be sure to check http_port and tcp_port to be sure they do not the same as other services. For example, the port 9000 might similar to PHP-FPM in some default configuration.
  • Start the ClickHouse service:
  • Check with clickhouse-client:
  • Congratulations, it is done and it seems to be working;-)
  • Some beginners’ commands that you might find they are useful:
    • You can use CREATE TABLE, CREATE DATABASE, SHOW TABLES, SHOW DATABASES, USE DBNAME, etc. in clickhouse client cli.
    • To quickly access a database from Linux command line, use clickhouse-client -d DBNAME.
    • Remember, till now we can only connect to CH in the local machine via SSH.
    • If you want to import data from csv, first you need to create a table in CH, and then you can use INSERT INTO tableName FORMAT CSV (or CSVWithNames if your file has a header). E.g.
    • _

2. Setup Tabix to work with ClickHouse

  1. First, we need to accept external connection in /etc/clickhouse-server/config.xml. Add the following thing to this file: <listen_host>::</listen_host> (or just modify current existing lines)
  2. Then, need to create a new user in /etc/clickhouse-server/users.xml:
    • Execute the following command to create a sha256 random password:
    • Add a new user to the file with that password:
  3. Download latest Tabix and set nginx host as per guide at Tabix.IO website:
  4. Go to your Chrome browser and log in with the connection http://YOURIP:8123 and the above mych user.

3. Stream data from MySQL

  1. First, be sure you have python 3.4 or later. In Ubuntu 16, it is mostly python3 command line: python3 -V. We will install python3-pip so that we can use pip3 command line:
  2. Install clickhouse-mysql-data-reader by using pip3 command line
    1. Install using pip3:
    2. You should now access command line clickhouse-mysql:
    3. To finish installing data reader, install service files:
  3. Configure MySQL to prepare for the connection
    1. First, insert the following parameters inside your [mysqld] section. If you are using Percona server, it is located at /etc/mysql/percona-server.conf.d/mysqld.cnf:

      . Then, restart mysql service with systemctl restart mysql.
    2. Next, add a reader user to use for clickhouse later:
  4. Finally, you can use clickhouse-mysql command line to import data from MySQL to ClickHouse server. Assume that you need to import data from MySQL database testdb in big_table_here table, we will need to (i) first create table structure in ClickHouse, then (ii) migrate current data from MySQL to ClickHouse, and (iii) Setup CH to listen to MySQL binlog to perform INSERT (only insert, no DDL or other DELETE/UPDATE actions) when there is new data in MySQL:
    1.  Create a table in ClickHouse:
      • Create the table template
      • We will then need to edit create_clickhouse_table_template.sql to sharding field and primary key (These columns must not be Nullable. Remember that the first column in MergeTree must be Date type, not DateTime or String). For example, edit and change it to
      • Next, run create table with clickhouse-client:
    2. Next, Migrate Existing Data in MySQL to ClickHouse:
    3. Finally, setup CH to listen to MySQL binlog to replicate new INSERT data:

       

 

Further References

Leave a Reply