Setting Up A ClickHouse Database

What is ClickHouse

ClickHouse is a relatively new player in the storage space. It is an open-source high-performance, SQL database for online and analytical processing (OLAP). Online analytical processing tools allow businesses to store, analyse and query large datasets.

ClickHouse, unlike most SQL database systems is Column oriented as opposed to being Row-oriented. In column-oriented DBMSes, data is stored in columns, with values from the same columns stored together. According to the ClickHouse documentation, storing data this way results in 100-1000x increase in the processing of queries, making ClickHouse highly performant and capable of returning results in a fraction of a second.

Storing data in columns rather than rows allows ClickHouse to scan only the relevant columns during queries, which reduces I/O overhead and improves the performance of aggregations. Additionally, columnar data is highly compressible, leading to efficient use of disk space and memory.

ClickHouse supports parallel processing on multiple cores, distributed processing on multiple servers, data compression, fast reads and real-time data inserts. These features and others make it a popular choice for data analysis and machine learning.

ClickHouse shines in a variety of use cases, making it a good choice for organisations that need to analyse large data sets quickly and efficiently. One use case I’ve deployed for clients is real-time analytics in the public healthcare sector, where the organisations process streaming medical data and generate insights, reports and visualisations on the fly.

Installation

ClickHouse can be used as a Cloud service, installed in Linux OSes using package managers or run as a Docker container. In this post, I’ll demonstrate how to set it up using Docker.

To run ClickHouse, create a docker compose file like this:

services:
  clickhouse:
    container_name: clickhouse-server-container
    image: clickhouse/clickhouse-server:latest
    ports:
      - 8123:8123
      - 9000:9000
    volumes:
      - clickhouse-data:/var/lib/clickhouse/
      - clickhouse-logs:/var/log/clickhouse-server/

volumes:
  clickhouse-data:
  clickhouse-logs:

This docker-compose.yml file creates a clickhouse container using the latest clickhouse-server image and exposes port 8123 for the HTTP interface and 9000 for the native client. You can disable any port you don’t need to have exposed. The file also creates volumes for storing the database and logs.

To start this container run:

docker compose up -d

Verify that the installation worked by connecting to one of its endpoints:
curl http://localhost:8123 if the server is set up correctly, it will return an Ok. response with status code 200.

Create Super User

ClickHouse creates a default user named default when you first start it. This user has almost all the rights and permissions and has unrestricted access to the data. The only permissions this user does not have are SQL-driven access control permissions, which are disabled by default.

To secure new ClickHouse installations, it’s recommended to temporarily enable SQL-driven access control for the default user, logging into ClickHouse and using it to create all the required admin accounts, roles and normal users and then disabling SQL-driven access control and account management permissions for it.

Enable SQL-driven access management

ClickHouse supports configuration via both configuration files and SQL-driven Access Management. SQL-driven access management allows you to manage user permissions and roles using SQL commands. This is useful in environments that require fine-grained control over who can access what data and resources.

If ClickHouse is running directly in the host machine, Find and edit the users.xml configuration file to change the default user’s settings.

If you’re running ClickHouse as a docker container, you can create a new configuration file like the one below bind-mount to the container as a volume.

Create a config directory to store all the user-specific configuration files and map it to the /etc/clickhouse-server/users.d/ directory. The docker-compose.yml file should look like this now:

services:
  clickhouse:
    container_name: clickhouse-server-container
    image: clickhouse/clickhouse-server:latest
    ports:
      - 8123:8123
      - 9000:9000
    volumes:
      - clickhouse-data:/var/lib/clickhouse/
      - clickhouse-logs:/var/log/clickhouse-server/
      - ./config:/etc/clickhouse-server/users.d/

volumes:
  clickhouse-data:
  clickhouse-logs:

Next, add this configuration file to the config directory:

<clickhouse>
    <users>
        <default>
            <access_management>1</access_management>
            <named_collection_control>1</named_collection_control>
            <show_named_collections>1</show_named_collections>
            <show_named_collections_secrets>1</show_named_collections_secrets>
        </default>
    </users>
</clickhouse>

This configuration enables SQL access management, allowing you to create and grant permissions using the default account. Save the file and restart the ClickHouse server.

Define Users and Roles

Log in to ClickHouse using the default user:

clickhouse-client --user default

Create an Admin user

  1. Create the admin user with a secure password hashed using bcrypt:
CREATE USER admin_user IDENTIFIED WITH bcrypt_password BY '<your-secure-password-here';

Using bcrypt ensures the password is stored securely as bcrypt applies multiple rounds of hashing to passwords, making it computationally expensive for attackers to guess passwords.

  1. Next, grant the admin_user full admin privileges:
GRANT ALL ON *.* TO admin_user WITH GRANT OPTION;

Create an Admin Role

An alternative approach to creating administrative users individually is to create Roles with administrative privileges and assigning users to the roles. To create a role and assign a user to it, run the following commands In the clickhouse-client:

CREATE ROLE admin;
GRANT ALL ON *.* TO admin WITH GRANT OPTION;

Next, create a user and grant them the admin role:

CREATE USER admin_user IDENTIFIED WITH bcrypt_password BY '<your-secure-password-here';
GRANT admin to admin_user;

Disable SQL-driven Access Control

Once you’ve created the admin users, consider disabling SQL-driven access control for the default user by removing the configuration file you created above or modifying it to remove the access control fields:

<clickhouse>
    <users>
        <default>

        </default>
    </users>
</clickhouse>

Disabling SQL-driven access control and limiting the permissions of the default user after creating admin users or roles is a good idea to keep your ClickHouse installation secure since the default user typically has broad access to the database, which can pose a security risk if left enabled with SQL access control. Disabling SQL access control for this user minimises the risk of unauthorised access.

Conclusion

In this post, you saw how to install ClickHouse, a highly performant SQL data store using Docker and how to create administrator accounts and roles in it.