What is ClickHouse
ClickHouse is a relatively new player in the storage space. It is an open-source high-performance, SQL database for online and analytical processing (OLAP). Online analytical processing tools allow businesses to store, analyse and query large datasets.
ClickHouse, unlike most SQL database systems is Column oriented as opposed to being Row-oriented. In column-oriented DBMSes, data is stored in columns, with values from the same columns stored together. According to the ClickHouse documentation, storing data this way results in 100-1000x increase in the processing of queries, making ClickHouse highly performant and capable of returning results in a fraction of a second.
Storing data in columns rather than rows allows ClickHouse to scan only the relevant columns during queries, which reduces I/O overhead and improves the performance of aggregations. Additionally, columnar data is highly compressible, leading to efficient use of disk space and memory.
ClickHouse supports parallel processing on multiple cores, distributed processing on multiple servers, data compression, fast reads and real-time data inserts. These features and others make it a popular choice for data analysis and machine learning.
ClickHouse shines in a variety of use cases, making it a good choice for organisations that need to analyse large data sets quickly and efficiently. One use case I’ve deployed for clients is real-time analytics in the public healthcare sector, where the organisations process streaming medical data and generate insights, reports and visualisations on the fly.
Installation
ClickHouse can be used as a Cloud service, installed in Linux OSes using package managers or run as a Docker container. In this post, I’ll demonstrate how to set it up using Docker.
To run ClickHouse, create a docker compose file like this:
services: clickhouse: container_name: clickhouse-server-container image: clickhouse/clickhouse-server:latest ports: - 8123:8123 - 9000:9000 volumes: - clickhouse-data:/var/lib/clickhouse/ - clickhouse-logs:/var/log/clickhouse-server/ volumes: clickhouse-data: clickhouse-logs:
This docker-compose.yml
file creates a clickhouse container using the latest clickhouse-server
image and exposes port 8123
for the HTTP interface and 9000
for the native client. You can disable any port you don’t need to have exposed. The file also creates volumes for storing the database and logs.
To start this container run:
docker compose up -d
Verify that the installation worked by connecting to one of its endpoints:curl http://localhost:8123
if the server is set up correctly, it will return an Ok.
response with status code 200.
Create Super User
ClickHouse creates a default user named default
when you first start it. This user has almost all the rights and permissions and has unrestricted access to the data. The only permissions this user does not have are SQL-driven access control permissions, which are disabled by default.
To secure new ClickHouse installations, it’s recommended to temporarily enable SQL-driven access control for the default
user, logging into ClickHouse and using it to create all the required admin accounts, roles and normal users and then disabling SQL-driven access control and account management permissions for it.
Enable SQL-driven access management
ClickHouse supports configuration via both configuration files and SQL-driven Access Management. SQL-driven access management allows you to manage user permissions and roles using SQL commands. This is useful in environments that require fine-grained control over who can access what data and resources.
If ClickHouse is running directly in the host machine, Find and edit the users.xml
configuration file to change the default user’s settings.
If you’re running ClickHouse as a docker container, you can create a new configuration file like the one below bind-mount to the container as a volume.
Create a config directory to store all the user-specific configuration files and map it to the /etc/clickhouse-server/users.d/
directory. The docker-compose.yml file should look like this now:
services: clickhouse: container_name: clickhouse-server-container image: clickhouse/clickhouse-server:latest ports: - 8123:8123 - 9000:9000 volumes: - clickhouse-data:/var/lib/clickhouse/ - clickhouse-logs:/var/log/clickhouse-server/ - ./config:/etc/clickhouse-server/users.d/ volumes: clickhouse-data: clickhouse-logs:
Next, add this configuration file to the config
directory:
<clickhouse> <users> <default> <access_management>1</access_management> <named_collection_control>1</named_collection_control> <show_named_collections>1</show_named_collections> <show_named_collections_secrets>1</show_named_collections_secrets> </default> </users> </clickhouse>
This configuration enables SQL access management, allowing you to create and grant permissions using the default
account. Save the file and restart the ClickHouse server.
Define Users and Roles
Log in to ClickHouse using the default
user:
clickhouse-client --user default
Create an Admin user
- Create the admin user with a secure password hashed using bcrypt:
CREATE USER admin_user IDENTIFIED WITH bcrypt_password BY '<your-secure-password-here';
Using bcrypt
ensures the password is stored securely as bcrypt
applies multiple rounds of hashing to passwords, making it computationally expensive for attackers to guess passwords.
- Next, grant the
admin_user
full admin privileges:
GRANT ALL ON *.* TO admin_user WITH GRANT OPTION;
Create an Admin Role
An alternative approach to creating administrative users individually is to create Roles with administrative privileges and assigning users to the roles. To create a role and assign a user to it, run the following commands In the clickhouse-client
:
CREATE ROLE admin; GRANT ALL ON *.* TO admin WITH GRANT OPTION;
Next, create a user and grant them the admin
role:
CREATE USER admin_user IDENTIFIED WITH bcrypt_password BY '<your-secure-password-here'; GRANT admin to admin_user;
Disable SQL-driven Access Control
Once you’ve created the admin users, consider disabling SQL-driven access control for the default
user by removing the configuration file you created above or modifying it to remove the access control fields:
<clickhouse> <users> <default> </default> </users> </clickhouse>
Disabling SQL-driven access control and limiting the permissions of the default
user after creating admin users or roles is a good idea to keep your ClickHouse installation secure since the default
user typically has broad access to the database, which can pose a security risk if left enabled with SQL access control. Disabling SQL access control for this user minimises the risk of unauthorised access.
Conclusion
In this post, you saw how to install ClickHouse, a highly performant SQL data store using Docker and how to create administrator accounts and roles in it.