ClickHouse improvements #9111

Merged
merged 15 commits into from
Apr 29, 2024
1 change: 0 additions & 1 deletion .flake8
@@ -136,7 +136,6 @@ exclude =
mindsdb/integrations/handlers/tpot_handler/*
mindsdb/integrations/handlers/derby_handler/*
mindsdb/integrations/handlers/scylla_handler/*
mindsdb/integrations/handlers/clickhouse_handler/*
mindsdb/integrations/handlers/phoenix_handler/*
mindsdb/integrations/handlers/replicate_handler/*
mindsdb/integrations/handlers/pgvector_handler/*
7 changes: 4 additions & 3 deletions .github/workflows/test_on_push.yml
@@ -108,8 +108,9 @@ jobs:
run: |
pip install .
pip install -r requirements/requirements-test.txt
pip install .[lightwood] # TODO: for now some tests rely on lightwood
pip install .[mssql]
pip install lightwood # TODO: for now some tests rely on lightwood
pip install mindsdb[mssql]
pip install mindsdb[clickhouse]
pip freeze
- name: Run unit tests
run: |
@@ -124,7 +125,7 @@ jobs:
fi
- name: Run Handlers tests and submit Coverage to coveralls
run: |
handlers=("mysql" "postgres" "mssql")
handlers=("mysql" "postgres" "mssql" "clickhouse")
for handler in "${handlers[@]}"
do
pytest --cov=mindsdb/integrations/handlers/${handler}_handler tests/unit/handlers/test_${handler}.py
96 changes: 60 additions & 36 deletions docs/integrations/data-integrations/clickhouse.mdx
@@ -3,59 +3,83 @@ title: ClickHouse
sidebarTitle: ClickHouse
---

This is the implementation of the ClickHouse data handler for MindsDB.

[ClickHouse](https://clickhouse.com/docs/en/intro/) is a column-oriented database management system (DBMS) for online analytical processing of queries (OLAP).
This documentation describes the integration of MindsDB with [ClickHouse](https://clickhouse.com/docs/en/intro), a high-performance, column-oriented SQL database management system (DBMS) for online analytical processing (OLAP).
The integration allows MindsDB to access data from ClickHouse and enhance ClickHouse with AI capabilities.

## Prerequisites

Before proceeding, ensure the following prerequisites are met:

1. Install MindsDB locally via [Docker](/setup/self-hosted/docker) or [Docker Desktop](/setup/self-hosted/docker-desktop).
2. To connect ClickHouse to MindsDB, install the required dependencies following [these instructions](/setup/self-hosted/docker#install-dependencies).
3. Install or ensure access to ClickHouse.

## Implementation
## Connection

This handler was implemented using the standard `clickhouse-sqlalchemy` library.
Establish a connection to ClickHouse from MindsDB by executing the following SQL command and providing its [handler name](https://github.com/mindsdb/mindsdb/tree/staging/mindsdb/integrations/handlers/clickhouse_handler) as an engine.

The required arguments to establish a connection are as follows:
```sql
CREATE DATABASE clickhouse_conn
WITH ENGINE = 'clickhouse',
PARAMETERS = {
"host": "127.0.0.1",
"port": "8443",
"user": "root",
"password": "mypass",
"database": "test_data",
"protocol" : "https"
}
```

* `host` is the hostname or IP address of the ClickHouse server.
* `port` is the TCP/IP port of the ClickHouse server.
* `user` is the username used to authenticate with the ClickHouse server.
* `password` is the password to authenticate the user with the ClickHouse server.
* `database` defaults to `default`. It is the database name to use when connecting with the ClickHouse server.
* `protocol` defaults to `native`. It is an optional parameter. Its supported values are `http` and `https`.
Required connection parameters include the following:

## Usage
* `host`: the hostname or IP address of the ClickHouse server.
* `port`: the TCP/IP port of the ClickHouse server.
* `user`: the username used to authenticate with the ClickHouse server.
* `password`: the password used to authenticate the user with the ClickHouse server.
* `database`: the database name to use when connecting to the ClickHouse server. Defaults to `default`.
* `protocol`: an optional parameter that defaults to `native`. Supported values are `native`, `http`, and `https`.
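
The parameter list above can be turned into a small helper that assembles the `CREATE DATABASE` statement. This is only a sketch: `build_create_database` is a hypothetical name, not part of MindsDB, and it assumes that `host`, `port`, `user`, and `password` must always be supplied while `database` and `protocol` fall back to the defaults stated above.

```python
import json

# Defaults and supported protocols are taken from the parameter list above.
# Treating the other four parameters as mandatory is an assumption.
DEFAULTS = {"database": "default", "protocol": "native"}
REQUIRED = ("host", "port", "user", "password")
SUPPORTED_PROTOCOLS = ("native", "http", "https")


def build_create_database(name, **params):
    """Return a CREATE DATABASE statement for the 'clickhouse' engine."""
    missing = [key for key in REQUIRED if key not in params]
    if missing:
        raise ValueError(f"missing connection parameters: {', '.join(missing)}")
    merged = {**DEFAULTS, **params}
    if merged["protocol"] not in SUPPORTED_PROTOCOLS:
        raise ValueError(f"unsupported protocol: {merged['protocol']}")
    # json.dumps produces the double-quoted key/value syntax used in the example.
    body = json.dumps(merged, indent=4)
    return f"CREATE DATABASE {name}\nWITH ENGINE = 'clickhouse',\nPARAMETERS = {body};"


if __name__ == "__main__":
    print(build_create_database(
        "clickhouse_conn",
        host="127.0.0.1", port="8443", user="root",
        password="mypass", database="test_data", protocol="https",
    ))
```

Validating the protocol before sending the statement surfaces a typo such as `"htpps"` locally instead of as a connection failure.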

In order to make use of this handler and connect to the ClickHouse database in MindsDB, the following syntax can be used:
## Usage

```sql
CREATE DATABASE clickhouse_datasource
WITH
engine = 'clickhouse',
parameters = {
"host": "127.0.0.1",
"port": "9000",
"user": "root",
"password": "password",
"database": "test_db"
};
```
The following usage examples utilize the connection to ClickHouse made via the `CREATE DATABASE` statement and named `clickhouse_conn`.

You can use this established connection to query your table as follows:
Retrieve data from a specified table by providing the integration and table name.

```sql
SELECT *
FROM clickhouse_datasource.example_table;
FROM clickhouse_conn.table_name
LIMIT 10;
```

<Tip>
If you want to switch to a different database instead of the one you have connected, you can include it in the query as:
```sql
SELECT *
FROM clickhouse_datasource.new_database.example_table;
```
</Tip>
## Troubleshooting

<Warning>
`Database Connection Error`

* **Symptoms**: Failure to connect MindsDB with the ClickHouse database.
* **Checklist**:
1. Ensure that the ClickHouse server is running and accessible.
2. Confirm that host, port, user, and password are correct. Try connecting to ClickHouse directly, for example with `clickhouse-client`.
3. Test the network connection between the MindsDB host and the ClickHouse server.
</Warning>

<Warning>
`Slow Connection Initialization`

* **Symptoms**: Connecting to the ClickHouse server takes an exceptionally long time, or connections hang without completing.
* **Checklist**:
1. Ensure that you are using the appropriate protocol (http, https, or native) for your ClickHouse setup. Misconfigurations here can lead to significant delays.
2. Ensure that firewalls or security groups (in cloud environments) are properly configured to allow traffic on the necessary ports (such as 8123 for HTTP or 9000 for native).
</Warning>
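
One way to work through the network items in the checklists above is a plain TCP reachability probe run from the MindsDB host. The sketch below uses only the Python standard library. `port_is_reachable` is a hypothetical helper, and the port table assumes the defaults mentioned above (8123 for HTTP, 9000 for native) plus the 8443 HTTPS port from the connection example; your server may be configured differently.

```python
import socket

# Ports from the checklist and the connection example; adjust to your setup.
DEFAULT_PORTS = {"native": 9000, "http": 8123, "https": 8443}


def port_is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    for protocol, port in DEFAULT_PORTS.items():
        status = "open" if port_is_reachable("127.0.0.1", port) else "unreachable"
        print(f"{protocol:>6} port {port}: {status}")
```

A successful probe only shows that something is listening on the port. It does not confirm that the credentials or the chosen protocol are correct.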

<Warning>
`SQL statement cannot be parsed by mindsdb_sql`

* **Symptoms**: SQL queries failing or not recognizing table names containing spaces, reserved words or special characters.
* **Checklist**:
1. Ensure table names with spaces or special characters are enclosed in backticks.
2. Examples:
* Incorrect: SELECT * FROM integration.travel data
* Incorrect: SELECT * FROM integration.'travel data'
* Correct: SELECT * FROM integration.\`travel data\`
</Warning>
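
When queries are generated programmatically, the backtick rule above can be applied with a small quoting helper. This is a sketch under stated assumptions: `quote_identifier` and `qualified_table` are hypothetical names, the helper carries no reserved-word list (pass `force=True` for those), and names that themselves contain a backtick are rejected rather than escaped, because the escaping rule is not described here.

```python
import re

# Names made only of letters, digits, and underscores (not starting with a
# digit) are left unquoted; anything else is wrapped in backticks.
_PLAIN_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def quote_identifier(name, force=False):
    """Wrap name in backticks when it is not a plain identifier."""
    if "`" in name:
        raise ValueError("identifiers containing backticks are not handled here")
    if not force and _PLAIN_IDENTIFIER.fullmatch(name):
        return name
    return f"`{name}`"


def qualified_table(integration, table):
    """Return integration.table with each part quoted as needed."""
    return f"{quote_identifier(integration)}.{quote_identifier(table)}"


if __name__ == "__main__":
    # Prints: SELECT * FROM integration.`travel data`
    print(f"SELECT * FROM {qualified_table('integration', 'travel data')}")
```
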
61 changes: 0 additions & 61 deletions mindsdb/integrations/handlers/Handlers_Manual_QA.md

This file was deleted.

94 changes: 65 additions & 29 deletions mindsdb/integrations/handlers/clickhouse_handler/README.md
@@ -1,37 +1,25 @@
# ClickHouse Handler
---
title: ClickHouse
sidebarTitle: ClickHouse
---

This is the implementation of the ClickHouse handler for MindsDB.
This documentation describes the integration of MindsDB with [ClickHouse](https://clickhouse.com/docs/en/intro), a high-performance, column-oriented SQL database management system (DBMS) for online analytical processing (OLAP).
The integration allows MindsDB to access data from ClickHouse and enhance ClickHouse with AI capabilities.

## ClickHouse
## Prerequisites

ClickHouse is a column-oriented database management system (DBMS) for online analytical processing of queries (OLAP). https://clickhouse.com/docs/en/intro/
Before proceeding, ensure the following prerequisites are met:

1. Install MindsDB [locally via Docker](https://docs.mindsdb.com/setup/self-hosted/docker) or use [MindsDB Cloud](https://cloud.mindsdb.com/).
2. To connect ClickHouse to MindsDB, install the required dependencies following [these instructions](/setup/self-hosted/docker#install-dependencies).

## Connection

## Implementation
This handler was implemented using the standard `clickhouse-sqlalchemy` library https://clickhouse-sqlalchemy.readthedocs.io/en/latest/.
Please install it before using this handler:

```
pip install clickhouse-sqlalchemy
```

The required arguments to establish a connection are as follows:

* `host` is the hostname or IP address of the ClickHouse server.
* `port` is the TCP/IP port of the ClickHouse server.
* `user` is the username used to authenticate with the ClickHouse server.
* `password` is the password to authenticate the user with the ClickHouse server.
* `database` defaults to `default`. It is the database name to use when connecting with the ClickHouse server.
* `protocol` defaults to `native`. It is an optional parameter. Its supported values are `native`, `http` and `https`.

## Usage

To connect to ClickHouse, add `engine=clickhouse` to the CREATE DATABASE statement as:
Establish a connection to ClickHouse from MindsDB by executing the following SQL command and providing its [handler name](https://github.com/mindsdb/mindsdb/tree/staging/mindsdb/integrations/handlers/clickhouse_handler) as an engine.

```sql
CREATE DATABASE clic
WITH ENGINE = "clickhouse",
CREATE DATABASE clickhouse_conn
WITH ENGINE = 'clickhouse',
PARAMETERS = {
"host": "127.0.0.1",
"port": "8443",
@@ -42,8 +30,56 @@ PARAMETERS = {
}
```

Now, you can use this established connection to query your database as follows,
Required connection parameters include the following:

* `host`: the hostname or IP address of the ClickHouse server.
* `port`: the TCP/IP port of the ClickHouse server.
* `user`: the username used to authenticate with the ClickHouse server.
* `password`: the password used to authenticate the user with the ClickHouse server.
* `database`: the database name to use when connecting to the ClickHouse server. Defaults to `default`.
* `protocol`: an optional parameter that defaults to `native`. Supported values are `native`, `http`, and `https`.

## Usage

The following usage examples utilize the connection to ClickHouse made via the `CREATE DATABASE` statement and named `clickhouse_conn`.

Retrieve data from a specified table by providing the integration and table name.

```sql
SELECT * FROM clic.test_data.table
```
SELECT *
FROM clickhouse_conn.table_name
LIMIT 10;
```

## Troubleshooting

<Warning>
`Database Connection Error`

* **Symptoms**: Failure to connect MindsDB with the ClickHouse database.
* **Checklist**:
1. Ensure that the ClickHouse server is running and accessible.
2. Confirm that host, port, user, and password are correct. Try connecting to ClickHouse directly, for example with `clickhouse-client`.
3. Test the network connection between the MindsDB host and the ClickHouse server.
</Warning>

<Warning>
`Slow Connection Initialization`

* **Symptoms**: Connecting to the ClickHouse server takes an exceptionally long time, or connections hang without completing.
* **Checklist**:
1. Ensure that you are using the appropriate protocol (http, https, or native) for your ClickHouse setup. Misconfigurations here can lead to significant delays.
2. Ensure that firewalls or security groups (in cloud environments) are properly configured to allow traffic on the necessary ports (such as 8123 for HTTP or 9000 for native).
</Warning>

<Warning>
`SQL statement cannot be parsed by mindsdb_sql`

* **Symptoms**: SQL queries failing or not recognizing table names containing spaces, reserved words or special characters.
* **Checklist**:
1. Ensure table names with spaces or special characters are enclosed in backticks.
2. Examples:
* Incorrect: SELECT * FROM integration.travel data
* Incorrect: SELECT * FROM integration.'travel data'
* Correct: SELECT * FROM integration.\`travel data\`
</Warning>
@@ -6,4 +6,4 @@
__github__ = 'https://github.com/mindsdb/mindsdb'
__pypi__ = 'https://pypi.org/project/mindsdb/'
__license__ = 'MIT'
__copyright__ = 'Copyright 2022- mindsdb'
__copyright__ = 'Copyright 2022 MindsDB'