Clickhouse improvments #9111

Merged
merged 15 commits into from
Apr 29, 2024
1 change: 0 additions & 1 deletion .flake8
@@ -136,7 +136,6 @@ exclude =
mindsdb/integrations/handlers/tpot_handler/*
mindsdb/integrations/handlers/derby_handler/*
mindsdb/integrations/handlers/scylla_handler/*
mindsdb/integrations/handlers/clickhouse_handler/*
mindsdb/integrations/handlers/phoenix_handler/*
mindsdb/integrations/handlers/replicate_handler/*
mindsdb/integrations/handlers/pgvector_handler/*
96 changes: 60 additions & 36 deletions docs/integrations/data-integrations/clickhouse.mdx
@@ -3,59 +3,83 @@ title: ClickHouse
sidebarTitle: ClickHouse
---

This is the implementation of the ClickHouse data handler for MindsDB.

[ClickHouse](https://clickhouse.com/docs/en/intro/) is a column-oriented database management system (DBMS) for online analytical processing of queries (OLAP).
This documentation describes the integration of MindsDB with [ClickHouse](https://clickhouse.com/docs/en/intro), a high-performance, column-oriented SQL database management system (DBMS) for online analytical processing (OLAP).
The integration allows MindsDB to access data from ClickHouse and enhance ClickHouse with AI capabilities.

## Prerequisites

Before proceeding, ensure the following prerequisites are met:

1. Install MindsDB [locally via Docker](https://docs.mindsdb.com/setup/self-hosted/docker) or use [MindsDB Cloud](https://cloud.mindsdb.com/).
2. To connect ClickHouse to MindsDB, install the required dependencies by following [these instructions](/setup/self-hosted/docker#install-dependencies).
3. Install or ensure access to ClickHouse.

## Implementation
## Connection

This handler was implemented using the standard `clickhouse-sqlalchemy` library.
Establish a connection to ClickHouse from MindsDB by executing the following SQL command and providing its [handler name](https://github.com/mindsdb/mindsdb/tree/staging/mindsdb/integrations/handlers/clickhouse_handler) as an engine.

The required arguments to establish a connection are as follows:
```sql
CREATE DATABASE clickhouse_conn
WITH ENGINE = 'clickhouse',
PARAMETERS = {
"host": "127.0.0.1",
"port": "8443",
"user": "root",
"password": "mypass",
"database": "test_data",
"protocol" : "https"
}
```

* `host` is the hostname or IP address of the ClickHouse server.
* `port` is the TCP/IP port of the ClickHouse server.
* `user` is the username used to authenticate with the ClickHouse server.
* `password` is the password to authenticate the user with the ClickHouse server.
* `database` defaults to `default`. It is the database name to use when connecting with the ClickHouse server.
* `protocol` defaults to `native`. It is an optional parameter. Its supported values are `http` and `https`.
The connection parameters are as follows:

## Usage
* `host`: the hostname or IP address of the ClickHouse server.
* `port`: the TCP/IP port of the ClickHouse server.
* `user`: the username used to authenticate with the ClickHouse server.
* `password`: the password used to authenticate the user with the ClickHouse server.
* `database`: the name of the database to use when connecting to the ClickHouse server. Defaults to `default`.
* `protocol`: an optional parameter that selects the connection protocol. Supported values are `native`, `http`, and `https`. Defaults to `native`.
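
To illustrate how these parameters fit together, here is a minimal Python sketch that assembles a SQLAlchemy-style URL in the same way the handler code in this pull request appears to. The function name `build_clickhouse_url` is hypothetical, and falling back to `default` when `database` is missing is an assumption taken from the parameter description rather than from the handler itself.

```python
from urllib.parse import quote


def build_clickhouse_url(connection_data: dict) -> str:
    """Assemble a connection URL from the parameters described above (sketch)."""
    protocol = connection_data.get("protocol", "native")
    # 'native' selects the native driver; 'http' and 'https' both use the HTTP driver.
    scheme = "clickhouse+native" if protocol == "native" else "clickhouse+http"

    # Percent-encode values so that special characters cannot break the URL.
    host = quote(str(connection_data["host"]))
    port = connection_data["port"]
    user = quote(str(connection_data["user"]))
    password = quote(str(connection_data["password"]), safe="")
    # Assumption: use 'default' when no database is given, as the docs describe.
    database = quote(str(connection_data.get("database", "default")))

    url = f"{scheme}://{user}:{password}@{host}:{port}/{database}"
    # The HTTP driver needs an explicit hint to switch to HTTPS.
    if protocol == "https":
        url += "?protocol=https"
    return url


if __name__ == "__main__":
    print(build_clickhouse_url({
        "host": "127.0.0.1",
        "port": "8443",
        "user": "root",
        "password": "mypass",
        "database": "test_data",
        "protocol": "https",
    }))
```

The resulting URL contains the password, so it should be kept out of logs.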

In order to make use of this handler and connect to the ClickHouse database in MindsDB, the following syntax can be used:
## Usage

```sql
CREATE DATABASE clickhouse_datasource
WITH
engine = 'clickhouse',
parameters = {
"host": "127.0.0.1",
"port": "9000",
"user": "root",
"password": "password",
"database": "test_db"
};
```
The following usage examples utilize the connection to ClickHouse made via the `CREATE DATABASE` statement and named `clickhouse_conn`.

You can use this established connection to query your table as follows:
Retrieve data from a specified table by providing the integration and table name.

```sql
SELECT *
FROM clickhouse_datasource.example_table;
FROM clickhouse_conn.table_name
LIMIT 10;
```

<Tip>
If you want to switch to a different database instead of the one you have connected, you can include it in the query as:
```sql
SELECT *
FROM clickhouse_datasource.new_database.example_table;
```
</Tip>
## Troubleshooting

<Warning>
`Database Connection Error`

* **Symptoms**: Failure to connect MindsDB with the ClickHouse database.
* **Checklist**:
1. Ensure that the ClickHouse server is running and accessible.
2. Confirm that host, port, user, and password are correct. Try connecting to ClickHouse directly, for example with `clickhouse-client`.
3. Test the network connection between the MindsDB host and the ClickHouse server.
</Warning>
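
As a rough way to test the network path mentioned in the checklist above, the following Python sketch tries to open a TCP connection to the server. The helper name `can_reach` and the 3-second timeout are illustrative assumptions. A successful result only shows that the port accepts connections, not that the credentials or protocol are correct.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within the timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # Hypothetical values: replace with your ClickHouse host and port.
    print(can_reach("127.0.0.1", 9000))
```

Run this from the machine that hosts MindsDB, since that is the network path the connection actually uses.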

<Warning>
`Slow Connection Initialization`

* **Symptoms**: Connecting to the ClickHouse server takes an exceptionally long time, or connections hang without completing.
* **Checklist**:
1. Ensure that you are using the appropriate protocol (http, https, or native) for your ClickHouse setup. Misconfigurations here can lead to significant delays.
2. Ensure that firewalls or security groups (in cloud environments) are properly configured to allow traffic on the necessary ports (such as 8123 for HTTP, 8443 for HTTPS, or 9000 for native).
</Warning>

<Warning>
`SQL statement cannot be parsed by mindsdb_sql`

* **Symptoms**: SQL queries failing or not recognizing table names containing spaces, reserved words or special characters.
* **Checklist**:
1. Ensure table names with spaces or special characters are enclosed in backticks.
2. Examples:
* Incorrect: SELECT * FROM integration.travel data
* Incorrect: SELECT * FROM integration.'travel data'
* Correct: SELECT * FROM integration.\`travel data\`
</Warning>
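
When queries are generated from code, the backtick rule from the last checklist can be applied automatically. The sketch below is one possible approach. The helper names `quote_identifier` and `select_all` are hypothetical, and rejecting names that already contain a backtick is a conservative assumption, since the escaping rules of `mindsdb_sql` are not described here.

```python
def quote_identifier(name: str) -> str:
    """Wrap a table name in backticks so spaces and special characters parse."""
    if "`" in name:
        # Assumption: refuse such names rather than guess an escaping rule.
        raise ValueError(f"Identifier must not contain a backtick: {name!r}")
    return f"`{name}`"


def select_all(integration: str, table: str, limit: int = 10) -> str:
    """Build a SELECT statement against a table of a connected integration."""
    return f"SELECT * FROM {integration}.{quote_identifier(table)} LIMIT {limit};"


if __name__ == "__main__":
    print(select_all("integration", "travel data"))
```

Quoting every table name, not only the problematic ones, keeps the generated SQL uniform and avoids having to detect reserved words.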
94 changes: 65 additions & 29 deletions mindsdb/integrations/handlers/clickhouse_handler/README.md
@@ -1,37 +1,25 @@
# ClickHouse Handler
---
title: ClickHouse
sidebarTitle: ClickHouse
---

This is the implementation of the ClickHouse handler for MindsDB.
This documentation describes the integration of MindsDB with [ClickHouse](https://clickhouse.com/docs/en/intro), a high-performance, column-oriented SQL database management system (DBMS) for online analytical processing (OLAP).
The integration allows MindsDB to access data from ClickHouse and enhance ClickHouse with AI capabilities.

## ClickHouse
## Prerequisites

ClickHouse is a column-oriented database management system (DBMS) for online analytical processing of queries (OLAP). https://clickhouse.com/docs/en/intro/
Before proceeding, ensure the following prerequisites are met:

1. Install MindsDB [locally via Docker](https://docs.mindsdb.com/setup/self-hosted/docker) or use [MindsDB Cloud](https://cloud.mindsdb.com/).
2. To connect ClickHouse to MindsDB, install the required dependencies by following [these instructions](/setup/self-hosted/docker#install-dependencies).

## Connection

## Implementation
This handler was implemented using the standard `clickhouse-sqlalchemy` library https://clickhouse-sqlalchemy.readthedocs.io/en/latest/.
Please install it before using this handler:

```
pip install clickhouse-sqlalchemy
```

The required arguments to establish a connection are as follows:

* `host` is the hostname or IP address of the ClickHouse server.
* `port` is the TCP/IP port of the ClickHouse server.
* `user` is the username used to authenticate with the ClickHouse server.
* `password` is the password to authenticate the user with the ClickHouse server.
* `database` defaults to `default`. It is the database name to use when connecting with the ClickHouse server.
* `protocol` defaults to `native`. It is an optional parameter. Its supported values are `native`, `http` and `https`.

## Usage

To connect to ClickHouse use add `engine=clickhouse` to the CREATE DATABASE statement as:
Establish a connection to ClickHouse from MindsDB by executing the following SQL command and providing its [handler name](https://github.com/mindsdb/mindsdb/tree/staging/mindsdb/integrations/handlers/clickhouse_handler) as an engine.

```sql
CREATE DATABASE clic
WITH ENGINE = "clickhouse",
CREATE DATABASE clickhouse_conn
WITH ENGINE = 'clickhouse',
PARAMETERS = {
"host": "127.0.0.1",
"port": "8443",
@@ -42,8 +30,56 @@ PARAMETERS = {
}
```

Now, you can use this established connection to query your database as follows,
The connection parameters are as follows:

* `host`: the hostname or IP address of the ClickHouse server.
* `port`: the TCP/IP port of the ClickHouse server.
* `user`: the username used to authenticate with the ClickHouse server.
* `password`: the password used to authenticate the user with the ClickHouse server.
* `database`: the name of the database to use when connecting to the ClickHouse server. Defaults to `default`.
* `protocol`: an optional parameter that selects the connection protocol. Supported values are `native`, `http`, and `https`. Defaults to `native`.

## Usage

The following usage examples utilize the connection to ClickHouse made via the `CREATE DATABASE` statement and named `clickhouse_conn`.

Retrieve data from a specified table by providing the integration and table name.

```sql
SELECT * FROM clic.test_data.table
```
SELECT *
FROM clickhouse_conn.table_name
LIMIT 10;
```

## Troubleshooting

<Warning>
`Database Connection Error`

* **Symptoms**: Failure to connect MindsDB with the ClickHouse database.
* **Checklist**:
1. Ensure that the ClickHouse server is running and accessible.
2. Confirm that host, port, user, and password are correct. Try connecting to ClickHouse directly, for example with `clickhouse-client`.
3. Test the network connection between the MindsDB host and the ClickHouse server.
</Warning>

<Warning>
`Slow Connection Initialization`

* **Symptoms**: Connecting to the ClickHouse server takes an exceptionally long time, or connections hang without completing.
* **Checklist**:
1. Ensure that you are using the appropriate protocol (http, https, or native) for your ClickHouse setup. Misconfigurations here can lead to significant delays.
2. Ensure that firewalls or security groups (in cloud environments) are properly configured to allow traffic on the necessary ports (such as 8123 for HTTP, 8443 for HTTPS, or 9000 for native).
</Warning>

<Warning>
`SQL statement cannot be parsed by mindsdb_sql`

* **Symptoms**: SQL queries failing or not recognizing table names containing spaces, reserved words or special characters.
* **Checklist**:
1. Ensure table names with spaces or special characters are enclosed in backticks.
2. Examples:
* Incorrect: SELECT * FROM integration.travel data
* Incorrect: SELECT * FROM integration.'travel data'
* Correct: SELECT * FROM integration.\`travel data\`
</Warning>
@@ -6,4 +6,4 @@
__github__ = 'https://github.com/mindsdb/mindsdb'
__pypi__ = 'https://pypi.org/project/mindsdb/'
__license__ = 'MIT'
__copyright__ = 'Copyright 2022- mindsdb'
__copyright__ = 'Copyright 2022 MindsDB'
@@ -3,6 +3,7 @@

import pandas as pd
from sqlalchemy import create_engine
from sqlalchemy.exc import SQLAlchemyError
from clickhouse_sqlalchemy.drivers.base import ClickHouseDialect
from mindsdb_sql.parser.ast.base import ASTNode
from mindsdb_sql.render.sqlalchemy_render import SqlalchemyRender
@@ -18,6 +19,7 @@

logger = log.getLogger(__name__)


class ClickHouseHandler(DatabaseHandler):
"""
This handler handles connection and execution of the ClickHouse statements.
@@ -33,17 +35,21 @@
self.is_connected = False
self.protocol = connection_data.get('protocol', 'native')



def __del__(self):
if self.is_connected is True:
self.disconnect()

def connect(self):
"""
Handles the connection to a ClickHouse
Establishes a connection to a ClickHouse server using SQLAlchemy.

Raises:
SQLAlchemyError: If an error occurs while connecting to the database.

Returns:
Connection: A SQLAlchemy Connection object to the ClickHouse database.
"""
if self.is_connected is True:
if self.is_connected:
return self.connection

protocol = "clickhouse+native" if self.protocol == 'native' else "clickhouse+http"
@@ -53,22 +59,30 @@
password = quote(self.connection_data['password'])
database = quote(self.connection_data['database'])
url = f'{protocol}://{user}:{password}@{host}:{port}/{database}'
# This is not redundant. Check https://clickhouse-sqlalchemy.readthedocs.io/en/latest/connection.html#http
if self.protocol == 'https':
url = url + "?protocol=https"
try:
engine = create_engine(url)
connection = engine.raw_connection()
self.is_connected = True
self.connection = connection
except SQLAlchemyError as e:
# Log only the host and port: the full URL contains the password.
logger.error(f'Failed to connect to ClickHouse database at {host}:{port}: {e}')
self.is_connected = False
raise

engine = create_engine(url)
connection = engine.raw_connection()
self.is_connected = True
self.connection = connection
return self.connection

def check_connection(self) -> StatusResponse:
"""
Check the connection of the ClickHouse database
:return: success status and error message if error occurs
Checks the status of the connection to the ClickHouse database.

Returns:
StatusResponse: An object containing the success status and an error message if an error occurs.
"""
response = StatusResponse(False)
need_to_close = self.is_connected is False
need_to_close = not self.is_connected

try:
connection = self.connect()
@@ -78,25 +92,27 @@
finally:
cur.close()
response.success = True
except Exception as e:
except SQLAlchemyError as e:
logger.error(f'Error connecting to ClickHouse {self.connection_data["database"]}, {e}!')
response.error_message = e
response.error_message = str(e)
self.is_connected = False

if response.success is True and need_to_close:
self.disconnect()
if response.success is False and self.is_connected is True:
self.is_connected = False

return response

def native_query(self, query: str) -> Response:
"""
Receive SQL query and runs it
:param query: The SQL query to run in ClickHouse
:return: returns the records from the current recordset
"""
need_to_close = self.is_connected is False
Executes a SQL query and returns the result.

Args:
query (str): The SQL query to be executed.

Returns:
Response: A response object containing the result of the query or an error message.
"""

connection = self.connect()
cur = connection.cursor()
try:
@@ -113,7 +129,7 @@
else:
response = Response(RESPONSE_TYPE.OK)
connection.commit()
except Exception as e:
except SQLAlchemyError as e:
logger.error(f'Error running query: {query} on {self.connection_data["database"]}!')
response = Response(
RESPONSE_TYPE.ERROR,
@@ -123,9 +139,6 @@
finally:
cur.close()

if need_to_close is True:
self.disconnect()

return response

def query(self, query: ASTNode) -> Response:
@@ -190,12 +203,11 @@
'description': 'The password to authenticate the user with the ClickHouse server.',
'required': True,
'label': 'Password'
},

}
)

connection_args_example = OrderedDict(
protocol='clickhouse',
protocol='native',
host='127.0.0.1',
port=9000,
user='root',