Data Streaming with Change Data Capture - AWS S3 as a target
- An AWS account;
- Docker installed;
- Basic knowledge of SQL commands.
- I'm running the MySQL server in a Docker container. If you want to use another MySQL server, you will need to change the database.hostname config in the kafka_connect_source docker container, as well as the other database-related configs. You will also need to apply some configuration to the MySQL server itself; for more information, see the Debezium documentation - MySQL connector;
- I'm saving the files in Parquet format, but you can change the format to JSON, Avro, or byte array; to do so, change the format.class config in the kafka_connect_sink docker container.
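For reference, connectors registered through Kafka Connect's REST API take their configs as JSON. The fragment below is only an illustrative sketch of where the configs mentioned above live - the property names (format.class, s3.bucket.name, s3.region) and format classes come from the Confluent S3 sink connector, while the connector name and remaining values are assumptions for demonstration:

```json
{
  "name": "s3-sink",
  "config": {
    "connector.class": "io.confluent.connect.s3.S3SinkConnector",
    "format.class": "io.confluent.connect.s3.format.parquet.ParquetFormat",
    "s3.bucket.name": "cdc-s3-sink",
    "s3.region": "us-east-1"
  }
}
```

Swapping format.class to io.confluent.connect.s3.format.json.JsonFormat (or the Avro / byte-array equivalents) is what changes the output format mentioned above.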
- Create a user in AWS with S3 access - you can choose the AmazonS3FullAccess policy to make it easier;
- Create an access key for this user;
- Put the access credentials in a .env file - use the .env_template file to help;
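As a sketch, the .env file might look like the fragment below - the variable names here are the standard AWS credential names but are an assumption on my part; check the .env_template file for the names this setup actually expects:

```
AWS_ACCESS_KEY_ID=your-access-key-id
AWS_SECRET_ACCESS_KEY=your-secret-access-key
```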
- Create a bucket in S3 - in my example I created a bucket called cdc-s3-sink in the us-east-1 region. You can create a bucket with whatever name and region you want, but you will need to change the s3.bucket.name and s3.region configs in the kafka_connect_sink docker container;
- Run the command below to start the containers and the CDC flow:
docker-compose up -d
- If you are using the MySQL server from docker-compose.yml, run the commands below:
docker exec -it mysql /bin/bash
mysql -u root -p
and enter the password, mysql123;
- Run whatever DDL and DML commands you want and watch the files arrive in S3.
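As a minimal illustration of this last step, the statements below create a table and change some rows. The database and table names are made up for demonstration; assuming the Debezium connector is configured to capture this database, each committed change is picked up and written to the bucket by the sink connector:

```sql
-- Hypothetical example schema and data; any DDL/DML on a captured database works.
CREATE DATABASE IF NOT EXISTS inventory;
USE inventory;

CREATE TABLE customers (
  id   INT PRIMARY KEY,
  name VARCHAR(100)
);

INSERT INTO customers (id, name) VALUES (1, 'Alice'), (2, 'Bob');
UPDATE customers SET name = 'Alice Smith' WHERE id = 1;
DELETE FROM customers WHERE id = 2;
```

Each insert, update, and delete above should arrive as a separate change event in the Parquet files written to S3.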