Data Streaming with Change Data Capture - AWS S3 as a target
- An AWS account;
- Docker installed;
- Basic knowledge of SQL commands.
- I'm running the MySQL server in a Docker container. If you want to use another MySQL server, you will need to change the database.hostname config in the kafka_connect_source docker container, as well as the other database-related configs. You will also need to apply some configuration to the MySQL server itself; for more information, see the Debezium documentation - MySQL connector;
- I'm saving the files in Parquet format, but you can change the format to JSON, Avro, or byte array; to do so, change the format.class config in the kafka_connect_sink docker container.
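For reference, connectors registered through Kafka Connect's REST API take their configs as JSON. The fragment below is only an illustrative sketch of where the configs mentioned above live - the property names (format.class, s3.bucket.name, s3.region) and format classes come from the Confluent S3 sink connector, while the connector name and remaining values are assumptions for demonstration:

```json
{
  "name": "s3-sink",
  "config": {
    "connector.class": "io.confluent.connect.s3.S3SinkConnector",
    "format.class": "io.confluent.connect.s3.format.parquet.ParquetFormat",
    "s3.bucket.name": "cdc-s3-sink",
    "s3.region": "us-east-1"
  }
}
```

Swapping format.class to io.confluent.connect.s3.format.json.JsonFormat (or the Avro / byte-array equivalents) is what changes the output format mentioned above.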
- Create a user in AWS with S3 access - you can choose the AmazonS3FullAccess policy to make it easier;
- Create an access key for this user;
- Put the access credentials in a .env file - use the .env_template file to help;
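As a sketch, the .env file might look like the fragment below - the variable names here are the standard AWS credential names but are an assumption on my part; check the .env_template file for the names this setup actually expects:

```
AWS_ACCESS_KEY_ID=your-access-key-id
AWS_SECRET_ACCESS_KEY=your-secret-access-key
```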
- Create a bucket in S3 - in my example I created a bucket called cdc-s3-sink in the us-east-1 region. You can create a bucket with whatever name and region you want, but you will need to change the s3.bucket.name and s3.region configs in the kafka_connect_sink docker container;
- Run the command below to start the containers and the CDC flow:
docker-compose up -d
- If you are using the MySQL server from docker-compose.yml, run the commands below:
docker exec -it mysql /bin/bash
mysql -u root -p
and enter the password, mysql123;
- Run whatever DDL and DML commands you want and watch the files arrive in S3.
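As a minimal illustration of this last step, the statements below create a table and change some rows. The database and table names are made up for demonstration; assuming the Debezium connector is configured to capture this database, each committed change is picked up and written to the bucket by the sink connector:

```sql
-- Hypothetical example schema and data; any DDL/DML on a captured database works.
CREATE DATABASE IF NOT EXISTS inventory;
USE inventory;

CREATE TABLE customers (
  id   INT PRIMARY KEY,
  name VARCHAR(100)
);

INSERT INTO customers (id, name) VALUES (1, 'Alice'), (2, 'Bob');
UPDATE customers SET name = 'Alice Smith' WHERE id = 1;
DELETE FROM customers WHERE id = 2;
```

Each insert, update, and delete above should arrive as a separate change event in the Parquet files written to S3.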