
feat(connector): add DynamoDB sink #16670

Status: Open · wants to merge 7 commits into base: main

Conversation

@jetjinser (Contributor) commented May 9, 2024:

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

A 1:1 mapping sink from a RisingWave table to a DynamoDB table.

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added test labels as necessary. See details.
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features; see Sqlsmith: SQL feature generation #7934.)
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

New sink: dynamodb. Example usage:

CREATE TABLE IF NOT EXISTS movies (
    year integer,
    title varchar,
    description varchar,
    primary key (year, title)
);

CREATE SINK
  dyn_sink
FROM
  movies
WITH
(
  connector = 'dynamodb',
  table = 'Movies',
  primary_key = 'year,title',
  endpoint = 'http://localhost:8000',
  region = 'us',
  access_key = 'ac',
  secret_key = 'sk'
);

@jetjinser added the type/feature and user-facing-changes (Contains changes that are visible to users) labels on May 9, 2024
@jetjinser jetjinser marked this pull request as ready for review May 10, 2024 17:14
@jetjinser jetjinser requested a review from a team as a code owner May 10, 2024 17:14
Comment on lines 291 to 292
| ScalarRefImpl::Struct(_)
| ScalarRefImpl::Jsonb(_)) => AttributeValue::S(string.to_text()),
Contributor commented:

  • struct may be better mapped to a DynamoDB Map (see the sketch after this list)
  • jsonb can be mapped as a string for now, and later extended to a dynamic recursive mapping similar to #11699. It is your call whether that dynamic mapping should be the default here; just bringing the alternative option to attention.
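
A minimal sketch of the struct-to-Map direction (first bullet above), assuming aws_sdk_dynamodb's AttributeValue; StructType::iter, StructRef::iter_fields_ref, and map_scalar are stand-ins for the PR's actual types and conversion helper:

use std::collections::HashMap;
use aws_sdk_dynamodb::types::AttributeValue;

// Hypothetical sketch: convert a struct value into a DynamoDB Map by
// pairing each field name with its recursively converted value.
fn map_struct(struct_type: &StructType, struct_ref: StructRef<'_>) -> Result<AttributeValue> {
    let mut map = HashMap::new();
    for ((name, field_type), field) in struct_type.iter().zip(struct_ref.iter_fields_ref()) {
        // map_scalar recurses into nested structs/lists as needed
        map.insert(name.to_owned(), map_scalar(field_type, field)?);
    }
    Ok(AttributeValue::M(map))
}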

@jetjinser (Contributor, Author) replied May 13, 2024:

  • I'm not sure: must the struct here always be named, given that it comes from CREATE SINK?
  • Does this mean providing a TimestampHandlingMode-like option for the jsonb format in the DynamoDB sink?

Contributor replied:

You can consider a struct as a list of named key-value pairs, right?

struct <
  id varchar,
  name varchar
>

maps to the DynamoDB Map:

Map<String, Value> {
  "id": "123",
  "name": "jinser"
}

@jetjinser (Contributor, Author) replied:

Yes, I think the current implementation is already like this:

let Some(scalar_ref) = scalar_ref else {
    return Ok(AttributeValue::Null(true));
};
let attr = match (data_type, scalar_ref) {
Contributor commented:

It seems that matching on data_type alone could make the code a bit cleaner.

@jetjinser (Contributor, Author) replied:

I'm worried because DataType is distinct from the ScalarRefImpl variant: matching only on data_type would require scalar_ref.into_foo() calls, which could panic in some cases. Do you mean that map_data_type is only used here, so it does not need to handle the two variants disagreeing?

@fuyufjh (Contributor) replied May 14, 2024:

I think each DataType must have only one possible corresponding ScalarRefImpl, so no need to match ScalarRefImpl.
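
For illustration, a minimal sketch of the DataType-only match; the into_* accessor names are assumed from RisingWave's scalar API, and the fallback arm reuses the to_text() conversion already in this PR:

// Sketch: match on DataType alone. If chunks are well-typed, each
// DataType corresponds to exactly one ScalarRefImpl variant, so the
// accessors below cannot panic in practice.
let attr = match data_type {
    DataType::Int32 => AttributeValue::N(scalar_ref.into_int32().to_string()),
    DataType::Varchar => AttributeValue::S(scalar_ref.into_utf8().to_owned()),
    DataType::Boolean => AttributeValue::Bool(scalar_ref.into_bool()),
    // ...remaining types follow the same pattern
    _ => AttributeValue::S(scalar_ref.to_text()),
};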

async fn write_chunk_inner(&mut self, chunk: StreamChunk) -> Result<()> {
    for (op, row) in chunk.rows() {
        match op {
            Op::Insert | Op::UpdateInsert => {
@fuyufjh (Contributor) commented May 14, 2024:

Have you ever tested an update event?

In self.payload_writer.write_chunk(), you insert all the events before the deletes, which effectively means the UpdateInsert happens before the UpdateDelete. IIUC, the rows will end up deleted rather than updated.

Here, I would prefer to handle UpdateInsert and UpdateDelete as a whole, i.e. convert them into a single put of the KV into DynamoDB. There is no reason not to do so.

But keep in mind that this won't fully solve the problem. Please carefully consider both of the following cases:

  • An Insert comes after a Delete on the same key
  • A Delete comes after an Insert on the same key

Perhaps the current implementation of self.payload_writer.write_chunk() won't work; see the sketch below for one possible fix.
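
One possible shape for that fix, a sketch that collapses each chunk to the final operation per primary key before writing, so same-key Insert/Delete ordering cannot be scrambled by the batched writer (extract_pk, put, and delete are assumed helpers, not this PR's actual API):

use std::collections::HashMap;

// Sketch: keep only the last op per primary key within a chunk.
// UpdateDelete + UpdateInsert on the same key collapse into a single put,
// and Insert-after-Delete / Delete-after-Insert resolve to the final op.
let mut final_ops: HashMap<Vec<u8>, (Op, RowRef<'_>)> = HashMap::new();
for (op, row) in chunk.rows() {
    let key = extract_pk(&row); // assumed helper: encode the pk columns
    final_ops.insert(key, (op, row));
}
for (op, row) in final_ops.into_values() {
    match op {
        Op::Insert | Op::UpdateInsert => self.payload_writer.put(row)?,
        Op::Delete | Op::UpdateDelete => self.payload_writer.delete(row)?,
    }
}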

Comment on lines 716 to 729
#[serde(rename = "aws.region")]
pub stream_region: String,
#[serde(rename = "aws.endpoint")]
pub endpoint: Option<String>,
#[serde(rename = "aws.credentials.access_key_id")]
pub credentials_access_key: Option<String>,
#[serde(rename = "aws.credentials.secret_access_key")]
pub credentials_secret_access_key: Option<String>,
#[serde(rename = "aws.credentials.session_token")]
pub session_token: Option<String>,
#[serde(rename = "aws.credentials.role.arn")]
pub assume_role_arn: Option<String>,
#[serde(rename = "aws.credentials.role.external_id")]
pub assume_role_external_id: Option<String>,
Contributor commented:

You can just put AwsAuthProps here.
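
For example, something like the following sketch, using serde's flatten so the shared AWS options are declared once (the struct name and surrounding fields are illustrative, not the PR's exact code):

use serde::Deserialize;

#[derive(Deserialize)]
pub struct DynamoDbConfig {
    pub table: String,
    pub primary_key: String,
    // Reuse the shared AWS options (region, endpoint, credentials,
    // assume-role) instead of redeclaring each field on this struct.
    #[serde(flatten)]
    pub aws_auth_props: AwsAuthProps,
}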

@lmatz (Contributor) commented May 20, 2024:

DynamoDB can be deployed locally via a Docker image: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/DynamoDBLocal.html
It is worth adding an integration test.
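
A rough sketch of what such a test could exercise against DynamoDB Local on localhost:8000 (table name, key values, and credentials are placeholders; DynamoDB Local accepts any static credentials):

use aws_sdk_dynamodb::{config::Credentials, types::AttributeValue, Client};

// Sketch: connect to DynamoDB Local (e.g. the amazon/dynamodb-local image)
// and verify that a row written through the sink can be read back.
#[tokio::test]
async fn dynamodb_sink_smoke() {
    let config = aws_config::defaults(aws_config::BehaviorVersion::latest())
        .endpoint_url("http://localhost:8000")
        .region(aws_config::Region::new("us"))
        .credentials_provider(Credentials::new("ac", "sk", None, None, "static"))
        .load()
        .await;
    let client = Client::new(&config);
    // ...create the Movies table, run the sink against it, then assert:
    let got = client
        .get_item()
        .table_name("Movies")
        .key("year", AttributeValue::N("2015".into()))
        .key("title", AttributeValue::S("example".into()))
        .send()
        .await
        .unwrap();
    assert!(got.item().is_some());
}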

@jetjinser (Contributor, Author) replied:

> worth adding an integration test

Should I add it to this PR? Or should I open another PR?

@jetjinser (Contributor, Author) commented:

@xiangjinwu @fuyufjh @yuhao-su I've made the changes; please take a look when you have time 😃

Labels: type/feature, user-facing-changes (Contains changes that are visible to users)
Projects: None yet
Development: no linked issues
6 participants