Tracking: improve scaling up performance #14448

Open
4 of 10 tasks
lmatz opened this issue Jan 9, 2024 · 2 comments · Fixed by #15374
Labels
help wanted Issues that need help from contributors type/perf


@lmatz

lmatz commented Jan 9, 2024

The dashboard compares RisingWave (RW) in three setups (a 1-CN baseline, 1 CN with 4X resources, and 4 CNs with 1X resources each) against other systems:
http://metabase.risingwave-cloud.xyz/question/9549-nexmark-rw-vs-flink-avg-source-throughput-all-testbeds?rw_tag=nightly-20240127&flink_tag=v1.16.0&flink_label=flink-medium-1tm-test-20230104,flink-4x-medium-1tm-test-20240104&flink_metrics=avg-job-throughput-per-second

To access the dashboard, please refer to:
https://www.notion.so/Performance-Test-Dashboard-Manual-e33b26eb188e48379a7b714a01a4fc2c

The 4X 1-CN performance tests run weekly.

Improvements needed:

@lmatz lmatz added the type/perf label Jan 9, 2024
@github-actions github-actions bot added this to the release-1.7 milestone Jan 9, 2024
@lmatz lmatz added the help wanted Issues that need help from contributors label Feb 8, 2024
@lmatz lmatz pinned this issue Feb 8, 2024
@TennyZhuang TennyZhuang unpinned this issue Feb 19, 2024
@lmatz lmatz pinned this issue Feb 29, 2024
@lmatz lmatz reopened this Mar 1, 2024
@MrCroxx MrCroxx assigned MrCroxx and unassigned MrCroxx Mar 4, 2024
@lmatz lmatz removed this from the release-1.7 milestone Mar 6, 2024
@lmatz

lmatz commented Mar 7, 2024

Queries q15, q16, and q17 are similar but not identical: https://github.com/risingwavelabs/kube-bench/blob/main/manifests/nexmark/nexmark-sinks.template.yaml#L700C5-L713C87

Q17's plan:

 StreamSink { type: append-only, columns: [auction, day, total_bids, rank1_bids, rank2_bids, rank3_bids, min_price, max_price, avg_price, sum_price] }
 └─StreamProject { exprs: [$expr3, $expr2, count, count filter(($expr4 < 10000:Int32)), count filter(($expr4 >= 10000:Int32) AND ($expr4 < 1000000:Int32)), count filter(($expr4 >= 1000000:Int32)), min($expr4), max($expr4), (sum($expr4) / count($expr4)::Decimal) as $expr5, sum($expr4)] }
   └─StreamHashAgg [append_only] { group_key: [$expr2, $expr3], aggs: [count, count filter(($expr4 < 10000:Int32)), count filter(($expr4 >= 10000:Int32) AND ($expr4 < 1000000:Int32)), count filter(($expr4 >= 1000000:Int32)), min($expr4), max($expr4), sum($expr4), count($expr4)] }
     └─StreamExchange { dist: HashShard($expr2, $expr3) }
       └─StreamProject { exprs: [ToChar($expr1, 'YYYY-MM-DD':Varchar) as $expr2, Field(bid, 0:Int32) as $expr3, Field(bid, 2:Int32) as $expr4, _row_id] }
         └─StreamFilter { predicate: (event_type = 2:Int32) }
           └─StreamRowIdGen { row_id_index: 6 }
             └─StreamWatermarkFilter { watermark_descs: [Desc { column: $expr1, expr: ($expr1 - '00:00:04':Interval) }], output_watermarks: [$expr1] }
               └─StreamProject { exprs: [event_type, person, auction, bid, Case((event_type = 0:Int32), Field(person, 6:Int32), (event_type = 1:Int32), Field(auction, 5:Int32), Field(bid, 5:Int32)) as $expr1, _rw_kafka_timestamp, _row_id] }
                 └─StreamSource { source: nexmark, columns: [event_type, person, auction, bid, _rw_kafka_timestamp, _row_id] }
(10 rows)
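For readers comparing the three queries, the following Python sketch mirrors what the q17 plan above computes. Bids are keyed by (day, auction), hash-sharded on that key (the StreamExchange HashShard step), and each shard runs an append-only hash aggregation with the filtered counts, min, max, sum, and average listed in StreamHashAgg. It is only an illustration, not RisingWave's executor. The `(auction, price, date_time)` bid tuple, the helper names `GroupState`, `shard_of`, and `aggregate`, and the use of Python's built-in `hash` in place of RisingWave's hash function are all hypothetical. Prices are assumed to be non-null. The sketch does not by itself explain why q17 scales better than q15 and q16.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from decimal import Decimal
from typing import Optional


@dataclass
class GroupState:
    """Hypothetical per-(day, auction) state mirroring the aggs in StreamHashAgg."""
    total_bids: int = 0
    rank1_bids: int = 0  # price < 10_000
    rank2_bids: int = 0  # 10_000 <= price < 1_000_000
    rank3_bids: int = 0  # price >= 1_000_000
    min_price: Optional[int] = None
    max_price: Optional[int] = None
    sum_price: int = 0

    def add(self, price: int) -> None:
        # Append-only input: rows only ever add to the state, nothing is retracted.
        self.total_bids += 1
        if price < 10_000:
            self.rank1_bids += 1
        elif price < 1_000_000:
            self.rank2_bids += 1
        else:
            self.rank3_bids += 1
        self.min_price = price if self.min_price is None else min(self.min_price, price)
        self.max_price = price if self.max_price is None else max(self.max_price, price)
        self.sum_price += price


def shard_of(key, n_shards: int) -> int:
    # Stand-in for HashShard($expr2, $expr3): the same (day, auction) always maps to the same shard.
    return hash(key) % n_shards


def aggregate(bids, n_shards: int = 1):
    """bids: iterable of hypothetical (auction, price, date_time) tuples."""
    shards = [defaultdict(GroupState) for _ in range(n_shards)]
    for auction, price, ts in bids:
        key = (ts.strftime("%Y-%m-%d"), auction)  # like ToChar($expr1, 'YYYY-MM-DD')
        shards[shard_of(key, n_shards)][key].add(price)
    rows = []
    for shard in shards:
        for (day, auction), s in shard.items():
            # sum / count; with non-null prices count(price) equals total_bids.
            avg = Decimal(s.sum_price) / Decimal(s.total_bids)
            rows.append((auction, day, s.total_bids, s.rank1_bids, s.rank2_bids,
                         s.rank3_bids, s.min_price, s.max_price, avg, s.sum_price))
    return sorted(rows)
```

Because every (day, auction) group lands on exactly one shard, aggregating with one shard or with four yields the same rows, and the shards never need to exchange state.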

Although q15 and q16 do not scale well at the moment, q17 DOES scale quite well.

See the peak numbers at http://metabase.risingwave-cloud.xyz/question/9270-nexmark-q17-blackhole-4x-medium-1cn-affinity-avg-source-output-rows-per-second-rows-s-history-thtb-2767?start_date=2024-01-04

Flink's numbers are at http://metabase.risingwave-cloud.xyz/question/9732-flink-nexmark-q17-flink-4x-medium-1tm-avg-job-throughput-per-second-records-s-history-thtb-2922?start_date=2023-12-05

Both links above are for the 4X setups.

Reasons: see #15705 and #15731.

@emile-00 emile-00 unpinned this issue May 13, 2024