Feat: decide batch query execution mode based on IO estimation #16695

chenzl25 · 2024-05-10T10:09:25Z

Is your feature request related to a problem? Please describe.

Currently, whether a batch query runs in distributed mode or locally is decided solely by the query structure, without considering data size. This is a problem when the table being scanned is small: running in distributed mode can add excessive overhead, because scheduling costs may exceed execution costs. I propose using table statistics (e.g., row count, table size) to estimate an upper bound on the IO of a batch query. The query should run in distributed mode only if the estimated IO is sufficiently high.
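
To make the proposal concrete, here is a minimal sketch of such a decision rule. It is not the implementation in the linked pull request. The names `TableStats`, `estimate_io_upper_bound`, `decide_query_mode`, and the 64 MiB threshold are all assumptions chosen for illustration, and the estimate simply assumes every scanned table is read in full.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable


class QueryMode(Enum):
    LOCAL = "local"
    DISTRIBUTED = "distributed"


@dataclass(frozen=True)
class TableStats:
    """Hypothetical per-table statistics; field names are illustrative."""
    row_count: int
    avg_row_bytes: int


# Assumed cutoff for illustration only; a real value would need benchmarking.
IO_THRESHOLD_BYTES = 64 * 1024 * 1024


def estimate_io_upper_bound(scanned_tables: Iterable[TableStats]) -> int:
    """Upper-bound the bytes read by assuming each table is scanned in full."""
    return sum(t.row_count * t.avg_row_bytes for t in scanned_tables)


def decide_query_mode(
    scanned_tables: Iterable[TableStats],
    threshold_bytes: int = IO_THRESHOLD_BYTES,
) -> QueryMode:
    """Pick distributed mode only when the estimated IO reaches the threshold."""
    if estimate_io_upper_bound(scanned_tables) >= threshold_bytes:
        return QueryMode.DISTRIBUTED
    return QueryMode.LOCAL


small_table = TableStats(row_count=1_000, avg_row_bytes=100)
large_table = TableStats(row_count=10_000_000, avg_row_bytes=100)

small_mode = decide_query_mode([small_table])
large_mode = decide_query_mode([small_table, large_table])
```

Because the estimate is an upper bound, a query with a selective filter may still be sent to distributed mode even though it reads little data. A rule like this would most plausibly be combined with the existing structure-based check rather than replace it.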

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Additional context

No response

chenzl25 added the type/feature label May 10, 2024

github-actions bot added this to the release-1.10 milestone May 10, 2024

chenzl25 linked a pull request May 10, 2024 that will close this issue

feat: Add IO estimation based query mode decider #16696

Open

9 tasks
