Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat: decide batch query execution mode based on IO estimation #16695

Open
chenzl25 opened this issue May 10, 2024 · 0 comments · May be fixed by #16696
Open

Feat: decide batch query execution mode based on IO estimation #16695

chenzl25 opened this issue May 10, 2024 · 0 comments · May be fixed by #16696

Comments

@chenzl25
Copy link
Contributor

Is your feature request related to a problem? Please describe.

Currently, the decision to run a batch query in distributed mode or locally is based solely on the query structure itself, without considering the data size. This approach could pose a problem if the table being scanned is small, as running in distributed mode might incur excessive overhead due to scheduling costs potentially surpassing execution costs. I propose leveraging table statistics (e.g., row count, table size) to estimate an upper bound for IO operations in a batch query. The query should be executed in distributed mode only if the expected IO is sufficiently high.

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Additional context

No response

@github-actions github-actions bot added this to the release-1.10 milestone May 10, 2024
@chenzl25 chenzl25 linked a pull request May 10, 2024 that will close this issue
9 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant