sql: dynamically determine histogram sample size #123972

Open
mgartner opened this issue May 10, 2024 · 3 comments
Labels
A-sql-table-stats Table statistics (and their automatic refresh). C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-queries SQL Queries Team

Comments

@mgartner
Collaborator

mgartner commented May 10, 2024

To collect histograms for table statistics, we currently sample 10k rows by default. The sample size can be configured with the cluster setting sql.stats.histogram_samples.count.

For very large tables, this is too few rows. We should find a way to dynamically adjust the sample size based on the size of the table.
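
As a rough illustration of what "dynamic" could mean here, the sketch below grows the sample size sublinearly with the estimated row count and clamps it between a floor and a ceiling. Everything in it is an assumption made for the sake of the example: the function name dynamicSampleSize, the power-law shape, the coefficient and exponent, and the 300k ceiling are all hypothetical. Only the 10k floor corresponds to the current default mentioned above.

```go
package main

import (
	"fmt"
	"math"
)

// Hypothetical bounds. Only the 10k floor mirrors the current default
// sample size; the ceiling is an illustrative assumption.
const (
	minSamples = 10_000
	maxSamples = 300_000
)

// dynamicSampleSize is a hypothetical helper that maps an estimated row
// count to a histogram sample size. The coefficient (500) and exponent
// (0.3) are illustrative values, not tuned constants.
func dynamicSampleSize(estimatedRows uint64) uint64 {
	// Grow sublinearly so that huge tables get more samples without the
	// sample size tracking the table size one-to-one.
	n := 500 * math.Pow(float64(estimatedRows), 0.3)
	// Clamp into [minSamples, maxSamples].
	return uint64(math.Max(minSamples, math.Min(maxSamples, n)))
}

func main() {
	for _, rows := range []uint64{1_000, 1_000_000, 1_000_000_000, 1_000_000_000_000} {
		fmt.Printf("%d rows -> %d samples\n", rows, dynamicSampleSize(rows))
	}
}
```

With a shape like this, small tables keep the existing 10k sample size, and only tables well past that point sample more rows, up to the ceiling.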

See this discussion for more context: https://cockroachlabs.slack.com/archives/C01RX2G8LT1/p1701834921110179?thread_ts=1701825005.623189&cid=C01RX2G8LT1

Jira issue: CRDB-38633

@mgartner mgartner added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label May 10, 2024
@michae2 michae2 added A-sql-table-stats Table statistics (and their automatic refresh). T-sql-queries SQL Queries Team labels May 10, 2024
@nitesh-dubey

Hi @mgartner @michae2, is this an internal issue, or is it available for contributors to pick up?
The Jira issue and Slack link mentioned, which might contain more context, are internal.

@michae2
Collaborator

michae2 commented May 13, 2024

Hi @nitesh-dubey, thanks for your interest! We are planning to assign this to someone internal soon, so it would be better to pick up another issue if you don't mind. Maybe one tagged with "good first issue" or E-starter or E-quick-win would be good.

@nitesh-dubey

I see... thanks @michae2!
I'll go ahead and look at the issues with the tags you suggested.

Projects
Status: 24.2 Release