New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: refresh stats for multi-tenant system database conversions #123905
sql: refresh stats for multi-tenant system database conversions #123905
Conversation
496b45a
to
7c040ec
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @fqazi)
pkg/sql/schema_changer.go
line 920 at r1 (raw file):
} if sc.mutationID == descpb.InvalidMutationID {
just trying to understand the code better: why is sc.mutationID
invalid in this case?
pkg/sql/schema_changer.go
line 955 at r1 (raw file):
defer func() { if err := waitToUpdateLeases(err == nil /* refreshStats */); err != nil && !errors.Is(err, catalog.ErrDescriptorNotFound) {
this is unrelated to your PR, but this looks like a bug. instead of err == nil
should this be retErr == nil
, and make retErr
and named return parameter of this function?
Previously, when converting the system database to multiregion its possible for table statistics to contain the existing type of crdb_region as bytes. This could happen if automatic statistics collection happened concurrently with the conversion to a multi-region system database. The conversion had logic to clear table statistics, but it was still possible for statistics collection to happen in between. This could cause queries against RBR system tables to fail because, since the table_statistics type information no longer matches with the table descriptor after. We started seeing this for the system database inside TestMrSystemDatabase, once conversion was added for the system tenant. To address this, this patch first adds extra logic in the schema changer to force a refresh of stats on system tables, which will force a refresh of statistics after the schema change, in case a stats refresh occurs before the job completes. We also modify the TestMrSystemDatabase to intentionally generate stats before changing the system database under the system tenant to avoid the risk of hitting this issue. With these changes we expect the test to no longer flake and any real world occurrence to be less transient. Fixes: cockroachdb#122790 Release note: None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rafiss)
pkg/sql/schema_changer.go
line 920 at r1 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
just trying to understand the code better: why is
sc.mutationID
invalid in this case?
So, the way we change column types for the region column is that we directly modify the descriptor, without any mutation. So, the job is created without any mutationID assigned. The only thing the job ends up doing is waiting for new version to be adopted in this case.
pkg/sql/schema_changer.go
line 955 at r1 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
this is unrelated to your PR, but this looks like a bug. instead of
err == nil
should this beretErr == nil
, and makeretErr
and named return parameter of this function?
Done.
Nice catch!
7c040ec
to
dd4ea19
Compare
@rafiss TFTR! bors r+ |
Previously, when converting the system database to multiregion its possible for table statistics to contain the existing type of crdb_region as bytes. This could happen if automatic statistics collection happened concurrently with the conversion to a multi-region system database. The conversion had logic to clear table statistics, but it was still possible for statistics collection to happen in between. This could cause queries against RBR system tables to fail because, since the table_statistics type information no longer matches with the table descriptor after. We started seeing this for the system database inside TestMrSystemDatabase, once conversion was added for the system tenant. To address this, this patch first adds extra logic in the schema changer to force a refresh of stats on system tables, which will force a refresh of statistics after the schema change, in case a stats refresh occurs before the job completes. We also modify the TestMrSystemDatabase to intentionally generate stats before changing the system database under the system tenant to avoid the risk of hitting this issue. With these changes we expect the test to no longer flake and any real world occurrence to be less transient.
Fixes: #122790
Release note: None