
storage: alter default cluster setting for sstable compression algorithm #123953

Open
Tracked by #123950
nicktrav opened this issue May 10, 2024 · 0 comments · May be fixed by #124245

nicktrav commented May 10, 2024

Set `storage.sstable.compression_algorithm` to `zstd`.

Some tests will likely need to be updated to account for SSTable sizes after compression with zstd rather than snappy.

Jira issue: CRDB-38625
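
For reference, opting a cluster into zstd can be done with standard cluster-setting SQL (a minimal sketch, assuming the setting accepts the value `zstd` as described in this issue):

```sql
-- Switch SSTable compression for this cluster to zstd.
SET CLUSTER SETTING storage.sstable.compression_algorithm = 'zstd';

-- Confirm the new value.
SHOW CLUSTER SETTING storage.sstable.compression_algorithm;
```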

nicktrav added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-storage Relating to our storage engine (Pebble) on-disk storage. T-storage Storage Team labels May 10, 2024
nicktrav self-assigned this May 10, 2024
blathers-crl bot added this to Incoming in Storage May 10, 2024
nicktrav added a commit to nicktrav/cockroach that referenced this issue May 10, 2024
In cockroachdb#120784, we now allow the `zstd` compression algorithm to be used
as the codec for SSTables. Initial testing has shown this algorithm to
provide a better compression ratio than snappy, at the cost of a minor
increase in CPU. This unlocks benefits such as an improved node density,
as more data can reside in each store.

Alter the default SSTable compression algorithm to `zstd`.

Closes cockroachdb#123953.

Release note (performance improvement): The default SSTable compression
algorithm was changed from snappy to zstd. The latter has been shown to
improve a store's compression ratio at the cost of a minor CPU increase.
Existing clusters can opt into this new compression algorithm by setting
`storage.sstable.compression_algorithm` to `zstd`. Existing SSTables
compressed with the snappy compression algorithm will NOT be actively
re-written. Instead, SSTables will adopt the new compression algorithm over
time, as existing SSTables are compacted and re-written.
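
Conversely, operators who prefer the previous behavior after this default changes could presumably pin the old algorithm or fall back to the default (a hedged sketch; this is not part of the commit above):

```sql
-- Explicitly keep snappy after the default moves to zstd.
SET CLUSTER SETTING storage.sstable.compression_algorithm = 'snappy';

-- Or return to whatever the cluster-wide default is.
RESET CLUSTER SETTING storage.sstable.compression_algorithm;
```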
nicktrav moved this from Incoming to In Progress (this milestone) in Storage May 14, 2024
nicktrav added a commit to nicktrav/cockroach that referenced this issue May 15, 2024
Currently, there exists a single cluster setting,
`storage.sstable.compression_algorithm`, that controls the compression
algorithm used by the cluster when writing SSTables. This cluster
setting currently defaults to `snappy`.

There are various end destinations and use cases for SSTables - those
that reside in a Pebble store (the most common), those sent over the
wire via `AddSSTable`, and those generated as part of a backup for
storage in S3/GCS. Each of these destinations and use cases benefits from
being able to independently alter the compression algorithm.

In addition to the existing cluster setting, add two backup-specific
cluster settings:

- `storage.sstable.compression_algorithm_backup_storage`: applies to
  SSTs generated as part of a backup, where SSTs will reside in blob
  storage. These require compression, but not at the cost of additional
  CPU at generation time. Defaults to `snappy`, the existing algorithm
  used.
- `storage.sstable.compression_algorithm_backup_transport`: applies to
  SSTs generated as part of a backup that are sent, immediately
  iterated, and then discarded (i.e. they are never persisted). Such
  SSTs typically have larger block sizes and benefit from compression.
  Defaults to `snappy`, the existing algorithm used.

While this change introduces new cluster settings that allow
independently altering the compression algorithm for different types of
SSTs, the existing compression behavior is unchanged.

Touches cockroachdb#123953.

Release note (general change): Adds two new cluster settings,
`storage.sstable.compression_algorithm_backup_storage` and
`storage.sstable.compression_algorithm_backup_transport`, which, in
addition to the existing cluster setting
`storage.sstable.compression_algorithm`, can be used to alter the
compression algorithm used for various types of SSTs.
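
For illustration, once these settings land they would presumably be adjusted the same way as the existing one (a sketch; the setting names come from the commit message above, and the accepted values are assumed to mirror `storage.sstable.compression_algorithm`):

```sql
-- Keep backup SSTs destined for blob storage on snappy (cheaper CPU at generation time).
SET CLUSTER SETTING storage.sstable.compression_algorithm_backup_storage = 'snappy';

-- Transport-only backup SSTs are never persisted, so a different trade-off may be acceptable.
SET CLUSTER SETTING storage.sstable.compression_algorithm_backup_transport = 'zstd';
```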
nicktrav added a commit to nicktrav/cockroach that referenced this issue May 15, 2024
In cockroachdb#120784, we now allow the `zstd` compression algorithm to be used
as the codec for SSTables. Initial testing has shown this algorithm to
provide a better compression ratio than snappy, at the cost of a minor
increase in CPU. This unlocks benefits such as an improved node density,
as more data can reside in each store.

Alter the default SSTable compression algorithm to `zstd`. Note that
this change only applies to SSTs written directly into a local Pebble
store, or generated to send over the wire for ingestion into a remote
store (e.g. `AddSSTable`). The defaults for backup-related SSTs are left
unchanged.

Closes cockroachdb#123953.

Release note (performance improvement): The default SSTable compression
algorithm was changed from snappy to zstd. The latter has been shown to
improve a store's compression ratio at the cost of a minor CPU increase.
Existing clusters can opt into this new compression algorithm by setting
`storage.sstable.compression_algorithm` to `zstd`. Existing SSTables
compressed with the snappy compression algorithm will NOT be actively
re-written. Instead, SSTables will adopt the new compression algorithm over
time, as existing SSTables are compacted and re-written.
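
To check which compression-related settings are in effect on a given cluster, something like the following should work (a sketch; the square-bracket statement source is standard CockroachDB syntax, though column names may vary slightly by version):

```sql
-- List the SSTable compression settings and their current values.
SELECT variable, value
  FROM [SHOW CLUSTER SETTINGS]
 WHERE variable LIKE 'storage.sstable.compression%';
```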
nicktrav added a commit to nicktrav/cockroach that referenced this issue May 18, 2024
Fix an issue where the compression cluster setting is being set on a
copy of the per-level configuration, rather than the configuration that
is ultimately passed to Pebble.

Touches cockroachdb#123953.

Release note: None.
craig bot pushed a commit that referenced this issue May 21, 2024
124388: storage: fix setting of compression algorithm r=RaduBerinde a=nicktrav

Fix an issue where the compression cluster setting is being set on a copy of the per-level configuration, rather than the configuration that is ultimately passed to Pebble.

Touches #123953.

Release note: None.

Epic: CRDB-37583

Co-authored-by: Nick Travers <travers@cockroachlabs.com>
blathers-crl bot pushed a commit that referenced this issue May 21, 2024
Fix an issue where the compression cluster setting is being set on a
copy of the per-level configuration, rather than the configuration that
is ultimately passed to Pebble.

Touches #123953.

Release note: None.