feat(core): row-first WAL segment format #4440

Draft
wants to merge 53 commits into
base: master

Contributor

@puzpuzpuz puzpuzpuz commented Apr 24, 2024

Depends on #4413: column conversion for the row-first format needs to be introduced once #4413 lands.

Currently, we use a column-oriented format for WAL segments, which means that WAL writers write into individual column files. With the row-based format, they write row-wise into a single file. The WAL apply job can support both formats: the only difference is in how it lifts WAL segments into buffers; the subsequent logic is the same for both formats. The row-oriented format has the following advantages:

  • Less random disk write pattern.
  • It's closer to ILP / SQL INSERTs which are also row-first.

Column-oriented WAL format is kept for efficient handling of CREATE TABLE x AS SELECT * FROM y; and similar column-first scenarios (not implemented yet).

For the row-first format, the WAL apply job reads the transaction data from the segment.d file and writes column values into the O3 buffers. Subsequent data processing is the same as for the column-first format.

Full design doc: https://questdb.slab.com/posts/new-row-first-wal-segment-format-4tvijkka

Implementation details

Column-oriented WalWriter is renamed to WalColFirstWriter while a new WalRowFirstWriter class is introduced. For now, this class is used by default everywhere. This is controlled with the cairo.wal.default.format property (supported values: row and column).
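
As a rough illustration of how the cairo.wal.default.format property could be mapped to a writer choice, here is a minimal sketch. The class, enum, and method names are hypothetical and are not the PR's actual code (the real parsing lives in QuestDB's configuration classes). The only details taken from the description above are the property name, the two supported values, and the fact that the row-first writer is the default for now.

```java
import java.util.Locale;
import java.util.Properties;

// Hypothetical sketch: maps the property value to a WAL format choice.
final class WalFormatPropertySketch {
    enum Format { ROW, COLUMN }

    static final String KEY = "cairo.wal.default.format";

    static Format parse(Properties props) {
        // Assumption: an absent property falls back to "row", since the
        // row-first writer is described as the current default.
        String raw = props.getProperty(KEY, "row").trim().toLowerCase(Locale.ROOT);
        switch (raw) {
            case "row":
                return Format.ROW;
            case "column":
                return Format.COLUMN;
            default:
                throw new IllegalArgumentException(
                        "unsupported " + KEY + " value: " + raw + " (expected row or column)");
        }
    }
}
```

Rejecting unknown values instead of silently falling back is a design choice of this sketch, not something the PR description specifies.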

Instead of the current per-column file structure, WalRowFirstWriter writes the following under the db/{tableDirName~NN}/wal{walId}/{segmentId}/ path:

  • segment.d (row-first data)
  • a new WAL event, id = 3 (ROW_FIRST_DATA), which includes the start and end offsets in the segment.d file

and no other files in the WAL segment directory.

segment.d file format:

[
    [
        column_id: int,
        column_data: var-size
    ],
    row_delimiter: -1 (int),
]

Notes:

  • [descr] denotes a repeated section.
  • name: type denotes a field.
  • The row_delimiter field is a row suffix that separates rows from each other.
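
To make the layout concrete, here is a minimal encode/decode sketch of the format above. It is not the WalRowFirstWriter implementation. It assumes every column holds a fixed-size 8-byte long (the real format has var-size column_data whose length depends on the column type), little-endian byte order, and in-memory buffers instead of a memory-mapped segment.d file. All class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the row-first layout, restricted to long columns.
final class RowFirstSegmentSketch {
    static final int ROW_DELIMITER = -1;

    // Writes each row as repeated [column_id: int, value: long] pairs
    // followed by the row delimiter.
    static ByteBuffer encode(List<Map<Integer, Long>> rows) {
        int size = 0;
        for (Map<Integer, Long> row : rows) {
            size += row.size() * (Integer.BYTES + Long.BYTES) + Integer.BYTES;
        }
        ByteBuffer buf = ByteBuffer.allocate(size).order(ByteOrder.LITTLE_ENDIAN);
        for (Map<Integer, Long> row : rows) {
            for (Map.Entry<Integer, Long> e : row.entrySet()) {
                buf.putInt(e.getKey());
                buf.putLong(e.getValue());
            }
            buf.putInt(ROW_DELIMITER);
        }
        buf.flip();
        return buf;
    }

    // Reads rows back: an int equal to ROW_DELIMITER in the column_id
    // position closes the current row; any other int is a column id
    // followed by that column's 8-byte value.
    static List<Map<Integer, Long>> decode(ByteBuffer segment) {
        // duplicate() resets the byte order, so it is set again explicitly.
        ByteBuffer buf = segment.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        List<Map<Integer, Long>> rows = new ArrayList<>();
        Map<Integer, Long> row = new LinkedHashMap<>();
        while (buf.hasRemaining()) {
            int columnId = buf.getInt();
            if (columnId == ROW_DELIMITER) {
                rows.add(row);
                row = new LinkedHashMap<>();
            } else {
                row.put(columnId, buf.getLong());
            }
        }
        return rows;
    }
}
```

Because the delimiter is only ever interpreted in the column_id position, a column value of -1 does not terminate a row in this sketch.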

TSBS benchmark

Dev machine

The following TSBS run covers a worst-case scenario for the new format. That's because it only uses a single table with a small number of columns (20), so the new format should have no advantage here. The goal is to make sure that there is no regression in throughput or memory usage.

Data was generated with the following command:

$ ./tsbs_generate_data --use-case="cpu-only" --seed=123 --scale=4000 --timestamp-start="2016-01-01T00:00:00Z" --timestamp-end="2016-01-07T00:00:00Z" --log-interval="10s" --format="questdb" > /tmp/data

Ingestion was run with the following command:

$ ./tsbs_load_questdb --file /tmp/data --workers 3

Row-first format

RSS 600MB max

WAL apply job rows/s stats:

  • Min: 122037 rows/s
  • Max: 2080109 rows/s
  • Avg: 1337918 rows/s

$ ./tsbs_load_questdb --file /tmp/data --workers 3
time,per. metric/s,metric total,overall metric/s,per. row/s,row total,overall row/s
1715092247,13409556.58,1.341000E+08,13409556.58,1340955.66,1.341000E+07,1340955.66
...
1715092457,8909120.50,2.060300E+09,9364941.62,890912.05,2.060300E+08,936494.16

Summary:
loaded 2073600000 metrics in 221.488sec with 3 workers (mean rate 9362141.07 metrics/sec)
loaded 207360000 rows in 221.488sec with 3 workers (mean rate 936214.11 rows/sec)

Column-first format

RSS 545MB max

WAL apply job rows/s stats:

  • Min: 28920 rows/s
  • Max: 9516623 rows/s
  • Avg: 2675001 rows/s

$ ./tsbs_load_questdb --file /tmp/data --workers 3
time,per. metric/s,metric total,overall metric/s,per. row/s,row total,overall row/s
1715095341,13939098.00,1.394000E+08,13939098.00,1393909.80,1.394000E+07,1393909.80
...
1715095541,9559899.45,2.030300E+09,9668087.70,955989.94,2.030300E+08,966808.77

Summary:
loaded 2073600000 metrics in 214.749sec with 3 workers (mean rate 9655942.13 metrics/sec)
loaded 207360000 rows in 214.749sec with 3 workers (mean rate 965594.21 rows/sec)

EC2

c6a.4xlarge + 1TB gp2

Row-first format

WAL apply job rows/s stats:

  • Min: 34507 rows/s
  • Max: 2293584 rows/s
  • Avg: 992801 rows/s

$ ./tsbs_load_questdb --file /tmp/data --workers 5
time,per. metric/s,metric total,overall metric/s,per. row/s,row total,overall row/s
1715153327,7227705.29,7.230000E+07,7227705.29,722770.53,7.230000E+06,722770.53
...
1715154387,4810019.59,4.485700E+09,4192236.00,481001.96,4.485700E+08,419223.60

Summary:
loaded 4492800000 metrics in 1071.604sec with 5 workers (mean rate 4192594.35 metrics/sec)
loaded 449280000 rows in 1071.604sec with 5 workers (mean rate 419259.43 rows/sec)

Column-first format

WAL apply job rows/s stats:

  • Min: 26855 rows/s
  • Max: 7258300 rows/s
  • Avg: 1606634 rows/s

$ ./tsbs_load_questdb --file /tmp/data --workers 5
time,per. metric/s,metric total,overall metric/s,per. row/s,row total,overall row/s
1715154626,7069935.15,7.070000E+07,7069935.15,706993.51,7.070000E+06,706993.51
...
1715155626,3639973.41,4.492100E+09,4447621.70,363997.34,4.492100E+08,444762.17

Summary:
loaded 4492800000 metrics in 1010.150sec with 5 workers (mean rate 4447658.45 metrics/sec)
loaded 449280000 rows in 1010.150sec with 5 workers (mean rate 444765.84 rows/sec)
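
To put the ingestion summaries above side by side, the small helper below computes the relative change of the row-first mean rate against the column-first one. The rates are copied from the TSBS summaries in this description; the class and method names are hypothetical. Under these numbers, the row-first runs come out roughly 3% slower on the dev machine and roughly 6% slower on EC2.

```java
// Hypothetical helper for comparing the mean ingestion rates reported above.
final class ThroughputDeltaSketch {
    // Relative change of candidate vs. baseline in percent;
    // a negative value means the candidate is slower.
    static double relativeChangePct(double baseline, double candidate) {
        return (candidate - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        // Mean rates (rows/sec) from the TSBS summaries: column-first is the baseline.
        double devColumnFirst = 965594.21;
        double devRowFirst = 936214.11;
        double ec2ColumnFirst = 444765.84;
        double ec2RowFirst = 419259.43;

        System.out.printf("dev machine: %.2f%%%n", relativeChangePct(devColumnFirst, devRowFirst));
        System.out.printf("EC2: %.2f%%%n", relativeChangePct(ec2ColumnFirst, ec2RowFirst));
    }
}
```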

1K column table benchmark

The scenario assumes a table with 1K columns of type long. The loader application may be found here.

Test environment: c6a.4xlarge + 1TB gp2

Row-first format

RSS 1.2GB max

WAL apply job rows/s stats:

  • Min: 156 rows/s
  • Max: 19951 rows/s
  • Avg: 4067 rows/s

Ingestion: 16K rows/s on average

Column-first format

RSS 4.6GB max

WAL apply job rows/s stats:

  • Min: 136 rows/s
  • Max: 27303 rows/s
  • Avg: 7208 rows/s

Ingestion: 20K rows/s on average

@puzpuzpuz puzpuzpuz added the Enhancement and Core labels Apr 24, 2024
@puzpuzpuz puzpuzpuz self-assigned this Apr 24, 2024

import static io.questdb.cairo.ColumnType.LEGACY_VAR_SIZE_AUX_SHL;

public class StringTypeDriver implements ColumnTypeDriver {
    public static final StringTypeDriver INSTANCE = new StringTypeDriver();

    public static long getPlainValueByteCount(@Nullable CharSequence value) {
Member

A long return value is confusing and inconsistent. It should be int.

Contributor Author

Changed long to int in VarcharTypeDriver in 8ef6c94

For a string of Integer.MAX_VALUE length, the byte count will be 4294967298, which doesn't fit into an int. So, we should probably keep the long type for StringTypeDriver. WDYT?
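
The arithmetic behind that number can be checked with the sketch below. It assumes the plain string value is stored as a 4-byte length prefix followed by 2 bytes per char, which is consistent with the 4294967298 figure; the class and method names are hypothetical and this is not the StringTypeDriver code.

```java
// Hypothetical sketch of the byte count arithmetic for a plain string value.
final class StringByteCountSketch {
    // Assumed layout: 4-byte length prefix + 2 bytes per UTF-16 char.
    // The 2L literal forces long arithmetic so the product cannot overflow.
    static long plainValueByteCount(int charCount) {
        return Integer.BYTES + 2L * charCount;
    }

    public static void main(String[] args) {
        long bytes = plainValueByteCount(Integer.MAX_VALUE);
        System.out.println(bytes);       // 4294967298
        System.out.println((int) bytes); // 2: narrowing to int silently wraps around
    }
}
```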

@ideoma
Collaborator

ideoma commented May 8, 2024

[PR Coverage check]

😍 pass : 1341 / 1489 (90.06%)

file detail

| path | covered line | new line | coverage |
|---|---|---|---|
| 🔵 io/questdb/cairo/TableWriterMetadata.java | 0 | 1 | 00.00% |
| 🔵 io/questdb/cairo/WalFormat.java | 0 | 5 | 00.00% |
| 🔵 io/questdb/PropServerConfiguration.java | 8 | 9 | 88.89% |
| 🔵 io/questdb/cairo/wal/WalRowFirstWriter.java | 775 | 880 | 88.07% |
| 🔵 io/questdb/cairo/wal/WalColFirstWriter.java | 19 | 21 | 90.48% |
| 🔵 io/questdb/cairo/TableWriter.java | 228 | 252 | 90.48% |
| 🔵 io/questdb/cairo/pool/WalRowFirstWriterPool.java | 35 | 38 | 92.11% |
| 🔵 io/questdb/cairo/wal/WalEventCursor.java | 28 | 30 | 93.33% |
| 🔵 io/questdb/cairo/wal/CopyWalSegmentUtils.java | 111 | 116 | 95.69% |
| 🔵 io/questdb/cairo/wal/WalTxnDetails.java | 6 | 6 | 100.00% |
| 🔵 io/questdb/cairo/wal/WalEventWriter.java | 17 | 17 | 100.00% |
| 🔵 io/questdb/cairo/wal/WalWriterMetadata.java | 5 | 5 | 100.00% |
| 🔵 io/questdb/cairo/vm/MemoryCMORImpl.java | 2 | 2 | 100.00% |
| 🔵 io/questdb/cairo/pool/WalColFirstWriterPool.java | 4 | 4 | 100.00% |
| 🔵 io/questdb/cairo/pool/PoolListener.java | 1 | 1 | 100.00% |
| 🔵 io/questdb/std/BitSet.java | 2 | 2 | 100.00% |
| 🔵 io/questdb/cairo/RecordChain.java | 1 | 1 | 100.00% |
| 🔵 io/questdb/cairo/ColumnType.java | 3 | 3 | 100.00% |
| 🔵 io/questdb/cairo/vm/api/MemoryCR.java | 2 | 2 | 100.00% |
| 🔵 io/questdb/cairo/DefaultCairoConfiguration.java | 1 | 1 | 100.00% |
| 🔵 io/questdb/cairo/VarcharTypeDriver.java | 1 | 1 | 100.00% |
| 🔵 io/questdb/cairo/vm/api/MemoryCMOR.java | 1 | 1 | 100.00% |
| 🔵 io/questdb/cairo/CairoEngine.java | 29 | 29 | 100.00% |
| 🔵 io/questdb/PropertyKey.java | 1 | 1 | 100.00% |
| 🔵 io/questdb/cairo/wal/ApplyWal2TableJob.java | 14 | 14 | 100.00% |
| 🔵 io/questdb/std/LongList.java | 2 | 2 | 100.00% |
| 🔵 io/questdb/cairo/map/OrderedMap.java | 1 | 1 | 100.00% |
| 🔵 io/questdb/cairo/vm/MemoryCARWImpl.java | 7 | 7 | 100.00% |
| 🔵 io/questdb/cairo/BinaryTypeDriver.java | 1 | 1 | 100.00% |
| 🔵 io/questdb/cairo/vm/AbstractMemoryCARW.java | 18 | 18 | 100.00% |
| 🔵 io/questdb/cairo/vm/MemoryCMARWImpl.java | 4 | 4 | 100.00% |
| 🔵 io/questdb/cairo/CairoConfigurationWrapper.java | 1 | 1 | 100.00% |
| 🔵 io/questdb/cairo/StringTypeDriver.java | 2 | 2 | 100.00% |
| 🔵 io/questdb/cairo/vm/AbstractMemoryCR.java | 11 | 11 | 100.00% |

Labels: Core, Enhancement, ready for review
3 participants