Split ColumnOp into one with row indices and one with FieldName & other enabled changes #1207

Open: 12 commits to be merged into base branch centril/index-join-inner-colid

Conversation

Centril
Contributor

@Centril Centril commented May 6, 2024

Description of Changes

The main goal of this PR

This PR has three main goals; the remaining changes are secondary and were done as additional work (see the commit descriptions). I recommend reviewing this PR commit by commit, after first reviewing #1166, on which this PR is based.

In this PR, we want to:

  1. Let build_query and query execution work on a ColumnOp that stores ColId rather than FieldName (23e00d2). The previous version of ColumnOp is now called FieldOp and uses FieldName as before. FieldOp is used by SQL compilation and query planning and is eventually compiled down to a ColumnOp. This means that Header is no longer used by either ColumnOp or build_query.
  2. Make build_query itself (20340c8) and query execution (3415702 and 986d80e) infallible. To make this possible, we must ensure that we only get bools in places where no other type makes sense (977b5bb).
  3. Split IndexSemiJoin into left and right versions (e0c12ce and 3968f4f). This enables us to make both versions less branchy and to significantly shrink the size of the right case (312 bytes vs 64).
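
To illustrate goals 1 and 2, here is a minimal sketch of the split. It is not the actual SpacetimeDB code: the variants, the `Header` layout, `compile`, `eval`, and `example_predicate` are simplified assumptions made for this example, and rows are modelled as plain `i64` slices. The idea it shows is that name resolution against a header is the only fallible step; once names have become column indices and predicates can only produce bools, evaluation needs no `Result`.

```rust
/// Hypothetical stand-in for a column position in a row.
#[derive(Debug, Clone, Copy, PartialEq)]
struct ColId(u32);

/// Simplified header: field names in column order (assumption).
struct Header {
    fields: Vec<String>,
}

impl Header {
    fn col_id(&self, name: &str) -> Option<ColId> {
        self.fields
            .iter()
            .position(|f| f.as_str() == name)
            .map(|i| ColId(i as u32))
    }
}

/// Name-based form, used while compiling SQL and planning.
#[derive(Debug, Clone, PartialEq)]
enum FieldOperand {
    Field(String),
    Lit(i64),
}

#[derive(Debug, Clone, PartialEq)]
enum FieldOp {
    Eq(FieldOperand, FieldOperand),
    And(Box<FieldOp>, Box<FieldOp>),
}

/// Index-based form, used during execution.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Operand {
    Col(ColId),
    Lit(i64),
}

#[derive(Debug, Clone, PartialEq)]
enum ColumnOp {
    Eq(Operand, Operand),
    And(Box<ColumnOp>, Box<ColumnOp>),
}

fn compile_operand(op: &FieldOperand, header: &Header) -> Result<Operand, String> {
    match op {
        FieldOperand::Field(name) => header
            .col_id(name)
            .map(Operand::Col)
            .ok_or_else(|| format!("unknown field `{name}`")),
        FieldOperand::Lit(v) => Ok(Operand::Lit(*v)),
    }
}

/// The only fallible step: resolving field names to column indices.
fn compile(op: &FieldOp, header: &Header) -> Result<ColumnOp, String> {
    Ok(match op {
        FieldOp::Eq(l, r) => ColumnOp::Eq(compile_operand(l, header)?, compile_operand(r, header)?),
        FieldOp::And(l, r) => ColumnOp::And(Box::new(compile(l, header)?), Box::new(compile(r, header)?)),
    })
}

impl Operand {
    /// Indexing assumes the row matches the header used in `compile`.
    fn eval(self, row: &[i64]) -> i64 {
        match self {
            Operand::Col(ColId(i)) => row[i as usize],
            Operand::Lit(v) => v,
        }
    }
}

impl ColumnOp {
    /// Infallible: every variant can only produce a bool.
    fn eval(&self, row: &[i64]) -> bool {
        match self {
            ColumnOp::Eq(l, r) => l.eval(row) == r.eval(row),
            ColumnOp::And(l, r) => l.eval(row) && r.eval(row),
        }
    }
}

/// Example predicate: `id = 7 AND level = 3`.
fn example_predicate() -> FieldOp {
    FieldOp::And(
        Box::new(FieldOp::Eq(FieldOperand::Field("id".into()), FieldOperand::Lit(7))),
        Box::new(FieldOp::Eq(FieldOperand::Field("level".into()), FieldOperand::Lit(3))),
    )
}
```

In this sketch, compiling `example_predicate()` against a header with fields `id` and `level` yields a `ColumnOp` that can be evaluated per row without touching the header again, while compiling against a header that lacks `level` fails up front rather than during execution.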

Benchmarks

Benchmarks relative to master, run on an i7-7700K with 64 GB RAM:

Benchmarking full-join: Collecting 100 samples in estimated 5.
full-join               time:   [270.49 µs 271.55 µs 273.56 µs]
                        change: [-11.546% -9.6360% -8.2237%] (p = 0.00 < 0.05)
                        Performance has improved.

Benchmarking incr-select: Collecting 100 samples in estimated 
incr-select             time:   [151.49 ns 152.82 ns 154.70 ns]
                        change: [-22.532% -22.144% -21.611%] (p = 0.00 < 0.05)
                        Performance has improved.

Benchmarking incr-join: Collecting 100 samples in estimated 5.
incr-join               time:   [684.71 ns 687.90 ns 692.34 ns]
                        change: [-16.301% -15.580% -14.816%] (p = 0.00 < 0.05)
                        Performance has improved.

Benchmarking query-indexes-multi: Collecting 100 samples in es
query-indexes-multi     time:   [591.14 ns 593.30 ns 596.36 ns]
                        change: [-10.874% -8.5941% -6.0559%] (p = 0.00 < 0.05)
                        Performance has improved.

API and ABI breaking changes

None

Expected complexity level and risk

2. The diff is large in terms of query planning and execution, but each individual change is not that large.

Testing

Existing tests have been tweaked to fit the changes.

@Centril Centril force-pushed the centril/columnop-with-colid branch from 2ca64d3 to 754654c Compare May 6, 2024 16:50
@Centril Centril marked this pull request as ready for review May 6, 2024 16:50
@Centril Centril force-pushed the centril/columnop-with-colid branch 2 times, most recently from 27c19aa to e4f8dd9 Compare May 7, 2024 21:11
@Centril Centril changed the base branch from master to centril/index-join-inner-colid May 7, 2024 21:11
@Centril Centril force-pushed the centril/columnop-with-colid branch 5 times, most recently from 340516e to 7aaa0ef Compare May 8, 2024 13:33
@Centril Centril changed the title [WIP] Split ColumnOp into one with row indices and one with FieldName Split ColumnOp into one with row indices and one with FieldName & other enabled changes May 8, 2024
@cloutiertyler
Contributor

benchmarks please

github-actions bot commented May 10, 2024

Criterion benchmark results

Criterion benchmark report

YOU SHOULD PROBABLY IGNORE THESE RESULTS.

Criterion is a wall time based benchmarking system that is extremely noisy when run on CI. We collect these results for longitudinal analysis, but they are not reliable for comparing individual PRs.

Go look at the callgrind report instead.

empty

db on disk new latency old latency new throughput old throughput
sqlite 💿 - 450.0±1.45ns - -
sqlite 🧠 - 447.7±1.55ns - -
stdb_raw 💿 714.5±0.73ns 745.7±1.66ns - -
stdb_raw 🧠 685.6±1.05ns 720.6±0.55ns - -

insert_1

db on disk schema indices preload new latency old latency new throughput old throughput

insert_bulk

db on disk schema indices preload count new latency old latency new throughput old throughput
sqlite 💿 u32_u64_str btree_each_column 2048 256 - 520.7±0.65µs - 1920 tx/sec
sqlite 💿 u32_u64_str unique_0 2048 256 - 137.0±0.41µs - 7.1 Ktx/sec
sqlite 💿 u32_u64_u64 btree_each_column 2048 256 - 427.5±0.93µs - 2.3 Ktx/sec
sqlite 💿 u32_u64_u64 unique_0 2048 256 - 127.5±0.40µs - 7.7 Ktx/sec
sqlite 🧠 u32_u64_str btree_each_column 2048 256 - 451.6±0.67µs - 2.2 Ktx/sec
sqlite 🧠 u32_u64_str unique_0 2048 256 - 123.9±0.92µs - 7.9 Ktx/sec
sqlite 🧠 u32_u64_u64 btree_each_column 2048 256 - 372.0±0.58µs - 2.6 Ktx/sec
sqlite 🧠 u32_u64_u64 unique_0 2048 256 - 108.5±0.51µs - 9.0 Ktx/sec
stdb_raw 💿 u32_u64_str btree_each_column 2048 256 521.6±14.03µs 511.6±13.12µs 1917 tx/sec 1954 tx/sec
stdb_raw 💿 u32_u64_str unique_0 2048 256 515.6±22.78µs 424.9±16.16µs 1939 tx/sec 2.3 Ktx/sec
stdb_raw 💿 u32_u64_u64 btree_each_column 2048 256 396.9±13.49µs 358.8±6.84µs 2.5 Ktx/sec 2.7 Ktx/sec
stdb_raw 💿 u32_u64_u64 unique_0 2048 256 372.2±10.01µs 324.6±9.96µs 2.6 Ktx/sec 3.0 Ktx/sec
stdb_raw 🧠 u32_u64_str btree_each_column 2048 256 341.7±0.24µs 338.4±0.46µs 2.9 Ktx/sec 2.9 Ktx/sec
stdb_raw 🧠 u32_u64_str unique_0 2048 256 262.8±0.19µs 267.1±0.40µs 3.7 Ktx/sec 3.7 Ktx/sec
stdb_raw 🧠 u32_u64_u64 btree_each_column 2048 256 274.1±0.26µs 277.2±0.15µs 3.6 Ktx/sec 3.5 Ktx/sec
stdb_raw 🧠 u32_u64_u64 unique_0 2048 256 246.3±0.22µs 247.0±0.08µs 4.0 Ktx/sec 4.0 Ktx/sec

iterate

db on disk schema indices new latency old latency new throughput old throughput
sqlite 💿 u32_u64_str unique_0 - 21.8±0.12µs - 44.8 Ktx/sec
sqlite 💿 u32_u64_u64 unique_0 - 20.0±0.11µs - 48.8 Ktx/sec
sqlite 🧠 u32_u64_str unique_0 - 20.5±0.06µs - 47.7 Ktx/sec
sqlite 🧠 u32_u64_u64 unique_0 - 18.6±0.10µs - 52.4 Ktx/sec
stdb_raw 💿 u32_u64_str unique_0 4.7±0.00µs 4.7±0.00µs 208.4 Ktx/sec 207.8 Ktx/sec
stdb_raw 💿 u32_u64_u64 unique_0 4.6±0.00µs 4.6±0.00µs 214.2 Ktx/sec 212.8 Ktx/sec
stdb_raw 🧠 u32_u64_str unique_0 4.7±0.00µs 4.7±0.00µs 209.7 Ktx/sec 209.0 Ktx/sec
stdb_raw 🧠 u32_u64_u64 unique_0 4.5±0.00µs 4.6±0.00µs 215.5 Ktx/sec 214.1 Ktx/sec

find_unique

db on disk key type preload new latency old latency new throughput old throughput

filter

db on disk key type index strategy load count new latency old latency new throughput old throughput
sqlite 💿 string index 2048 256 - 66.4±0.15µs - 14.7 Ktx/sec
sqlite 💿 u64 index 2048 256 - 64.2±0.31µs - 15.2 Ktx/sec
sqlite 🧠 string index 2048 256 - 64.6±0.06µs - 15.1 Ktx/sec
sqlite 🧠 u64 index 2048 256 - 59.6±0.15µs - 16.4 Ktx/sec
stdb_raw 💿 string index 2048 256 5.1±0.00µs 5.2±0.00µs 190.2 Ktx/sec 189.4 Ktx/sec
stdb_raw 💿 u64 index 2048 256 5.1±0.00µs 5.1±0.00µs 193.2 Ktx/sec 192.1 Ktx/sec
stdb_raw 🧠 string index 2048 256 5.1±0.00µs 5.1±0.00µs 191.3 Ktx/sec 190.3 Ktx/sec
stdb_raw 🧠 u64 index 2048 256 5.0±0.00µs 5.1±0.00µs 194.3 Ktx/sec 193.2 Ktx/sec

serialize

schema format count new latency old latency new throughput old throughput
u32_u64_str bflatn_to_bsatn_fast_path 100 3.7±0.08µs 3.7±0.01µs 25.6 Mtx/sec 25.8 Mtx/sec
u32_u64_str bflatn_to_bsatn_slow_path 100 3.5±0.02µs 3.4±0.01µs 27.5 Mtx/sec 27.7 Mtx/sec
u32_u64_str bsatn 100 2.4±0.02µs 2.5±0.01µs 39.2 Mtx/sec 37.6 Mtx/sec
u32_u64_str json 100 5.1±0.20µs 4.7±0.06µs 18.6 Mtx/sec 20.2 Mtx/sec
u32_u64_str product_value 100 1013.2±0.40ns 1015.0±0.79ns 94.1 Mtx/sec 94.0 Mtx/sec
u32_u64_u64 bflatn_to_bsatn_fast_path 100 1455.4±39.41ns 1399.1±5.71ns 65.5 Mtx/sec 68.2 Mtx/sec
u32_u64_u64 bflatn_to_bsatn_slow_path 100 2.9±0.01µs 2.9±0.01µs 33.2 Mtx/sec 33.5 Mtx/sec
u32_u64_u64 bsatn 100 1693.2±20.30ns 1655.7±15.71ns 56.3 Mtx/sec 57.6 Mtx/sec
u32_u64_u64 json 100 3.2±0.02µs 3.1±0.03µs 30.0 Mtx/sec 30.8 Mtx/sec
u32_u64_u64 product_value 100 1009.7±2.75ns 1009.8±0.43ns 94.4 Mtx/sec 94.4 Mtx/sec
u64_u64_u32 bflatn_to_bsatn_fast_path 100 1081.2±3.84ns 1123.8±1.19ns 88.2 Mtx/sec 84.9 Mtx/sec
u64_u64_u32 bflatn_to_bsatn_slow_path 100 2.9±0.01µs 2.9±0.00µs 33.2 Mtx/sec 33.4 Mtx/sec
u64_u64_u32 bsatn 100 1686.0±37.85ns 1704.2±27.56ns 56.6 Mtx/sec 56.0 Mtx/sec
u64_u64_u32 json 100 3.2±0.02µs 3.3±0.04µs 29.6 Mtx/sec 29.2 Mtx/sec
u64_u64_u32 product_value 100 1010.1±0.73ns 1010.2±0.46ns 94.4 Mtx/sec 94.4 Mtx/sec

stdb_module_large_arguments

arg size new latency old latency new throughput old throughput
64KiB 88.6±13.92µs 91.7±4.61µs - -

stdb_module_print_bulk

line count new latency old latency new throughput old throughput
1 47.1±6.85µs 38.8±5.38µs - -
100 357.4±5.03µs 349.3±57.97µs - -
1000 2.3±0.53ms 2.3±0.48ms - -

remaining

name new latency old latency new throughput old throughput
sqlite/💿/update_bulk/u32_u64_str/unique_0/load=2048/count=256 - 47.8±0.19µs - 20.4 Ktx/sec
sqlite/💿/update_bulk/u32_u64_u64/unique_0/load=2048/count=256 - 41.8±0.24µs - 23.4 Ktx/sec
sqlite/🧠/update_bulk/u32_u64_str/unique_0/load=2048/count=256 - 40.7±0.05µs - 24.0 Ktx/sec
sqlite/🧠/update_bulk/u32_u64_u64/unique_0/load=2048/count=256 - 36.2±0.28µs - 27.0 Ktx/sec
stdb_module/💿/update_bulk/u32_u64_str/unique_0/load=2048/count=256 1379.8±7.34µs 1442.4±12.66µs 724 tx/sec 693 tx/sec
stdb_module/💿/update_bulk/u32_u64_u64/unique_0/load=2048/count=256 1092.4±12.21µs 1120.2±4.56µs 915 tx/sec 892 tx/sec
stdb_raw/💿/update_bulk/u32_u64_str/unique_0/load=2048/count=256 610.8±13.43µs 691.4±50.77µs 1637 tx/sec 1446 tx/sec
stdb_raw/💿/update_bulk/u32_u64_u64/unique_0/load=2048/count=256 551.1±9.34µs 542.0±20.72µs 1814 tx/sec 1845 tx/sec
stdb_raw/🧠/update_bulk/u32_u64_str/unique_0/load=2048/count=256 445.6±0.52µs 448.2±0.28µs 2.2 Ktx/sec 2.2 Ktx/sec
stdb_raw/🧠/update_bulk/u32_u64_u64/unique_0/load=2048/count=256 403.4±0.38µs 403.4±0.33µs 2.4 Ktx/sec 2.4 Ktx/sec

github-actions bot commented May 10, 2024

Callgrind benchmark results

Callgrind Benchmark Report

These benchmarks were run using callgrind, an instruction-level profiler. They allow comparisons between sqlite (sqlite), SpacetimeDB running through a module (stdb_module), and the underlying SpacetimeDB data storage engine (stdb_raw). Callgrind emulates a CPU to collect the below estimates.

Measurement changes larger than five percent are in bold.

In-memory benchmarks

callgrind: empty transaction

db total reads + writes old total reads + writes Δrw estimated cycles old estimated cycles Δcycles
stdb_raw 6004 5999 0.08% 6856 6869 -0.19%
sqlite 5676 5686 -0.18% 6230 6150 1.30%
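
The Δ columns in these tables appear consistent with a plain relative change, (new - old) / old, expressed in percent. The sketch below reproduces the first row under that assumption; `percent_change` and `is_notable` are hypothetical helper names, not part of the benchmark tooling, and the five-percent threshold is taken from the note above about bold measurements.

```rust
/// Relative change of `new` against `old`, in percent.
/// Assumes the report computes (new - old) / old.
fn percent_change(new: u64, old: u64) -> f64 {
    (new as f64 - old as f64) / old as f64 * 100.0
}

/// Whether a change exceeds five percent in either direction.
fn is_notable(delta_percent: f64) -> bool {
    delta_percent.abs() > 5.0
}
```

For the stdb_raw row, `percent_change(6004, 5999)` is about 0.08 and `percent_change(6856, 6869)` is about -0.19, matching the reported Δrw and Δcycles; neither crosses the five-percent threshold.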

callgrind: filter

db schema indices count preload _column data_type total reads + writes old total reads + writes Δrw estimated cycles old estimated cycles Δcycles
stdb_raw u32_u64_str no_index 64 128 2 string 120719 120710 0.01% 121493 121332 0.13%
stdb_raw u32_u64_str no_index 64 128 1 u64 78453 78444 0.01% 78979 78946 0.04%
stdb_raw u32_u64_str btree_each_column 64 128 2 string 25181 25176 0.02% 25675 25666 0.04%
stdb_raw u32_u64_str btree_each_column 64 128 1 u64 24141 24136 0.02% 24483 24522 -0.16%
sqlite u32_u64_str no_index 64 128 2 string 143658 143664 -0.00% 145268 145362 -0.06%
sqlite u32_u64_str no_index 64 128 1 u64 122999 123005 -0.00% 124309 124379 -0.06%
sqlite u32_u64_str btree_each_column 64 128 1 u64 130316 130316 0.00% 131892 131814 0.06%
sqlite u32_u64_str btree_each_column 64 128 2 string 133534 133521 0.01% 135294 135207 0.06%

callgrind: insert bulk

db schema indices count preload total reads + writes old total reads + writes Δrw estimated cycles old estimated cycles Δcycles
stdb_raw u32_u64_str unique_0 64 128 943910 948257 -0.46% 963432 966417 -0.31%
stdb_raw u32_u64_str btree_each_column 64 128 1072824 1078622 -0.54% 1099084 1106696 -0.69%
sqlite u32_u64_str unique_0 64 128 398413 398423 -0.00% 418049 414803 0.78%
sqlite u32_u64_str btree_each_column 64 128 971486 971492 -0.00% 1012186 1008818 0.33%

callgrind: iterate

db schema indices count total reads + writes old total reads + writes Δrw estimated cycles old estimated cycles Δcycles
stdb_raw u32_u64_str unique_0 1024 147937 147932 0.00% 147975 148036 -0.04%
stdb_raw u32_u64_str unique_0 64 15760 15746 0.09% 15798 15850 -0.33%
sqlite u32_u64_str unique_0 1024 1046910 1046911 -0.00% 1050362 1050381 -0.00%
sqlite u32_u64_str unique_0 64 75041 75051 -0.01% 76213 76149 0.08%

callgrind: serialize_product_value

count format total reads + writes old total reads + writes Δrw estimated cycles old estimated cycles Δcycles
64 bsatn 25717 25717 0.00% 27961 28063 -0.36%
64 json 47438 47438 0.00% 50022 50022 0.00%
16 bsatn 8118 8118 0.00% 9410 9512 -1.07%
16 json 12142 12142 0.00% 13978 13978 0.00%

callgrind: update bulk

db schema indices count preload total reads + writes old total reads + writes Δrw estimated cycles old estimated cycles Δcycles
stdb_raw u32_u64_str unique_0 1024 1024 22432484 22471962 -0.18% 22993300 23078216 -0.37%
stdb_raw u32_u64_str unique_0 64 128 1422971 1425382 -0.17% 1494659 1500696 -0.40%
sqlite u32_u64_str unique_0 1024 1024 1802006 1802012 -0.00% 1811302 1811094 0.01%
sqlite u32_u64_str unique_0 64 128 128542 128548 -0.00% 131494 131420 0.06%

On-disk benchmarks

callgrind: empty transaction

db total reads + writes old total reads + writes Δrw estimated cycles old estimated cycles Δcycles
stdb_raw 6364 6365 -0.02% 7208 7259 -0.70%
sqlite 5718 5718 0.00% 6298 6208 1.45%

callgrind: filter

db schema indices count preload _column data_type total reads + writes old total reads + writes Δrw estimated cycles old estimated cycles Δcycles
stdb_raw u32_u64_str no_index 64 128 2 string 121079 121076 0.00% 121933 121922 0.01%
stdb_raw u32_u64_str no_index 64 128 1 u64 78813 78810 0.00% 79331 79376 -0.06%
stdb_raw u32_u64_str btree_each_column 64 128 1 u64 24501 24502 -0.00% 24867 25052 -0.74%
stdb_raw u32_u64_str btree_each_column 64 128 2 string 25738 25540 0.78% 26320 26118 0.77%
sqlite u32_u64_str no_index 64 128 2 string 145579 145579 0.00% 147537 147369 0.11%
sqlite u32_u64_str no_index 64 128 1 u64 124935 124920 0.01% 126637 126502 0.11%
sqlite u32_u64_str btree_each_column 64 128 2 string 135571 135587 -0.01% 137649 137571 0.06%
sqlite u32_u64_str btree_each_column 64 128 1 u64 132412 132412 0.00% 134254 134284 -0.02%

callgrind: insert bulk

db schema indices count preload total reads + writes old total reads + writes Δrw estimated cycles old estimated cycles Δcycles
stdb_raw u32_u64_str unique_0 64 128 895678 899093 -0.38% 945732 947465 -0.18%
stdb_raw u32_u64_str btree_each_column 64 128 1023147 1026673 -0.34% 1079977 1083013 -0.28%
sqlite u32_u64_str unique_0 64 128 415961 415961 0.00% 434619 431889 0.63%
sqlite u32_u64_str btree_each_column 64 128 1022061 1022071 -0.00% 1061043 1059193 0.17%

callgrind: iterate

db schema indices count total reads + writes old total reads + writes Δrw estimated cycles old estimated cycles Δcycles
stdb_raw u32_u64_str unique_0 1024 148297 148298 -0.00% 148355 148490 -0.09%
stdb_raw u32_u64_str unique_0 64 16120 16112 0.05% 16178 16304 -0.77%
sqlite u32_u64_str unique_0 1024 1049963 1049963 0.00% 1053699 1053685 0.00%
sqlite u32_u64_str unique_0 64 76813 76813 0.00% 78137 78103 0.04%

callgrind: serialize_product_value

count format total reads + writes old total reads + writes Δrw estimated cycles old estimated cycles Δcycles
64 bsatn 25717 25717 0.00% 27961 28063 -0.36%
64 json 47438 47438 0.00% 50022 50022 0.00%
16 bsatn 8118 8118 0.00% 9410 9512 -1.07%
16 json 12142 12142 0.00% 13978 13978 0.00%

callgrind: update bulk

db schema indices count preload total reads + writes old total reads + writes Δrw estimated cycles old estimated cycles Δcycles
stdb_raw u32_u64_str unique_0 1024 1024 21390132 21423310 -0.15% 22027740 22112670 -0.38%
stdb_raw u32_u64_str unique_0 64 128 1378537 1380648 -0.15% 1448577 1454450 -0.40%
sqlite u32_u64_str unique_0 1024 1024 1809802 1809802 0.00% 1818486 1818450 0.00%
sqlite u32_u64_str unique_0 64 128 132690 132690 0.00% 135806 135576 0.17%

@Centril Centril force-pushed the centril/index-join-inner-colid branch from 8ca55bf to a53bb88 Compare May 24, 2024 15:37
@Centril Centril force-pushed the centril/columnop-with-colid branch 2 times, most recently from 74f913e to 13eca30 Compare May 28, 2024 07:07
@Centril Centril force-pushed the centril/index-join-inner-colid branch from a53bb88 to 72b74e4 Compare May 30, 2024 23:48
Centril added 12 commits May 31, 2024 01:49
2. Shrink SqlAst to 80 bytes, so it can be passed in registers
3. Store end-result Header in IndexSemiJoin
4. Remove operational use of Header in ColumnOp & build_query
5. Simplify RowRef::{get, project, project_owned}
1. Make IndexSemiJoin::filter infallible.
2. Make ColumnOp::compare and friends infallible.
3. Make RowRef::{get, project, project_owned} infallible.
2. Document RelValue::{get, read_or_take_column, project_owned}
3. Refactor optimize_select
4. Ensure in optimize_select that conditions are merged with preceding selects
@Centril Centril force-pushed the centril/columnop-with-colid branch from 13eca30 to 9227167 Compare May 30, 2024 23:56
Labels
release-any To be landed in any release window
4 participants