perf(sql): rewrite trivial expressions over same column in GROUP BY queries #4508

nwoolmer · 2024-05-15T13:53:18Z

This relates to the performance of ClickBench Q35.

On an M2 Mac Mini, this speeds up ClickBench Q35 from about 1.7s to 0.9s.

The rewritten query runs on the Rosti implementation with an early limit, instead of async GROUP BY with a late limit.

With the rewrite applied but async GROUP BY used instead of Rosti, the query runs in 1.18s.

Query:

SELECT ClientIP, ClientIP - 1, ClientIP - 2, ClientIP - 3, COUNT(*) AS c 
FROM hits 
GROUP BY ClientIP, ClientIP - 1, ClientIP - 2, ClientIP - 3 
ORDER BY c DESC LIMIT 10;

Before change:

Sort light lo: 10
  keys: [c desc]
    VirtualRecord
      functions: [ClientIP,column,column1,column2,c]
        Async Group By workers: 8
          keys: [ClientIP,column,column1,column2]
          values: [count(*)]
          filter: null
            DataFrame
                Row forward scan
                Frame forward scan on: hits

Execute: 1.78s

After change:

VirtualRecord
  functions: [ClientIP,ClientIP-1,ClientIP-2,ClientIP-3,c]
    Sort light lo: 10
      keys: [c desc]
        GroupBy vectorized: true workers: 8
          keys: [ClientIP]
          values: [count(*)]
            DataFrame
                Row forward scan
                Frame forward scan on: hits

Execute: 915.35ms
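
The equivalence the rewrite relies on can be illustrated outside the database. The sketch below is a minimal, hypothetical Java model, not QuestDB code: the class `TrivialGroupByRewrite`, its method names, and the tie-break on ClientIP are assumptions made for illustration. It groups by the full tuple as in the "before" plan, and by the single column with the derived expressions computed after the limit as in the "after" plan. The two agree because `ClientIP - k` is fully determined by `ClientIP`, so both groupings produce the same groups.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TrivialGroupByRewrite {

    // Order by count (last column) descending; tie-break on ClientIP (first column)
    // so the comparison between the two plans is deterministic. The tie-break is an
    // assumption of this sketch; the SQL query itself does not specify one.
    private static final Comparator<long[]> BY_COUNT_DESC =
            Comparator.comparingLong((long[] r) -> r[r.length - 1]).reversed()
                    .thenComparingLong(r -> r[0]);

    // "Before": group by the whole tuple, sort every group, then apply the limit.
    static List<long[]> naive(int[] clientIps, int limit) {
        Map<List<Integer>, Long> counts = new HashMap<>();
        for (int ip : clientIps) {
            counts.merge(List.of(ip, ip - 1, ip - 2, ip - 3), 1L, Long::sum);
        }
        List<long[]> rows = new ArrayList<>();
        for (Map.Entry<List<Integer>, Long> e : counts.entrySet()) {
            List<Integer> k = e.getKey();
            rows.add(new long[]{k.get(0), k.get(1), k.get(2), k.get(3), e.getValue()});
        }
        rows.sort(BY_COUNT_DESC);
        return rows.subList(0, Math.min(limit, rows.size()));
    }

    // "After": group by the single column, sort and limit first, and only then
    // compute the derived expressions for the few surviving rows.
    static List<long[]> rewritten(int[] clientIps, int limit) {
        Map<Integer, Long> counts = new HashMap<>();
        for (int ip : clientIps) {
            counts.merge(ip, 1L, Long::sum);
        }
        List<long[]> grouped = new ArrayList<>();
        for (Map.Entry<Integer, Long> e : counts.entrySet()) {
            grouped.add(new long[]{e.getKey(), e.getValue()});
        }
        grouped.sort(BY_COUNT_DESC);
        List<long[]> out = new ArrayList<>();
        for (long[] g : grouped.subList(0, Math.min(limit, grouped.size()))) {
            int ip = (int) g[0]; // keep int arithmetic so both plans overflow identically
            out.add(new long[]{ip, ip - 1, ip - 2, ip - 3, g[1]});
        }
        return out;
    }
}
```

In this model the single-column variant hashes one value per row instead of four and evaluates the arithmetic only for the rows that survive the limit, which mirrors the shape of the two plans above.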


nwoolmer · 2024-05-15T14:55:02Z

core/src/main/java/io/questdb/griffin/SqlOptimiser.java

+        if (model.getSelectModelType() == QueryModel.SELECT_MODEL_VIRTUAL
+                && nestedModel.getSelectModelType() == QueryModel.SELECT_MODEL_GROUP_BY) {


nit: maybe this could be safely relaxed

core/src/main/java/io/questdb/griffin/model/QueryModel.java

nwoolmer · 2024-05-15T15:43:16Z

core/src/main/java/io/questdb/griffin/SqlOptimiser.java

+        if (model.getSelectModelType() == QueryModel.SELECT_MODEL_VIRTUAL
+                && nestedModel.getSelectModelType() == QueryModel.SELECT_MODEL_GROUP_BY) {
+
+            CharSequenceIntHashMap nestedCandidates = new CharSequenceIntHashMap();


nit: is there a lighter option?

You could move this map into a field and reuse it between the invocations.
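
A minimal sketch of that suggestion, under stated assumptions: `java.util.HashMap` stands in for `CharSequenceIntHashMap`, and the class `CandidateCollector` with its methods is hypothetical. It also assumes the owning object is not used from several threads at once.

```java
import java.util.HashMap;
import java.util.Map;

public class CandidateCollector {
    // Allocated once per instance and reused, instead of a new map per invocation.
    private final Map<String, Integer> nestedCandidates = new HashMap<>();

    // Counts occurrences of each column name and returns the number of distinct names.
    public int collect(String[] columnNames) {
        nestedCandidates.clear(); // drop state left over from the previous invocation
        for (String name : columnNames) {
            nestedCandidates.merge(name, 1, Integer::sum);
        }
        return nestedCandidates.size();
    }

    // Occurrence count recorded by the most recent collect() call, or 0 if absent.
    public int countOf(String name) {
        return nestedCandidates.getOrDefault(name, 0);
    }
}
```

The trade-off is that the map's contents are only valid until the next call, so callers must not hold on to results across invocations.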

nwoolmer · 2024-05-15T15:44:17Z

core/src/test/java/io/questdb/test/griffin/SqlOptimiserTest.java

@@ -1447,6 +1447,49 @@ public void testOrderingOfSortsInSingleTimestampCase() throws Exception {
        });
    }

+    @Test
+    public void testRewriteTrivialExpressionsBasic() throws Exception {


more tests would be good


ideoma · 2024-05-28T10:08:55Z

[PR Coverage check]

😍 pass : 49 / 50 (98.00%)

file detail

| | path | covered line | new line | coverage |
|---|---|---|---|---|
| 🔵 | io/questdb/griffin/SqlOptimiser.java | 48 | 49 | 97.96% |
| 🔵 | io/questdb/griffin/model/QueryModel.java | 1 | 1 | 100.00% |

puzpuzpuz · 2024-05-28T10:28:27Z

We need to mitigate perf degradation on the c6a.metal box (192 cores, 384GB RAM): 0.6s for Java vs 2.2s for Rosti. The reason is that Java code implements hash table sharding while C++ code doesn't. Maybe we could mitigate this by limiting the max number of Rosti tables in use here:

questdb/core/src/main/java/io/questdb/griffin/engine/groupby/vect/GroupByRecordCursorFactory.java

Line 82 in fe2aeca

this.workerCount = workerCount;
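
For readers unfamiliar with the term, the following is a generic sketch of hash table sharding in a parallel GROUP BY. It is not QuestDB's implementation: the class `ShardedCount`, its methods, and the use of `java.util.HashMap` are assumptions for illustration. Each worker routes keys into a fixed number of small tables by hash, so that shard `s` from every worker can be merged independently of all other shards. One plausible reading of the numbers above is that, without sharding, merging one table per worker becomes the bottleneck when there are many workers and many distinct keys.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ShardedCount {

    // One worker aggregates its slice into shardCount small maps, routed by key hash.
    static List<Map<Integer, Long>> aggregate(int[] slice, int shardCount) {
        List<Map<Integer, Long>> shards = new ArrayList<>();
        for (int i = 0; i < shardCount; i++) {
            shards.add(new HashMap<>());
        }
        for (int key : slice) {
            int s = Math.floorMod(Integer.hashCode(key), shardCount);
            shards.get(s).merge(key, 1L, Long::sum);
        }
        return shards;
    }

    // Merges shard s across all workers. A key always lands in the same shard index,
    // so merges for different shard indices never touch the same key and could be
    // handed to different threads.
    static Map<Integer, Long> mergeShard(List<List<Map<Integer, Long>>> perWorker, int s) {
        Map<Integer, Long> merged = new HashMap<>();
        for (List<Map<Integer, Long>> worker : perWorker) {
            worker.get(s).forEach((k, v) -> merged.merge(k, v, Long::sum));
        }
        return merged;
    }

    // Sequential driver, kept single-threaded for clarity.
    static Map<Integer, Long> count(int[][] slices, int shardCount) {
        List<List<Map<Integer, Long>>> perWorker = new ArrayList<>();
        for (int[] slice : slices) {
            perWorker.add(aggregate(slice, shardCount));
        }
        Map<Integer, Long> result = new HashMap<>();
        for (int s = 0; s < shardCount; s++) {
            result.putAll(mergeShard(perWorker, s)); // shards are disjoint, so no overwrites
        }
        return result;
    }
}
```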

puzpuzpuz · 2024-05-29T06:32:48Z

> We need to mitigate perf degradation on the c6a.metal box (192 cores, 384GB RAM): 0.6s for Java vs 2.2s for Rosti. The reason is that Java code implements hash table sharding while C++ code doesn't. Maybe we could mitigate this by limiting the max number of Rosti tables in use here

I have a better idea: we should disable keyed Rosti for all types but SYMBOL. That's because SYMBOL columns have low cardinality, which is not always the case for INT or IPv4. So, for high-cardinality columns, the Java-based GROUP BY will be faster than the Rosti one.

@bluestreak01 WDYT?

nwoolmer and others added 7 commits May 14, 2024 17:41

First pass at rewriting, not quite there

bec0042

All gets broken after propagateTopDownColumns

a906708

Avoid duplication of functions

f473b45

Virtualise

53d9158

Working version but non ideal

7171d72

Revised version which modifies in place, refactor and tests tba

08c4284

Add basic test. Performance improvement is 1.7s -> 900ms on M2 chip f…

77f2d12

…or Q35. More refactoring tba

nwoolmer added SQL Issues or changes relating to SQL execution Performance Performance improvements labels May 15, 2024

Merge branch 'master' into nw_rewrite_trivial_expr

d513f3f


Refactoring

f849066


nwoolmer requested a review from puzpuzpuz May 15, 2024 15:40

nwoolmer and others added 2 commits May 15, 2024 16:41

Exception never thrown lint

d1602f4

Merge branch 'master' into nw_rewrite_trivial_expr

0fb5bec

nwoolmer marked this pull request as ready for review May 15, 2024 15:41

nwoolmer added the ready for review label May 15, 2024


Rename

949d270


nwoolmer and others added 6 commits May 15, 2024 16:45

Merge remote-tracking branch 'origin/nw_rewrite_trivial_expr' into nw…

10331c6

…_rewrite_trivial_expr

Merge branch 'master' into nw_rewrite_trivial_expr

29d8943

Fix renamed utility

ccbd4bb

Fix test

6eb577f

Merge branch 'master' into nw_rewrite_trivial_expr

8121e0e

Remove alias index rebuild since removeColumn has been changed to do …

4950ad8

…this.

nwoolmer added the DO NOT MERGE These changes should not be merged to main branch label May 28, 2024
