[Native] CTE support in Prestissimo #22630
Comments
Notes: The Presto CTE design is based on https://www.vldb.org/pvldb/vol8/p1704-elhelw.pdf and the main PR is #20887. The core logic that wires up CTE Producers, Consumers, and Sequence nodes lives in the logical optimizer. Physical planning then translates all of this into temp table writes and reads. TODO: Cover the gaps after physical planning.
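To make the note above concrete, here is a toy sketch of the idea, not the actual Presto optimizer code. The class names (CteProducer, CteConsumer, Sequence), the to_physical helper, and the temp_ table naming are all hypothetical and chosen only for illustration. It assumes a CTE referenced twice: the producer is evaluated once and written to a temp table, and each consumer becomes a read of that table.

```python
from dataclasses import dataclass


@dataclass
class CteProducer:
    # Hypothetical node: evaluates the CTE body exactly once.
    name: str
    body: str


@dataclass
class CteConsumer:
    # Hypothetical node: one per reference to the CTE in the main query.
    name: str


@dataclass
class Sequence:
    # Hypothetical node: producers must finish before the main plan runs.
    producers: list
    main: list


def to_physical(seq):
    """Translate the toy logical plan into ordered temp-table steps."""
    steps = []
    for producer in seq.producers:
        steps.append(f"WRITE temp_{producer.name} <- {producer.body}")
    for consumer in seq.main:
        steps.append(f"READ temp_{consumer.name}")
    return steps


plan = Sequence(
    producers=[CteProducer("temp", "SELECT orderkey FROM orders")],
    main=[CteConsumer("temp"), CteConsumer("temp")],
)
steps = to_physical(plan)
print("\n".join(steps))
```

The point of the Sequence wrapper in this sketch is ordering: every write step is emitted before any read step, which mirrors the requirement that the temp table exists before a consumer scans it.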
The first issue I encountered is that the CTE generates bucketed (but not partitioned) TEMP tables, as supported by HMS. These are not supported in Prestissimo:
presto:tpch> WITH temp as (SELECT orderkey FROM ORDERS) SELECT * FROM temp t1;
Will review this support in Presto Native.
Identified 3 sub-parts of coding for this feature:
I believe in Java we use the
@tdcmeehan: Yes, Java uses the PRESTO_PAGE format. For Velox, Arrow or just the format used for spilling should be efficient. I'll prototype the speed-ups seen with both.
@jaystarshot: See facebookincubator/velox#9844 and #22780. In the Presto PR I derived a Native test from TestCteExecution.java so that all your tests are run on the Native side. 31 of the tests passed but 18 failed. I'm looking at the failures in more detail, but it would be great if you took a look as well. You have to apply the Velox PR changes in your Velox submodule along with the Presto PR to get a working setup.
Even if we use Arrow, we might still need to convert to the Presto page format after the reads and before the writes, when we exchange data from the table scan stage. I believe the spilling format should be the same as the Presto page format. I found this serializer in the code, which seems to serialize in the Presto page format: https://github.com/facebookincubator/velox/blob/main/velox/serializers/PrestoSerializer.cpp#L49
@jaystarshot: Spilling uses PrestoSerializer. I'll create a PR that adds a PrestoPageWriter that we can wire into this code.
That's awesome! You will need the reader as well. I was trying to look into this as a Velox-beginner task for myself, but I will leave it to the experts!
Expected Behavior or Use Case
Presto Java supports CTEs (WITH clause) with materialization.
https://prestodb.io/docs/0.286/admin/properties.html#cte-materialization-properties
Investigate the usage of these properties in Prestissimo.
Presto Component, Service, or Connector
Presto native
Possible Implementation
Example Screenshots (if appropriate):
Context