Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor converting type expressions to typerefs and ranges #7274

Merged
merged 1 commit into from May 14, 2024

Conversation

dnwpark
Copy link
Contributor

@dnwpark dnwpark commented Apr 29, 2024

Remove special intersection behaviour and resolve all type expressions to produce a non-overlapping union of types. This will be helpful when implementing type expressions.

Currently, it's really only possible to create unions or intersections of unions, so this change should not affect behaviour.

Uses of both typeref.union and typeref.intersection:

  • Creating a range for typeref
  • Checking if process_set_as_agg_expr should be serialized

Uses of typeref.union only:

  • Selecting all types for a delete statement
  • Type check operator

Typeref ranges

The current behaviour is:

  • unions are implements as union ALL
  • intersections are implemented as join on id

The proposed behaviour is:

  • unions and intersections implemented as union ALL

In theory, it should be possible to convert a type expression into an intersection of non-overlapping unions:

  • Convert type expression to CNF (should not result in any ~A expressions)
  • Expand unions into non-overlapping unions
  • Remove any redundant union members

This can then be converted into SQL which matches the CNF. However, since the underlying types are implemented as views the query will be slower than just using the types directly.

One more performance consideration is that this new implementation will get all ancestors and children of a typeref. This may have performance impact on the compiler side.

Effect of changes

Non-overlapping union select Ba is CBa | Bb

No changes:

    WITH 
    "t_schema::ObjectType~1" AS NOT MATERIALIZED 
        ((
            FROM LATERAL (
                FROM edgedbstd."schema::ObjectType_t" AS "ObjectType~2"
                CROSS JOIN LATERAL (
                    FROM LATERAL (
                        SELECT/*<pg.SelectStmt at 0x7fccb2558a40>*/
                            (NOT "ObjectType~2"."internal") AS "expr~1_value~1"
                    ) AS "expr-1~2"
                    CROSS JOIN (SELECT (NULL)::uuid AS "v~2") AS "v~1"
                    CROSS JOIN LATERAL (
                        SELECT/*<pg.SelectStmt at 0x7fccb2559640>*/
                            ("ObjectType~2".id IS NOT DISTINCT FROM "v~1"."v~2") AS "expr~2_value~1"
                    ) AS "expr-2~2"
                    SELECT/*<pg.SelectStmt at 0x7fccb29d5610>*/
                        ("expr-1~2"."expr~1_value~1" OR 
                         "expr-2~2"."expr~2_value~1") AS "expr~3_value~1"
                ) AS "expr-3~2"
                SELECT/*<pg.SelectStmt at 0x7fccb29d53a0>*/
                    "ObjectType~2".id AS "ObjectType_value~1"
                WHERE
                    "expr-3~2"."expr~3_value~1"
            ) AS "expr-4~2"
            SELECT/*<pg.SelectStmt at 0x7fccb29d6540>*/
                "expr-4~2"."ObjectType_value~1" AS "expr~4_identity~1"
        ))
    FROM edgedbpub."default::Ba_t" AS "Ba~2"
    CROSS JOIN LATERAL (
        SELECT/*<pg.SelectStmt at 0x7fccb29d5cd0>*/
            "Ba~2".__type__ AS "__type___identity~1"
    ) AS "q~1"
    JOIN LATERAL (
        FROM "t_schema::ObjectType~1" AS "t~1"
        SELECT/*<pg.SelectStmt at 0x7fccb29d5a00>*/
            "t~1"."expr~4_identity~1" AS "expr~4_identity~2"
    ) AS "q~2"
        ON ("q~1"."__type___identity~1" = "q~2"."expr~4_identity~2")
    CROSS JOIN LATERAL (
        SELECT/*<pg.SelectStmt at 0x7fccb2b4c4d0>*/
            edgedb.issubclass("q~2"."expr~4_identity~2", ARRAY[('default::CBa')::uuid, ('default::Bb')::uuid]) AS "expr~5_serialized~1"
    ) AS "expr-5~2"
    SELECT/*<pg.SelectStmt at 0x7fccb1e7eb40>*/
        "expr-5~2"."expr~5_serialized~1" AS "expr~5_serialized~2"
    WHERE
        ("expr-5~2"."expr~5_serialized~1" IS NOT NULL)

Overlapping union select Ba is Ba | Bb

No changes:

WITH 
"t_schema::ObjectType~1" AS NOT MATERIALIZED 
    ((
        FROM LATERAL (
            FROM edgedbstd."schema::ObjectType_t" AS "ObjectType~2"
            CROSS JOIN LATERAL (
                FROM LATERAL (
                    SELECT/*<pg.SelectStmt at 0x7fccb2c28770>*/
                        (NOT "ObjectType~2"."internal") AS "expr~1_value~1"
                ) AS "expr-1~2"
                CROSS JOIN (SELECT (NULL)::uuid AS "v~2") AS "v~1"
                CROSS JOIN LATERAL (
                    SELECT/*<pg.SelectStmt at 0x7fccb2bd8b00>*/
                        ("ObjectType~2".id IS NOT DISTINCT FROM "v~1"."v~2") AS "expr~2_value~1"
                ) AS "expr-2~2"
                SELECT/*<pg.SelectStmt at 0x7fccb2c2b0e0>*/
                    ("expr-1~2"."expr~1_value~1" OR 
                     "expr-2~2"."expr~2_value~1") AS "expr~3_value~1"
            ) AS "expr-3~2"
            SELECT/*<pg.SelectStmt at 0x7fccc1c12db0>*/
                "ObjectType~2".id AS "ObjectType_value~1"
            WHERE
                "expr-3~2"."expr~3_value~1"
        ) AS "expr-4~2"
        SELECT/*<pg.SelectStmt at 0x7fccc1c12ae0>*/
            "expr-4~2"."ObjectType_value~1" AS "expr~4_identity~1"
    ))
FROM edgedbpub."default::Ba_t" AS "Ba~2"
CROSS JOIN LATERAL (
    SELECT/*<pg.SelectStmt at 0x7fccc1dbbd70>*/
        "Ba~2".__type__ AS "__type___identity~1"
) AS "q~1"
JOIN LATERAL (
    FROM "t_schema::ObjectType~1" AS "t~1"
    SELECT/*<pg.SelectStmt at 0x7fccc1c127e0>*/
        "t~1"."expr~4_identity~1" AS "expr~4_identity~2"
) AS "q~2"
    ON ("q~1"."__type___identity~1" = "q~2"."expr~4_identity~2")
CROSS JOIN LATERAL (
    SELECT/*<pg.SelectStmt at 0x7fccb2df12e0>*/
        edgedb.issubclass("q~2"."expr~4_identity~2", ARRAY[('default::CBb')::uuid, ('default::Ba')::uuid, ('default::CBaBc')::uuid, ('default::CBbBc')::uuid, ('default::CBaBb')::uuid, ('default::Bb')::uuid, ('default::CBa')::uuid, ('default::CBaBbBc')::uuid]) AS "expr~5_serialized~1"
) AS "expr-5~2"
SELECT/*<pg.SelectStmt at 0x7fccb2cf2210>*/
    "expr-5~2"."expr~5_serialized~1" AS "expr~5_serialized~2"
WHERE
    ("expr-5~2"."expr~5_serialized~1" IS NOT NULL)

Simple intersection select Ba [is Bb]

Intersection changed to non-overlapping union.

Onmaster:

FROM LATERAL (
    FROM edgedbpub."default::Ba_t" AS "Ba~2"
    JOIN edgedbpub."default::Bb_t" AS "Bb~1"
        ON ("Ba~2".id = "Bb~1".id)
    SELECT/*<pg.SelectStmt at 0x7fccb2197650>*/
        "Bb~1".id AS "indirection_value~1"
) AS "(default:Ba & default:Bb)_indirection~2"
SELECT/*<pg.SelectStmt at 0x7fccb2194860>*/
    ROW("(default:Ba & default:Bb)_indirection~2"."indirection_value~1") AS "indirection_serialized~1"

On branch:

FROM LATERAL (
    FROM edgedbpub."default::Ba_t" AS "Ba~2"
    JOIN LATERAL (
    (
        FROM edgedbpub."default::CBaBbBc_t" AS "CBaBbBc~1"
        SELECT/*<pg.SelectStmt at 0x7efc98d2f3b0>*/
            "CBaBbBc~1".id AS "Ba_identity~1",
            "CBaBbBc~1".id AS "indirection_value~1"
    ) union ALL (
        FROM edgedbpub."default::CBaBb_t" AS "CBaBb~1"
        SELECT/*<pg.SelectStmt at 0x7efc98d2f2f0>*/
            "CBaBb~1".id AS "Ba_identity~2",
            "CBaBb~1".id AS "indirection_value~2"
    )
    ) AS "(default:Ba & default:Bb)~1"
        ON ("Ba~2".id = "(default:Ba & default:Bb)~1"."Ba_identity~1")
    SELECT/*<pg.SelectStmt at 0x7efc98d2f6e0>*/
        "(default:Ba & default:Bb)~1"."indirection_value~1" AS "indirection_value~3"
) AS "(default:Ba & default:Bb)_indirection~2"
SELECT/*<pg.SelectStmt at 0x7efc98605d30>*/
    ROW("(default:Ba & default:Bb)_indirection~2"."indirection_value~3") AS "indirection_serialized~1"

Complex intersection select {Ba, Bb} [is Bc]

Join target becomes a non-overlapping union. There is probably room for further optimization in process_set_as_path_type_intersection.

Onmaster:

FROM LATERAL (
    FROM LATERAL (
        FROM LATERAL (
            FROM LATERAL (
            (
                FROM LATERAL (
                    FROM edgedbpub."default::Ba_t" AS "Ba~2"
                    SELECT/*<pg.SelectStmt at 0x7fccb231b6b0>*/
                        "Ba~2".id AS "Ba_value~1"
                ) AS "expr-1~2"
                SELECT/*<pg.SelectStmt at 0x7fccb2360320>*/
                    "expr-1~2"."Ba_value~1" AS "expr~1_value~1"
            ) UNION ALL (
                FROM LATERAL (
                    FROM edgedbpub."default::Bb_t" AS "Bb~2"
                    SELECT/*<pg.SelectStmt at 0x7fccb231af90>*/
                        "Bb~2".id AS "Bb_value~1"
                ) AS "expr-2~2"
                SELECT/*<pg.SelectStmt at 0x7fccb2360350>*/
                    "expr-2~2"."Bb_value~1" AS "expr~2_value~1"
            )
            ) AS "q~1"
            SELECT/*<pg.SelectStmt at 0x7fccb23603e0>*/
                "q~1"."expr~1_value~1" AS "expr~3_value~1"
        ) AS "expr-3~2"
        SELECT/*<pg.SelectStmt at 0x7fccb23615b0>*/
            "expr-3~2"."expr~3_value~1" AS "expr~3_value~2"
    ) AS "expr-4~2"
    JOIN edgedbpub."default::Bc_t" AS "Bc~1"  # <---------------- HERE
        ON ("expr-4~2"."expr~3_value~2" = "Bc~1".id)
    SELECT/*<pg.SelectStmt at 0x7fccb2361880>*/
        "Bc~1".id AS "indirection_value~1"
) AS "/oT+Jy3ZSAmLNhU7W5t0jQ: & default:Bc)_indirection~1"
SELECT/*<pg.SelectStmt at 0x7fccb2371790>*/
    ROW("/oT+Jy3ZSAmLNhU7W5t0jQ: & default:Bc)_indirection~1"."indirection_value~1") AS "indirection_serialized~1"

On branch:

FROM LATERAL (
    FROM LATERAL (
        FROM LATERAL (
            FROM LATERAL (
            (
                FROM LATERAL (
                    FROM edgedbpub."default::Ba_t" AS "Ba~2"
                    SELECT/*<pg.SelectStmt at 0x7efc98f95550>*/
                        "Ba~2".id AS "Ba_value~1"
                ) AS "expr-1~2"
                SELECT/*<pg.SelectStmt at 0x7efc991f03b0>*/
                    "expr-1~2"."Ba_value~1" AS "expr~1_value~1"
            ) UNION ALL (
                FROM LATERAL (
                    FROM edgedbpub."default::Bb_t" AS "Bb~2"
                    SELECT/*<pg.SelectStmt at 0x7efc98f958e0>*/
                        "Bb~2".id AS "Bb_value~1"
                ) AS "expr-2~2"
                SELECT/*<pg.SelectStmt at 0x7efc991f28a0>*/
                    "expr-2~2"."Bb_value~1" AS "expr~2_value~1"
            )
            ) AS "q~1"
            SELECT/*<pg.SelectStmt at 0x7efc991f03e0>*/
                "q~1"."expr~1_value~1" AS "expr~3_value~1"
        ) AS "expr-3~2"
        SELECT/*<pg.SelectStmt at 0x7efc9c90ba70>*/
            "expr-3~2"."expr~3_value~1" AS "expr~3_value~2"
    ) AS "expr-4~2"
    JOIN LATERAL (  # <---------------- HERE
    (
    (
        FROM edgedbpub."default::CBaBbBc_t" AS "CBaBbBc~1"
        SELECT/*<pg.SelectStmt at 0x7efc9be66bd0>*/
            "CBaBbBc~1".id AS "expr~4_identity~1",
            "CBaBbBc~1".id AS "indirection_value~1"
    ) union ALL (
        FROM edgedbpub."default::CBaBc_t" AS "CBaBc~1"
        SELECT/*<pg.SelectStmt at 0x7efc991f1df0>*/
            "CBaBc~1".id AS "expr~4_identity~2",
            "CBaBc~1".id AS "indirection_value~2"
    )
    ) union ALL (
        FROM edgedbpub."default::CBbBc_t" AS "CBbBc~1"
        SELECT/*<pg.SelectStmt at 0x7efc991f2270>*/
            "CBbBc~1".id AS "expr~4_identity~3",
            "CBbBc~1".id AS "indirection_value~3"
    )
    ) AS "Y24MZ7b6HtjJEPay5UJz3Q: default:Bb) & default:Bc)~1"
        ON ("expr-4~2"."expr~3_value~2" = "Y24MZ7b6HtjJEPay5UJz3Q: default:Bb) & default:Bc)~1"."expr~4_identity~1")
    SELECT/*<pg.SelectStmt at 0x7efc9be66150>*/
        "Y24MZ7b6HtjJEPay5UJz3Q: default:Bb) & default:Bc)~1"."indirection_value~1" AS "indirection_value~4"
) AS "/oT+Jy3ZSAmLNhU7W5t0jQ: & default:Bc)_indirection~1"
SELECT/*<pg.SelectStmt at 0x7efc990b07d0>*/
    ROW("/oT+Jy3ZSAmLNhU7W5t0jQ: & default:Bc)_indirection~1"."indirection_value~4") AS "indirection_serialized~1"

@dnwpark dnwpark force-pushed the typeref-remove-intersection branch 3 times, most recently from 0b512d0 to 2572338 Compare April 29, 2024 17:12
@dnwpark dnwpark requested a review from msullivan April 29, 2024 17:56
Comment on lines -160 to -162
# If this is an intersection type, this would be a set of
# intersection elements.
intersection: typing.Optional[typing.FrozenSet[TypeRef]] = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It weirds me out a little if we aren't going to track that a type is an intersection in the TypeRef, even when it is one in the schema.

This might not actually matter though

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit odd, but the typeref doesn't exactly map onto types anyways

if t.is_union_type(schema) or t.is_intersection_type(schema):
normalized: s_utils.NormalizedTypeExpr = (
s_utils.NormalizedTypeExpr.create(t, schema=schema)
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does normalizing help here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All normalization stuff removed.

Comment on lines 332 to 334
# Always include children and ancestors for complex types
ancestor_types = normalized.get_ancestors(schema=schema)
child_types = normalized.get_descendants(schema=schema)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this new behavior?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. The issue is in _find_rvar_in_intersection_by_typeref and type_contains. When compiling the intersection, the typeref is used to find an appropriate rvar using the concept of "containment". With only simple unions or intersections this is straightforward enough.

When dealing with an intersection with potentially type expressions on either side, figuring out whether a typeref "contains" another becomes harder since the schema is not available.

For example, with R = (A | B) & C has the ancestors C and Obj. However, L = A | B should count as "containing" R despite not being a direct ancestor. The approach taken here to solve this is to check that L.children is a superset of R.children.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More cases detailed in type_contains

@dnwpark dnwpark force-pushed the typeref-remove-intersection branch 4 times, most recently from 16930f4 to 5e114aa Compare April 30, 2024 22:39
@dnwpark dnwpark marked this pull request as ready for review April 30, 2024 22:56
Comment on lines +1180 to +1192
# filter out subclasses
expanded_types = {
type
for type in expanded_types
if not any(
type is not other and type.issubclass(schema, other)
for other in expanded_types
)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this needed? I guess it helps us not always require union_is_exhaustive?

Copy link
Contributor Author

@dnwpark dnwpark May 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right. It also makes reading the sql easier too :)

Comment on lines 1248 to 1255
def expand_type_expr_ancestors(
type: s_types.Type,
schema: s_schema.Schema,
) -> set[s_types.Type]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This weirds me out some

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... What do you mean? The implementation of the function or rather its existence?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The existence, I think

@dnwpark dnwpark force-pushed the typeref-remove-intersection branch 6 times, most recently from 3be2fea to a8de853 Compare May 12, 2024 00:20
@dnwpark dnwpark force-pushed the typeref-remove-intersection branch from a8de853 to 759123d Compare May 13, 2024 14:21
@dnwpark dnwpark merged commit c2c6d60 into master May 14, 2024
23 checks passed
@dnwpark dnwpark deleted the typeref-remove-intersection branch May 14, 2024 18:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants