Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect results when joining two tables with multiple join conditions #271

Open
karanpopat opened this issue Jan 12, 2023 · 0 comments
Open
Assignees
Labels
bug Something isn't working

Comments

@karanpopat
Copy link

karanpopat commented Jan 12, 2023

While querying by joining two tables with multiple join conditions(more than 2 in my case), I observed that one of the conditions was being ignored hence giving incorrect results

To Reproduce

  1. Setup an AWS aggregator connection
  2. For example When joining the tables aws_dax_parameter_group and aws_dax_cluster on parameter_group_name, account_id and region. The p.account_id = c.account_id condition is being ignored. Thus returning results from multiple accounts
> select
  p.parameter_group_name as id,
  p.title as title,
  p.account_id as pg_account_id,
  c.account_id as dax_account_id,
  p.region as pg_region,
  c.region as dax_region
from
  aws_dax_parameter_group as p
  join aws_dax_cluster as c
    on c.parameter_group ->> 'ParameterGroupName' = p.parameter_group_name
    and p.account_id = c.account_id
    and p.region = c.region
where
  c.arn = 'arn:aws:dax:us-east-1:111122223333:cache/graph-test-dax'
+----------------+----------------+---------------+----------------+-----------+------------+
| id             | title          | pg_account_id | dax_account_id | pg_region | dax_region |
+----------------+----------------+---------------+----------------+-----------+------------+
| default.dax1.0 | default.dax1.0 | 123456789012  | 111122223333   | us-east-1 | us-east-1  |
| default.dax1.0 | default.dax1.0 | 111122223333  | 111122223333   | us-east-1 | us-east-1  |
+----------------+----------------+---------------+----------------+-----------+------------+

+------------------------------------------------------------------------------------------------------------------------------------------+
| QUERY PLAN                                                                                                                               |
+------------------------------------------------------------------------------------------------------------------------------------------+
| Nested Loop  (cost=0.00..40040001000000.00 rows=125000 width=192) (actual time=12.747..12.866 rows=2 loops=1)                            |
|   ->  Foreign Scan on aws_dax_cluster c  (cost=0.00..40000000000000.00 rows=1000000 width=400) (actual time=7.364..7.380 rows=1 loops=1) |
|         Filter: (arn = 'arn:aws:dax:us-east-1:111122223333:cache/graph-test-dax'::text)                                                  |
|   ->  Foreign Scan on aws_dax_parameter_group p  (cost=0.00..40000.00 rows=100 width=400) (actual time=4.604..4.707 rows=2 loops=1)      |
|         Filter: ((region = c.region) AND ((c.parameter_group ->> 'ParameterGroupName'::text) = parameter_group_name))                    |
|         Rows Removed by Filter: 3                                                                                                        |
| Planning Time: 8.682 ms                                                                                                                  |
| Execution Time: 14.922 ms                                                                                                                |
+------------------------------------------------------------------------------------------------------------------------------------------+
  1. The query plan above shows that the filter for p.account_id = c.account_id is ignored.
  2. On removing one of the join conditions from the same query, for eg. p.region = c.region below, the result displayed are correct
> select
  p.parameter_group_name as id,
  p.title as title,
  p.account_id as pg_account_id,
  c.account_id as dax_account_id,
  p.region as pg_region,
  c.region as dax_region
from
  aws_dax_parameter_group as p
  join aws_dax_cluster as c
    on c.parameter_group ->> 'ParameterGroupName' = p.parameter_group_name
    and p.account_id = c.account_id
where
  c.arn = 'arn:aws:dax:us-east-1:111122223333:cache/graph-test-dax'
+----------------+----------------+---------------+----------------+----------------+------------+
| id             | title          | pg_account_id | dax_account_id | pg_region      | dax_region |
+----------------+----------------+---------------+----------------+----------------+------------+
| default.dax1.0 | default.dax1.0 | 111122223333  | 111122223333   | ap-south-1     | us-east-1  |
| default.dax1.0 | default.dax1.0 | 111122223333  | 111122223333   | us-west-2      | us-east-1  |
| default.dax1.0 | default.dax1.0 | 111122223333  | 111122223333   | us-east-1      | us-east-1  |
+----------------+----------------+---------------+----------------+----------------+------------+

+------------------------------------------------------------------------------------------------------------------------------------------+
| QUERY PLAN                                                                                                                               |
+------------------------------------------------------------------------------------------------------------------------------------------+
| Nested Loop  (cost=0.00..40040001000000.00 rows=25000000 width=192) (actual time=12.474..12.939 rows=13 loops=1)                         |
|   ->  Foreign Scan on aws_dax_cluster c  (cost=0.00..40000000000000.00 rows=1000000 width=400) (actual time=4.827..4.840 rows=1 loops=1) |
|         Filter: (arn = 'arn:aws:dax:us-east-1:111122223333:cache/graph-test-dax'::text)                                                  |
|   ->  Foreign Scan on aws_dax_parameter_group p  (cost=0.00..40000.00 rows=100 width=400) (actual time=7.004..7.454 rows=13 loops=1)     |
|         Filter: ((account_id = c.account_id) AND ((c.parameter_group ->> 'ParameterGroupName'::text) = parameter_group_name))            |
|         Rows Removed by Filter: 16                                                                                                       |
| Planning Time: 10.968 ms                                                                                                                 |
| Execution Time: 16.336 ms                                                                                                                |
+------------------------------------------------------------------------------------------------------------------------------------------+
  1. So does the plan show that all the join conditions were considered.
  2. Now on removing a different condition from the same query, for eg. c.parameter_group ->> 'ParameterGroupName' = p.parameter_group_name below, the result displayed are correct again
> select
  p.parameter_group_name as id,
  p.title as title,
  p.account_id as pg_account_id,
  c.account_id as dax_account_id,
  p.region as pg_region,
  c.region as dax_region
from
  aws_dax_parameter_group as p
  join aws_dax_cluster as c
    on p.account_id = c.account_id
    and p.region = c.region
where
  c.arn = 'arn:aws:dax:us-east-1:111122223333:cache/graph-test-dax'
+----------------+----------------+---------------+----------------+-----------+------------+
| id             | title          | pg_account_id | dax_account_id | pg_region | dax_region |
+----------------+----------------+---------------+----------------+-----------+------------+
| default.dax1.0 | default.dax1.0 | 111122223333  | 111122223333   | us-east-1 | us-east-1  |
+----------------+----------------+---------------+----------------+-----------+------------+

+---------------------------------------------------------------------------------------------------------------------------------------------------+
| QUERY PLAN                                                                                                                                        |
+---------------------------------------------------------------------------------------------------------------------------------------------------+
| Nested Loop  (cost=0.00..70057540002500.00 rows=25000000 width=192) (actual time=17.646..17.691 rows=1 loops=1)                                   |
|   Join Filter: ((p.account_id = c.account_id) AND (p.region = c.region))                                                                          |
|   Rows Removed by Join Filter: 28                                                                                                                 |
|   ->  Foreign Scan on aws_dax_parameter_group p  (cost=0.00..40000000000000.00 rows=1000000 width=400) (actual time=3.382..4.065 rows=29 loops=1) |
|   ->  Materialize  (cost=0.00..30000000045040.00 rows=1000000 width=300) (actual time=0.468..0.469 rows=1 loops=29)                               |
|         ->  Foreign Scan on aws_dax_cluster c  (cost=0.00..30000000000000.00 rows=1000000 width=300) (actual time=13.549..13.587 rows=1 loops=1)  |
|               Filter: (arn = 'arn:aws:dax:us-east-1:111122223333:cache/graph-test-dax'::text)                                                     |
| Planning Time: 11.774 ms                                                                                                                          |
| Execution Time: 21.489 ms                                                                                                                         |
+---------------------------------------------------------------------------------------------------------------------------------------------------+
  1. So does the plan show that all the join conditions were considered.
  2. Same query with cross-join and the join conditions in the where clause, generates a similar plan
> explain analyse select
  p.parameter_group_name as id,
  p.title as title,
  p.account_id as pg_account_id,
  c.account_id as dax_account_id,
  p.region as pg_region,
  c.region as dax_region
from
  aws_dax_parameter_group as p,
  aws_dax_cluster as c
where
  c.parameter_group ->> 'ParameterGroupName' = p.parameter_group_name
  and p.account_id = c.account_id
  and p.region = c.region
  and c.arn = 'arn:aws:dax:us-east-1:111122223333:cache/graph-test-dax'
+------------------------------------------------------------------------------------------------------------------------------------------+
| QUERY PLAN                                                                                                                               |
+------------------------------------------------------------------------------------------------------------------------------------------+
| Nested Loop  (cost=0.00..40040001000000.00 rows=125000 width=192) (actual time=11.617..11.679 rows=2 loops=1)                            |
|   ->  Foreign Scan on aws_dax_cluster c  (cost=0.00..40000000000000.00 rows=1000000 width=400) (actual time=6.915..6.929 rows=1 loops=1) |
|         Filter: (arn = 'arn:aws:dax:us-east-1:111122223333:cache/graph-test-dax'::text)                                                  |
|   ->  Foreign Scan on aws_dax_parameter_group p  (cost=0.00..40000.00 rows=100 width=400) (actual time=4.235..4.283 rows=2 loops=1)      |
|         Filter: ((region = c.region) AND ((c.parameter_group ->> 'ParameterGroupName'::text) = parameter_group_name))                    |
|         Rows Removed by Filter: 3                                                                                                        |
| Planning Time: 13.727 ms                                                                                                                 |
| Execution Time: 13.760 ms                                                                                                                |
+------------------------------------------------------------------------------------------------------------------------------------------+
  1. Similar example
> explain analyse select
  b.arn as eventbridge_bus_arn
from
  aws_eventbridge_rule r
  join aws_eventbridge_bus b on r.event_bus_name = b.name
    and r.region = b.region
    and r.account_id = b.account_id
where
  r.arn = 'arn:aws:events:ap-south-1:111122223333:rule/test'
+------------------------------------------------------------------------------------------------------------------------------------------------+
| QUERY PLAN                                                                                                                                     |
+------------------------------------------------------------------------------------------------------------------------------------------------+
| Nested Loop  (cost=0.00..40040001000000.00 rows=125000 width=32) (actual time=15.128..25.383 rows=2 loops=1)                                   |
|   ->  Foreign Scan on aws_eventbridge_rule r  (cost=0.00..40000000000000.00 rows=1000000 width=400) (actual time=7.076..17.280 rows=1 loops=1) |
|         Filter: (arn = 'arn:aws:events:ap-south-1:111122223333:rule/test'::text)                                                               |
|         Rows Removed by Filter: 437                                                                                                            |
|   ->  Foreign Scan on aws_eventbridge_bus b  (cost=0.00..40000.00 rows=100 width=400) (actual time=7.371..7.419 rows=2 loops=1)                |
|         Filter: ((r.region = region) AND (r.event_bus_name = name))                                                                            |
| Planning Time: 11.155 ms                                                                                                                       |
| Execution Time: 28.488 ms                                                                                                                      |
+------------------------------------------------------------------------------------------------------------------------------------------------+

Steampipe version
Tested with both 0.18.0 and 0.17.4

@karanpopat karanpopat added the bug Something isn't working label Jan 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants