Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failing to apply where condition when joining aws_ecr_image #435

Open
gabe-gfm opened this issue Mar 13, 2024 · 6 comments
Open

Failing to apply where condition when joining aws_ecr_image #435

gabe-gfm opened this issue Mar 13, 2024 · 6 comments
Assignees
Labels
bug Something isn't working

Comments

@gabe-gfm
Copy link

gabe-gfm commented Mar 13, 2024

Describe the bug
I'm seeing unexpected results using a WHERE clause on the aws_ecr_image table. All rows from a JOIN condition are being returned even though the query specifies most rows should be excluded by a WHERE condition.

This reproduction case involves joining a second table to provide a target label string for evaluating whether the target is in the JSONB list of aws_ecr_image.image_tags.

Insightfully, I found the problem does not occur qhen querying local-table-copy of the aws_ecr_image table. This leads me to suspect it's a foreign-data-wrapper bug. But I'm not confident in which project the bug exists ( steampipe-postgres-fdw, steampipe-plugin-aws, or steampipe?).

Steampipe version
v0.21.4

Plugin version (steampipe plugin list)
turbot/aws 0.132.0

To reproduce

  1. Have AWS ECR image data in aws_ecr_image
  2. Join a second table on ecr.repository_name` to provide data for specific repository and image_tag sets.
  3. Test for presence of specific image_tag in image_tags using the >@ JSONB inclusion test.

@>: Checks if a JSON document contains another JSON document.

Example query

with id_tag_repo as -- DEBUG: this is a stand-in for AWS container task definitions
(
  select 1 as id, to_jsonb('ZZZ'::text) as image_tag, 'my-repo-A' as image_repository
  union
  select 2 as id, '"latest"'::jsonb as image_tag, 'my-repo-B' as image_repository
  union
  select 3 as id, '"latest"'::jsonb as image_tag, 'my-repo-A' as image_repository
)
select 
  id_tag_repo.image_tag as target_tag
  , id
  , ecr.image_tags @> image_tag as is_tag_in_tags -- DEBUG: Matches where condition. Expect to see only true rows
  , repository_name
  , image_tags
from aws_ecr_image ecr
-- from ecr_copy ecr -- DEBUG
join id_tag_repo
  on ecr.repository_name = id_tag_repo.image_repository
where
  ecr.image_tags @> image_tag
  1. Note the results of is_tag_in_tags column may show true, false, null, but the where condition should limit it to only true.

Buggy Results:
image

Expected behavior
The expected behavior can be easily demonstrated by running the same query on a local copy of the a foreign-data-wrapper table.

  1. Copy to a local table
--drop table ecr_copy
create table ecr_copy as select * from aws_ecr_image
  1. Re-running the above query, but targeting from ecr_copy rather than from aws_ecr_image
  2. Notice that only results where is_tag_in_tags column shows true.

Expected results:
image

Additional context
I found it difficult to discover how I might locally rebuild steampipe-ftw or the project's postgres container to investigate if this could be a known postgres bug.

@gabe-gfm gabe-gfm added the bug Something isn't working label Mar 13, 2024
@misraved misraved assigned bigdatasourav and unassigned ParthaI Apr 1, 2024
@bigdatasourav bigdatasourav transferred this issue from turbot/steampipe-plugin-aws Apr 1, 2024
@bigdatasourav
Copy link

@kaidaguerre, Could you please look into this issue?

@kaidaguerre
Copy link
Contributor

Sorry for the delay @gabe-gfm , sure we'll look into this 👍

@gabe-gfm
Copy link
Author

Perhaps this is related?
#440

@gabe-gfm
Copy link
Author

@jreyesr @kaidaguerre

Unfortunately, this was not resolved using the most aws plugin + postgres foreign data wrapper (us-docker.pkg.dev/steampipe/steampipe/fdw:1.11.0-rc.6). It still returns all rows when attempting to filter for a list of values inside the jsonb image_tags array.

I did notice a particular log surface in my SQL client (not steampipe query):

2024-04-29 17:22:28 fdwGetForeignPlan - best_path->fdw_private is NULL. Defaulting to setting canPushdownAllSortFields to true

From

elog(NOTICE, "fdwGetForeignPlan - best_path->fdw_private is NULL. Defaulting to setting canPushdownAllSortFields to true");

@gabe-gfm
Copy link
Author

gabe-gfm commented May 3, 2024

I identified a better workaround for anyone else encountering this.

You may specify MATERIALIZED (docs) in a Postgres query to prevent the demonstrated incorrect query filter results.

MATERIALIZED is an optimization hint resulting in explicit fetching of a common table express (CTE) result without trying to push down potential optimizations into the query plan.

This example query gave me correctly filtered results.

-- id_tag_repo is a mock for identifying interesting AWS container task definitions linked to ECR images
with id_tag_repo as  
(
  select 1 as id, to_jsonb('ZZZ'::text) as image_tag, 'my-repo-A' as image_repository
  union
  select 2 as id, '"latest"'::jsonb as image_tag, 'my-repo-B' as image_repository
  union
  select 3 as id, '"latest"'::jsonb as image_tag, 'my-repo-A' as image_repository
), 
ecr_info as MATERIALIZED -- MATERIALIZED used to work around steampipe-fdw bug 
(
  select repository_name, image_tags
  from  aws_ecr_image
)
select 
  id_tag_repo.image_tag as target_tag
  , id
  , ecr.image_tags @> image_tag as is_tag_in_tags -- DEBUG: Matches where condition. Expect to see only true rows
  , repository_name
  , image_tags
from ecr_info ecr
join id_tag_repo
  on ecr.repository_name = id_tag_repo.image_repository
where
  ecr.image_tags @> image_tag

@kaidaguerre
Copy link
Contributor

@gabe-gfm good that you've found a workaround. I need to dig into why the JSON join condition is causing this issue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants