Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEATURE] Bigquery pseudo-column support #186

Open
adammarples opened this issue Feb 16, 2023 · 3 comments
Open

[FEATURE] Bigquery pseudo-column support #186

adammarples opened this issue Feb 16, 2023 · 3 comments
Assignees
Labels
feature This is is requesting a new feature

Comments

@adammarples
Copy link

adammarples commented Feb 16, 2023

It would be good to be able to use the _FILE_NAME pseudo-column which bigquery uses to hold the filename of the files in an external table.

Describe the solution you'd like
To do this, it would need to be selected in the initial stage select CTE, because it is unavailable after that, ie. when derived columns are added. Presumably the best way to do that is to create a macros/tables/bigquery/stage.sql file with an "external_table_filename" kwarg flag, or if this could be covered with a src_extra_columns variable? There are other pseudo-columns too such as in partitioned tables.

Describe alternatives you've considered

Additional context
Happy to work on this if it seems appropriate

AB#5366

@adammarples adammarples added the feature This is is requesting a new feature label Feb 16, 2023
@DVAlexHiggs
Copy link
Member

DVAlexHiggs commented Feb 17, 2023

Hi! Thanks for this interesting feature request.

I have to admit I don't know much about this bit of BigQuery.

My questions would be: is this something that should be dealt with in dbtvault or is it in fact already built into dbt? From the BigQuery dbt configs documentation it does allude to pseudo-columns.

Otherwise, this should already be possible doing something like the following:

 
WITH stage AS (
   {{ dbtvault.stage() }} 
), 

pseudocolumns AS (
   SELECT *, _FILE_NAME AS FILE_NAME
   FROM stage
)

SELECT * FROM pseudocolumns


However, this is a little clunky and some kind of config variable as suggested would be a lot cleaner.

@adammarples
Copy link
Author

Hi. The problem as I see it there (and I am not an expert on dbt-vault or dbt-bigquery or how they interact) is that _FILE_NAME can only be selected directly from the source table, if it is not included in the initial CTE then it will be unavailable going forward. It will also require an alias ie. 'FILE_NAME'.

dbt-vault provides a file called dbtvault/macros/staging/stage.sql which I believe is what dbt-vault runs when dbtvault.stage() is called, and the _FILE_NAME needs to be inserted right there at the source.

WITH source_data AS (

    SELECT

    {{- "\n\n    " ~ dbtvault.print_list(list_to_print=all_source_columns, columns_to_escape=columns_to_escape) if all_source_columns else " *" }}
  , _FILE_NAME AS FILE_NAME -- <- we need this line optionally
    FROM {{ source_relation }}
    {%- set last_cte = "source_data" %}
)

I can have a go at a PR

@DVAlexHiggs
Copy link
Member

Ok that makes sense, thanks for explaining!

I will add this to our backlog :) Happy for you to do a PR but at this time we cannot accept directly as our test harness is not publicly available and it does need to be run through that before release. We hope to improve contribution guidelines and processes in future.

@DVAlexHiggs DVAlexHiggs added feature This is is requesting a new feature and removed feature This is is requesting a new feature labels May 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature This is is requesting a new feature
Projects
None yet
Development

No branches or pull requests

2 participants