Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] include_source_columns=false not working #181

Open
klaus1978 opened this issue Jan 13, 2023 · 6 comments
Open

[BUG] include_source_columns=false not working #181

klaus1978 opened this issue Jan 13, 2023 · 6 comments
Assignees
Labels
ado bug Something isn't working

Comments

@klaus1978
Copy link

klaus1978 commented Jan 13, 2023

Describe the bug
After upgrading to 0.9.2 (from 0.8.3) I realized problems in the Stage macro when there are weird field names (eg with special characters or blanks) as the escaping is not done automatically anymore.

Environment

dbt version: 1.3.2
dbtvault version: 0.9.2
Database/Platform: Snowflake

To Reproduce
if it is not me, but a general issue:

  1. Select a source table with unusual field names ( e.g. "/MY/COLUMN" )
  2. Create the stage model for the source table
  3. Compiling should already fail, even if you don 't use any of the "unusual" fields and disable include_source_columns:
    {{ dbtvault.stage(include_source_columns=false,

If you check the compiled file, you will see all fields from source in the CTE "source_data" are included ( I would have thought that would be prevented by include_source_columns=false)
and not all of them are escaped by '" "' which is causing the problem with unusual field names...

Expected behavior
Best would be the same behavior as in 0.8.3:
all fields in CTE source_data were escaped...

see also https://dbtvault.slack.com/archives/CRLDUSDEF/p1673540131308529

Many thanks in advance
Klaus

AB#5348

@klaus1978 klaus1978 added the bug Something isn't working label Jan 13, 2023
@DVAlexHiggs DVAlexHiggs changed the title [BUG] <my bug title> [BUG] include_source_columns=false not working Jan 14, 2023
@klaus1978
Copy link
Author

Hi,
just for clarification, it is not just the
include_source_columns=false which is not working.

Also when I use the include_source_columns=true it creates a wrong CTE as the columns are not escaped anymore. Or, some are escaped and other are not...

Output would be eg:

-- Generated by dbtvault.

WITH source_data AS (

    SELECT

    MANDT,
    VBELN,
    POSNR,
    "MATNR",
    "MATWA",
    "PMATN",
    ...
    /BEV1/SRFUND,
    AUFPL_OLC,
    APLZL_OLC,
    /SIE/MED_QMNUM_S,
    /SIE/MED_CSENO_S,


    FROM ANALYTICSLAYER.AN_DVSCM_STAGE_D.RSTAGE_SV_P41_P_VBAP
),...

@DVAlexHiggs
Copy link
Member

DVAlexHiggs commented Jan 16, 2023

Yes understood. I was just making sure that the placeholder was removed from the title. Feel free to modify!

@DVAlexHiggs
Copy link
Member

DVAlexHiggs commented Feb 9, 2023

Hello, after some investigation this is an issue for a number of reasons.

In terms of a solution, the proposed solution of making it like 0.8.3 where all source columns are escaped won't work for the same reasons we removed it: Escaping by default causes problems in different platforms, and creates unintended errors (especially around casing in postgres) which are frustrating for the user at best and breaking at worst.

A potential way to fix this would be to have a new "escaped_column" config in the stage macro, which would then be used to escape columns in the source column selection, thus giving the user the control they need without causing problems for users who do not need to escape columns.

It would work as below:

{%- set yaml_metadata -%}
source_model:
    raw_source_name: source_table_name
escaped_columns:
   - "BOOKING DATE"
{%- endset -%}

{% set metadata_dict = fromyaml(yaml_metadata) %}

{% set source_model = metadata_dict["source_model"] %}

{{ dbtvault.stage(include_source_columns=true,
                  source_model=source_model,
                  escaped_columns=escaped_columns) }}

Which would cause "BOOKING DATE" to be escaped in the source columns.

We're currently working out how this would interact with the existing escaping solution and will keep this issue updated.

@klaus1978 Please let us know what you think 😄

@klaus1978
Copy link
Author

Hi,
hm this might work, as long as I can use the original field names afterwards? Or would I have to rename "BOOKING DATE" to eg. BOOKING_DATE?

KR Klaus

@DVAlexHiggs
Copy link
Member

DVAlexHiggs commented Feb 13, 2023

Hi, thanks for your response!

The idea here is that dbtvault would automatically ensure that the columns in escaped_columns would be referenced in their escaped form throughout the stage model to avoid syntax issues in that specific model, however this escaping would not be retained after the stage; i.e. in your hub you would need to refer to it as "BOOKING DATE" still, with your own escaping such as '"BOOKING DATE"'.

The best practise here would then be to create derived columns which 'fix' the problem columns (i.e. aliases "BOOKING DATE" as "BOOKING_DATE") and would mean you don't have to worry about this downstream. Saying this, this approach would still give users the option to either alias them or leave them as-is, depending on preference.

Does that make sense?

@klaus1978
Copy link
Author

Well,
we have quite a lot of fields I'd have to rename. Not sure if this is a special case with our SAP environment.
Generally, I would rather keep the original field names in the Raw Vault.

If you have to rename the fields anyway, then I guess there is another possibility where you (the dbtvault team) would not have to do anything: just rename it in a dbt model before you pass it to the stage macros?

KR Klaus

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ado bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants