Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ele 1684 create an unlinked tables macro #523

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

ofek1weiss
Copy link
Contributor

No description provided.

@linear
Copy link

linear bot commented Sep 6, 2023

ELE-1684 create an "unlinked tables" macro

Definition of done:

  • write a macro that finds all the tables in the warehouse that do not exist as models/sources/etc in the dbt project
  • possibly add a flag to auto delete them

@github-actions
Copy link
Contributor

github-actions bot commented Sep 6, 2023

👋 @ofek1weiss
Thank you for raising your pull request.
Please make sure to add tests and document all user-facing changes.
You can do this by editing the docs files in the elementary repository.

{% for schema in schemas %}
{% set db_name, schema_name = schema.split('.') %}
{% set schema_relation = api.Relation.create(db_name, schema_name).without_identifier() %}
{% set relations = list_relations_without_caching(schema_relation) %}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefix package

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DONE

@@ -0,0 +1,59 @@
{% macro get_tables_without_models(deprecated_models_path=none) %}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From this macro you could infer that it's somewhat related to sourced, snapshots, seeds, etc.
Maybe call it something in the area of leftover or dangling?
Not super critical I suppose.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i like dangling, on it

{% set schema_relation = api.Relation.create(db_name, schema_name).without_identifier() %}
{% set relations = dbt.list_relations_without_caching(schema_relation) %}
{# list_relations_without_caching can return either a list of Relation objects or an agate depending on the adapter #}
{% if relations.append is defined %}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not do relations is list?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is pretty hacky, but there isnt an is list test in jinja, only an is iterable, and an agate is also iterable.
i tried to find a less hacky solution but this is the best i found

{% if relations.append is defined %}
{# relations is a list of Relation objects #}
{% for relation in relations %}
{% do tables.append(schema ~ "." ~ relation.identifier) %}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I usually prefer using api.Relation.create as it takes care of stringifying the relation in the safest way (quoting, etc.).

{% endmacro %}


{% macro get_db_tables_in_schemas(schemas) %}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about calling it get_tables_in_schemas?
The db confused me a bit 😅

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the thought was to separate model_tables - tables defined by models, and db_tables - tables that exist in the db, but looks like its more confusing that way

{% if not deprecated_models_path %}
{% do relevant_nodes.append(model_node) %}
{% endif %}
{% if deprecated_models_path and not model_node.original_file_path.startswith(deprecated_models_path) %}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
{% if deprecated_models_path and not model_node.original_file_path.startswith(deprecated_models_path) %}
{% elif not model_node.original_file_path.startswith(deprecated_models_path) %}

{% for model_node in relevant_nodes %}
{% set model_schema = api.Relation.create(model_node.database, model_node.schema).without_identifier() %}
{% set model_table = api.Relation.create(model_node.database, model_node.schema, model_node.name) %}
{% if model_schema not in model_schemas %}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use a set() if you want but not critical.

{% endfor %}
{% for model_node in relevant_nodes %}
{% set model_schema = api.Relation.create(model_node.database, model_node.schema).without_identifier() %}
{% set model_table = api.Relation.create(model_node.database, model_node.schema, model_node.name) %}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to use alias, not name.

{% for model_node in graph.nodes.values() | selectattr('resource_type', '==', 'model') %}
{% if not deprecated_models_path %}
{% do relevant_nodes.append(model_node) %}
{% elif not model_node.original_file_path.startswith(deprecated_models_path) %}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you just add this as an or above?

    {% if not deprecated_models_path or not model_node.original_file_path.startswith(deprecated_models_path) %}

{% endmacro %}


{% macro get_tables_in_schemas(schemas) %}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the nit-picking, but maybe call this get_all_tables_in_schemas in order to clarify that this one includes tables that are not in dbt compared to the macro below?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants