Power BI Provider #39243

Open
wants to merge 4 commits into
base: main

Contributor

@ambika-garg ambika-garg commented Apr 24, 2024

Apache Airflow Provider for Power BI.

Operators

PowerBIDatasetRefreshOperator

The operator triggers a Power BI dataset refresh and pushes the refresh details to XCom. It accepts the following parameters:

  • dataset_id: The dataset ID.
  • group_id: The workspace ID.
  • wait_for_termination: (Default value: True) Wait until the pre-existing or newly triggered refresh completes before exiting.
  • force_refresh: When enabled, force a new refresh of the dataset after any pre-existing, ongoing refresh request has terminated.
  • timeout: Time in seconds to wait for the dataset refresh to reach a terminal status for non-asynchronous waits. Used only if wait_for_termination is True.
  • check_interval: Number of seconds to wait before rechecking the refresh status.
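
To illustrate how wait_for_termination, timeout and check_interval could interact, here is a minimal sketch of a blocking polling loop. It is not the operator's actual code: `wait_for_refresh`, `RefreshTimeout` and the injected `get_status` callable are hypothetical names, and the set of terminal statuses is an assumption based on the status values listed under Features.

```python
import time
from typing import Callable

# Assumption: every status other than "Unknown" is terminal.
TERMINAL_STATUSES = {"Completed", "Failed", "Disabled"}


class RefreshTimeout(Exception):
    """Raised when the refresh does not reach a terminal status in time."""


def wait_for_refresh(
    get_status: Callable[[], str],
    timeout: float = 60.0,
    check_interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_status() until it returns a terminal status or the timeout expires."""
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        # Give up if the next check would happen after the deadline.
        if clock() + check_interval > deadline:
            raise RefreshTimeout(f"Refresh still '{status}' after {timeout} seconds")
        sleep(check_interval)
```

In this sketch, `get_status` stands in for whatever call the hook makes to read the refresh status. The `sleep` and `clock` arguments exist only so the loop can be tested without real waiting.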

Hooks

PowerBI Hook

A hook to interact with Power BI.

  • powerbi_conn_id: Airflow Connection ID that contains the connection information for the Power BI account used for authentication.
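
For context, a dataset refresh maps onto the Power BI REST API's `refreshes` endpoint. The sketch below only builds that URL; `refreshes_url` is a hypothetical helper, and the hook in this PR may construct its requests differently.

```python
from typing import Optional

# Public base URL of the Power BI REST API.
POWER_BI_BASE_URL = "https://api.powerbi.com/v1.0/myorg"


def refreshes_url(dataset_id: str, group_id: Optional[str] = None) -> str:
    """Return the REST endpoint for a dataset's refreshes.

    POST to this URL triggers a refresh; GET lists the refresh history.
    (Hypothetical helper, not part of the PR.)
    """
    if group_id:
        return f"{POWER_BI_BASE_URL}/groups/{group_id}/datasets/{dataset_id}/refreshes"
    # Without a workspace ID, the API addresses the dataset in "My workspace".
    return f"{POWER_BI_BASE_URL}/datasets/{dataset_id}/refreshes"
```

Requests to this endpoint need a bearer token for the account referenced by the connection; acquiring it is out of scope for this sketch.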

Features

  • XCom integration: The Power BI dataset refresh operator pushes the following fields to XCom for downstream tasks:

  1. powerbi_dataset_refresh_id: Request ID of the dataset refresh.
  2. powerbi_dataset_refresh_status: Refresh status, one of:
    • Unknown: Refresh state is unknown or a refresh is in progress.
    • Completed: Refresh successfully completed.
    • Failed: Refresh failed (details in powerbi_dataset_refresh_error).
    • Disabled: Refresh is disabled by a selective refresh.
  3. powerbi_dataset_refresh_end_time: The end date and time of the refresh (may be None if a refresh is in progress).
  4. powerbi_dataset_refresh_error: Failure error code in JSON format (None if no error).
  • External monitoring link: The operator provides a redirect link to the Power BI UI for monitoring refreshes.
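
A downstream task could read these fields roughly as follows. This is a sketch under two assumptions: the XCom keys are exactly the field names listed above, and the upstream task ID is `refresh_in_given_workspace`, as in the sample DAG. `summarize_refresh` is a hypothetical function name.

```python
from typing import Any

# Assumption: task ID of the upstream refresh task, as in the sample DAG.
REFRESH_TASK_ID = "refresh_in_given_workspace"


def summarize_refresh(ti: Any) -> str:
    """Read the refresh fields from XCom and return a one-line summary.

    `ti` is the Airflow task instance; only its xcom_pull method is used.
    Raises RuntimeError if the refresh is reported as failed.
    """

    def pull(key: str) -> Any:
        return ti.xcom_pull(task_ids=REFRESH_TASK_ID, key=key)

    refresh_id = pull("powerbi_dataset_refresh_id")
    status = pull("powerbi_dataset_refresh_status")
    end_time = pull("powerbi_dataset_refresh_end_time")
    error = pull("powerbi_dataset_refresh_error")

    if status == "Failed":
        raise RuntimeError(f"Refresh {refresh_id} failed: {error}")
    return f"Refresh {refresh_id}: status={status}, end_time={end_time}"
```

Passed as the `python_callable` of a PythonOperator, the function would receive `ti` from the task context because it declares a parameter with that name.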

Sample DAG to use the plugin.

Check out the sample DAG code below:

from datetime import datetime

from airflow import DAG
# Import path as used by the plugin layout described in this PR.
from operators.powerbi_refresh_dataset_operator import PowerBIDatasetRefreshOperator


with DAG(
    dag_id="refresh_dataset_powerbi",
    schedule_interval=None,  # no schedule; trigger the DAG manually
    start_date=datetime(2023, 8, 7),
    catchup=False,
    concurrency=20,
    tags=["powerbi", "dataset", "refresh"],
) as dag:

    # Trigger the refresh and return immediately, without waiting for it to finish.
    refresh_in_given_workspace = PowerBIDatasetRefreshOperator(
        task_id="refresh_in_given_workspace",
        dataset_id="<dataset_id>",  # replace with the dataset ID
        group_id="<workspace_id>",  # replace with the workspace ID
        force_refresh=False,
        wait_for_termination=False,
    )

    refresh_in_given_workspace

* PowerBIDatasetRefreshOperator: Refreshes the Dataset
* PowerBI Hook: A class to interact with Power BI
* Unit tests
@potiuk
Member

potiuk commented Apr 25, 2024

Just a kind reminder that a proposal to add a new provider should be announced with a justification: why you think the provider cannot be released and maintained outside of the community set of providers. There should be a thread on the devlist, and consensus should be reached by the community that we want it.

See https://github.com/apache/airflow/blob/main/PROVIDERS.rst#accepting-new-community-providers for details.

Here is an example of a discussion and proposal about a new provider (you can search for others):

@Joffreybvn
Contributor

A Power BI / Microsoft Fabric provider would be really nice! We (Infrabel) started to work on that, via @dabla's MsGraph Operators.

@ambika-garg I contacted you on Airflow's Slack. I'd like to discuss the further plans for this provider, and possibly how we can collaborate.

@dabla
Contributor

dabla commented Apr 29, 2024

This provider is a specialized operator for refreshing Power BI datasets. The MSGraphAsyncOperator (with its Trigger and Sensor) also lets you achieve the same result without a dedicated operator, but then you need to combine several of them. Nonetheless, this could be a handy operator and a nice addition, as it combines triggering the dataset refresh and polling its status in one operator. I agree with @Joffreybvn that this would be a nice opportunity to collaborate and make sure this operator reuses as much common code as possible with the MSGraphAsyncOperator (for example, the KiotaRequestAdapterHook could be shared). The polling could then be done asynchronously, so that we don't block the Airflow workers while waiting for a response from the Power BI REST API.
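
As a rough illustration of the async polling idea, here is a sketch in plain asyncio. It does not use Airflow's trigger classes or the MSGraphAsyncOperator: `poll_refresh` and `fake_get_status` are hypothetical names, and the terminal statuses are assumed from the status values listed in the PR description.

```python
import asyncio
from typing import Awaitable, Callable

# Assumption: every status other than "Unknown" is terminal.
TERMINAL_STATUSES = {"Completed", "Failed", "Disabled"}


async def poll_refresh(
    get_status: Callable[[], Awaitable[str]],
    check_interval: float = 5.0,
    timeout: float = 60.0,
) -> str:
    """Await a terminal refresh status without blocking the event loop."""

    async def _loop() -> str:
        while True:
            status = await get_status()
            if status in TERMINAL_STATUSES:
                return status
            # Yield control so other coroutines can run while we wait.
            await asyncio.sleep(check_interval)

    # Raises a timeout error if no terminal status arrives in time.
    return await asyncio.wait_for(_loop(), timeout=timeout)


async def demo() -> str:
    """Run the poller against a fake status source."""
    statuses = iter(["Unknown", "Unknown", "Completed"])

    async def fake_get_status() -> str:
        return next(statuses)

    return await poll_refresh(fake_get_status, check_interval=0.01, timeout=1.0)
```

In a deferrable operator, a loop like this would live in the trigger, so the worker slot is released while the refresh runs.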
