I have a Databricks pipeline that pulls data from AWS, which takes ~90 minutes. After this, I need to refresh a series of Power BI dataflows (~45 mins) and then datasets (~45 mins).
I want to trigger the Power BI refresh automatically from Databricks once the pipeline finishes. However, if I run this as a final task in the Databricks job, the cluster has to stay alive for up to 90 extra minutes while Power BI refreshes — which wastes compute and costs.
I want to avoid scheduled refreshes in Power BI since they don’t align well with my pipeline timing.
Is there a recommended architecture to:
Trigger Power BI refreshes programmatically from Databricks
Let the refresh continue independently after triggering
Avoid cluster runtime during Power BI refresh wait time?