11-05-2024 06:33 AM
Is it possible to determine, from within a notebook running in a DLT pipeline, whether the current pipeline run is a Full Refresh or an incremental update? I looked into the pipeline configuration variables but could not find anything.
It would be beneficial to have this information available in the code so that it can do something different in case of a full refresh.
My workaround so far is to have two pipeline jobs and set a config variable when the run is a full refresh. But when executing the pipeline manually this gets dangerous, since I have to remember to set the value to the correct refresh type.
11-05-2024 08:10 AM
You can use the following code:
pipeline_run_config = get_current_pipeline_run_config()
# Create a StartUpdate object from the pipeline run configuration
start_update = StartUpdate.from_dict(pipeline_run_config)
# Check if the pipeline run is a full refresh
if start_update.full_refresh:
    print("The pipeline is running in Full Refresh mode.")
else:
    print("The pipeline is running in Incremental mode.")
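The check above comes down to reading a full_refresh flag from the update's metadata. Below is a minimal sketch of that classification on a plain dict, so it can be tried without a workspace. It assumes the dict uses the field names full_refresh and full_refresh_selection, as on an update object from the Pipelines API. The helper name refresh_mode and the returned labels are hypothetical, and the sketch does not show how to obtain the dict inside a running pipeline.

```python
from typing import Any, Mapping


def refresh_mode(update: Mapping[str, Any]) -> str:
    """Classify a pipeline update from its metadata dict.

    Assumption: `update` carries the `full_refresh` (bool) and
    `full_refresh_selection` (list of table names) fields. The helper
    name and the returned labels are illustrative only.
    """
    if update.get("full_refresh"):
        # Every table in the pipeline is recomputed from scratch.
        return "full_refresh"
    if update.get("full_refresh_selection"):
        # Only the listed tables are fully refreshed.
        return "partial_full_refresh"
    # Flag absent or false: treat the run as an incremental update.
    return "incremental"


if __name__ == "__main__":
    for sample in (
        {"full_refresh": True},
        {"full_refresh": False, "full_refresh_selection": ["orders"]},
        {},
    ):
        print(sample, "->", refresh_mode(sample))
```

Treating a missing flag as incremental is a design choice of this sketch. If a wrong guess is costly, raising an error for a missing flag may be safer.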
11-05-2024 09:05 AM
Thanks for the quick answer. Where did you get get_current_pipeline_run_config() from? I used spark.conf.getAll, which apparently does not have the refresh mode info. And does StartUpdate come from databricks.sdk.service.pipelines?
11-05-2024 11:32 AM
I am looking into this further with our teams. Can you please give us more context on your use case for this information?
11-13-2024 12:02 AM
We found a solution where we no longer need to determine the refresh mode. But I still do not know how to get the refresh mode of the current pipeline run from within a notebook that is running in the pipeline. An answer might still be beneficial to others.