12-17-2024 07:12 AM
Hello,
In our testing environment we would like to be able to update only the DLT tables we are testing for our pipeline. This would help speed up testing. We currently generate the pipeline code dynamically based on how many tables there are to process.
What I have discovered, though, is that when I run the pipeline without referencing a table, that DLT table gets removed/deleted. Is there any way to declare a DLT table so that it cannot be removed?
Just so I'm clear, here is an example:
Pretend there are 10 DLT tables in total for our pipeline and it takes 10 minutes to run.
If I only want to update 2 tables in testing, I would like to run the pipeline for those 2 tables only and just ignore the others since they are not being updated. That run would then take about 2 minutes.
I know the pipeline GUI has a button called "Select tables for refresh", and it does exactly what I want. The only difference is that I want this functionality in the Python code instead, since that is where I dynamically declare the DLT tables.
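For completeness, what I'm after looks something like the following via the Pipelines REST API, which I believe backs that GUI button. The pipeline ID is a placeholder and the refresh_selection field name is my assumption from the API docs, so please verify before relying on it:

```python
import json

# Placeholder pipeline ID; the real one comes from the pipeline's settings page.
PIPELINE_ID = "my-pipeline-id"

def build_update_request(tables):
    """Build the request body for POST /api/2.0/pipelines/{id}/updates,
    refreshing only the given tables (assumed field: refresh_selection)."""
    return {"refresh_selection": list(tables)}

body = build_update_request(["table_1", "table_2"])
print(json.dumps(body))

# The actual call would then be something like (requests and credentials
# omitted here; HOST and TOKEN would be your workspace URL and a PAT):
# requests.post(f"{HOST}/api/2.0/pipelines/{PIPELINE_ID}/updates",
#               headers={"Authorization": f"Bearer {TOKEN}"},
#               json=body)
```

But I'd still prefer to drive this from the pipeline's own Python code if possible.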
12-17-2024 07:40 AM
Hi @eballinger.
To address your requirement of updating only specific Delta Live Tables (DLT) in your testing environment without removing the others, you can leverage the @dlt.table decorator and the temporary parameter in your Python code. This approach allows you to create temporary tables that persist only for the lifetime of the pipeline run, thus preventing their removal when not referenced in subsequent runs.
Here’s an example of how you can define a temporary table:
import dlt

@dlt.table(temporary=True)
def my_temp_table():
    return spark.read.table("source_table")
In your dynamic pipeline generation logic, you can conditionally include or exclude tables based on your testing requirements. This way, you can run the pipeline for only the tables you need to update, and the temporary tables will not be removed if they are not included in the run.
Additionally, you can use spark.read.table("LIVE.table_name") to reference tables within the same pipeline, ensuring that they are correctly resolved during pipeline execution.
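As a rough sketch of the conditional generation I mentioned (the table names and the TABLES_TO_REFRESH config are made-up placeholders, not part of any API):

```python
# Decide which DLT tables to declare in this run.
# ALL_TABLES and TABLES_TO_REFRESH are hypothetical placeholders; in practice
# the selection could come from a pipeline configuration parameter.
ALL_TABLES = ["bronze_a", "bronze_b", "silver_a"]
TABLES_TO_REFRESH = ["bronze_a", "silver_a"]

def tables_to_declare(all_tables, selection):
    """An empty selection means a normal run that declares every table."""
    if not selection:
        return list(all_tables)
    return [t for t in all_tables if t in selection]

# In the pipeline notebook itself you would then register only the selected
# tables, e.g.:
#
# import dlt
# for name in tables_to_declare(ALL_TABLES, TABLES_TO_REFRESH):
#     def register(src=name):  # default arg captures the loop variable
#         @dlt.table(name=src, temporary=True)
#         def _tbl():
#             return spark.read.table(f"source_db.{src}")
#         return _tbl
#     register()
```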
12-17-2024 12:35 PM
Hi again Alberto,
I just tested your solution, and I think I missed what you were saying about the table only persisting while the pipeline is running. That might work for some other scenarios, but in my example above I want all 10 DLT tables to exist after the pipeline has run. So how can I update just 2 tables and leave the other 8 untouched? Since there is a way to accomplish this with the GUI, there should also be some way to accomplish it programmatically?
Thanks again for your help.
Eddie