05-09-2022 02:14 AM
Suppose I have a Delta Live Tables pipeline with 2 tables, so the data flow is: json source -> Table 1 -> Table 2.
Now, if I find a bug in the Table 1 -> Table 2 transformation, how can I rerun only that transformation and leave Table 1 intact?
If I use Full Refresh, it refreshes Table 1 and reruns the JSON ingestion as well.
05-11-2022 03:14 PM
@Long Tran, the best way to achieve this is a workaround. If you can sacrifice a row of data in Table 1, or add a row of nulls to Table 1 without causing problems further down your pipeline, I suggest trying the "Retain manual deletes or updates" approach: "You can manually delete or update the record from raw_user_table and do a refresh operation to recompute the downstream tables."
However, note that if you are ingesting from a source that has no new data, the full refresh probably won't re-ingest the same data and cause duplication. Try it out on a subset of your data first.
06-20-2022 11:43 AM
Thank you so much!
06-20-2022 11:42 AM
Sorry for the delayed response. User16460565755155528764's answer is very helpful.
06-20-2022 10:27 AM
Hey @Long Tran
Does @Sara Dooley's response answer your question? If so, would you be happy to mark it as best so that other members can find the solution more quickly? Otherwise, please let us know if you need more help.
We'd love to hear from you.
Cheers!
06-20-2022 11:42 AM
Hi, sorry, I am new here. How do I mark the answer as resolved? Thanks a lot.
06-21-2022 08:45 AM
Hi @Long Tran
Thank you so much for getting back to us. It's really great of you to mark the answer as best.
We really appreciate your time.
Wish you a great Databricks journey ahead!
06-29-2022 09:48 PM
An update for anyone finding this thread now:
This is possible using the pipelines.reset.allowed table property, as documented here: https://docs.databricks.com/data-engineering/delta-live-tables/delta-live-tables-cookbook.html#retai...
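For reference, here is a minimal DLT Python sketch of the pipeline from the question with that property set on Table 1. Assumptions: the JSON files are ingested with Auto Loader (cloudFiles); the table names table_1 and table_2, the landing path, and the transformation are hypothetical; and the code only runs as the source of a Delta Live Tables pipeline, where the dlt module and the spark session are provided.

```python
# Pipeline source: runs only inside a Delta Live Tables pipeline,
# where the `dlt` module and the global `spark` session are available.
import dlt
from pyspark.sql import functions as F

# Hypothetical landing location for the JSON files; replace with your own.
JSON_SOURCE_PATH = "/mnt/landing/events/"


@dlt.table(
    name="table_1",
    comment="Raw JSON ingested incrementally with Auto Loader.",
    # Opt this table out of full refresh so its ingested data is kept.
    table_properties={"pipelines.reset.allowed": "false"},
)
def table_1():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load(JSON_SOURCE_PATH)
    )


@dlt.table(
    name="table_2",
    comment="Downstream transformation that may need to be fixed and rerun.",
)
def table_2():
    # Hypothetical transformation: batch-read table_1 and add a column.
    return dlt.read("table_1").withColumn("processed_at", F.current_timestamp())
```

Because table_2 reads table_1 with dlt.read (a batch read), its contents should reflect the current table_1 and the current transformation logic after each update. Setting pipelines.reset.allowed to false is meant to stop a full refresh from clearing table_1, while new data can still flow into it incrementally.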
02-08-2024 02:14 PM
I want to tag onto this thread because I have the same need to refresh only a single table within a larger DLT pipeline. Unfortunately, the links in the accepted answer and in Felipe's follow-up no longer seem to contain the correct information. Is there a proper way to do this now, in 2024?
02-09-2024 11:05 AM
Answering my own question: nowadays (February 2024) this can all be done via the UI.
When viewing your DLT pipeline, click the "Select tables for refresh" button in the header. You can then select individual tables, and in the bottom right corner choose either "Full refresh selection" or "Refresh selection." Choose "Full refresh selection" to rebuild the selected tables from scratch.
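The same selective refresh can presumably also be triggered without the UI through the Databricks Pipelines REST API, whose start-update endpoint (POST /api/2.0/pipelines/{pipeline_id}/updates) accepts refresh_selection and full_refresh_selection lists of table names. Below is a minimal standard-library sketch under that assumption. The host, token, and pipeline ID are placeholders, and the helper names build_update_request and start_update are hypothetical. Check the current API reference before relying on it.

```python
import json
import urllib.request


def build_update_request(host, token, pipeline_id,
                         full_refresh_tables=(), refresh_tables=()):
    """Build a POST request that starts a pipeline update for selected tables only."""
    body = {}
    if full_refresh_tables:
        # Tables to clear and rebuild from scratch.
        body["full_refresh_selection"] = list(full_refresh_tables)
    if refresh_tables:
        # Tables to update normally (incrementally where possible).
        body["refresh_selection"] = list(refresh_tables)
    if not body:
        raise ValueError("select at least one table to refresh")

    url = f"{host.rstrip('/')}/api/2.0/pipelines/{pipeline_id}/updates"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def start_update(request):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For the original question, you would pass full_refresh_tables=["table_2"] (using your actual name for Table 2), which should rebuild only Table 2 and leave Table 1 and its JSON ingestion untouched.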