3 weeks ago
Hi,
I'm building the silver and gold layers.
In bronze I ingest using Auto Loader.
The data is updated once a month.
Should I save the df in the silver notebooks using Delta Live Tables or a plain Delta table?
In the past I used a simple
df.write.format("delta").save("s3://...")
and then created a table in UC pointing at the path in S3.
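Concretely, something like this (bucket, catalog, and table names are just placeholders):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Write the silver DataFrame to S3 as Delta (path is illustrative)
df.write.format("delta").mode("overwrite").save("s3://my-bucket/silver/orders")

# Register an external table in Unity Catalog pointing at that path
spark.sql("""
    CREATE TABLE IF NOT EXISTS main.silver.orders
    USING DELTA
    LOCATION 's3://my-bucket/silver/orders'
""")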
What is the best approach? When should I use DLT vs. a simple Delta table?
Additionally, for my use case, what is the best practice?
#dlt
3 weeks ago
I would say that if the data is not complex and you are not handling any DQ checks in the pipeline, then go for a regular Databricks workflow and save it as a Delta table, since you are refreshing the data only once a month and it is not a streaming workload.
3 weeks ago
Thanks for the reply.
I also plan to orchestrate it all from Airflow, and I'd prefer not to use Databricks Workflows. I know DLT only runs as a DLT pipeline; I'm not sure if or how that should be done from Airflow. Roughly, I have something like the sketch below in mind.
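For example (cluster id, notebook path, and schedule are placeholders, not my real setup):

from airflow import DAG
from airflow.providers.databricks.operators.databricks import DatabricksSubmitRunOperator
import pendulum

with DAG(
    dag_id="silver_gold_monthly",
    schedule="@monthly",  # data only arrives once a month
    start_date=pendulum.datetime(2024, 1, 1, tz="UTC"),
    catchup=False,
) as dag:
    build_silver = DatabricksSubmitRunOperator(
        task_id="build_silver",
        databricks_conn_id="databricks_default",
        existing_cluster_id="1234-567890-abcde123",  # placeholder cluster id
        notebook_task={"notebook_path": "/Repos/etl/silver_notebook"},
    )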
So you suggest saving to S3 and then creating a table in UC pointing at the S3 path?
Thanks
3 weeks ago
I think if you are registering the table in UC and using Delta Live Tables, you cannot create an external table in S3; it has to be a managed table. DLT, by design, works with managed tables in Unity Catalog.
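For reference, a DLT table definition looks roughly like this; note there is no path/location for you to set when the pipeline targets UC (table and source names here are made up):

import dlt
from pyspark.sql import functions as F

@dlt.table(
    name="silver_orders",
    comment="Cleaned orders; DLT creates this as a managed UC table",
)
def silver_orders():
    # Reads the bronze table defined in the same pipeline (name is illustrative)
    return dlt.read("bronze_orders").where(F.col("order_id").isNotNull())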
3 weeks ago
I meant that for my use case maybe it's better to simply save to UC instead of using DLT. What do you think?
3 weeks ago
Yes. If your other tables are external, then there is no point going for DLT and a managed table. You can just write to S3 and register the table in UC, essentially the pattern from your first message.