02-16-2023 01:54 AM
Hello Community,
I am currently working on populating gold layer tables. The source for these gold layer tables is a set of silver layer tables. A Spark SQL query runs against the silver layer tables, and the query contains joins between multiple tables.
ex:
SELECT columns
FROM table1
INNER JOIN table2 ON join_condition
INNER JOIN table3 ON join_condition
WHERE clause
Now my question is: how can I load the data incrementally from this query? I should be able to schedule the pipeline to run every 30 minutes.
Thanks for the help.
Thanks
Venkat
02-16-2023 02:46 AM
Hi @venkat,
You can use a MERGE (upsert) operation in Databricks for the incremental load.
Yes, you can schedule the job to run every 30 minutes using Databricks Workflows.
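A minimal sketch of the MERGE approach, assuming the gold table is a Delta table and the silver rows carry an update timestamp you can filter on (names like gold.target_table, silver.table1, key_column, and updated_at are placeholders, adjust them to your schema):

MERGE INTO gold.target_table AS t
USING (
    -- only pick up rows changed since the last run; the watermark value is a placeholder
    SELECT columns
    FROM silver.table1 a
    INNER JOIN silver.table2 b ON join_condition
    INNER JOIN silver.table3 c ON join_condition
    WHERE a.updated_at > '<last_run_timestamp>'
) AS s
ON t.key_column = s.key_column
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *

After the MERGE you would record the new watermark (for example, the maximum updated_at processed) so the next run only reads newer rows, and the job itself can then be scheduled to run every 30 minutes with a Databricks Workflows job schedule.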
02-16-2023 03:23 AM
Hi @Ajay Pandey,
Thanks for your reply.
I will try it and let you know.
Thanks
Venkat
02-16-2023 03:41 AM
Sure
02-16-2023 09:03 PM
Hi @bodempudi venkat,
Hope all is well! Just wanted to check in to see whether you were able to resolve your issue. If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.
We'd love to hear from you.
Thanks!