Databricks Community

venkat-bodempud · ‎02-16-2023

Hello Community,

I am currently working on populating gold layer tables. The source for these gold layer tables is the silver layer tables. A Spark SQL query will run on the silver layer tables, and it contains joins between multiple tables.

Example:

select columns
from table1
inner join table2 on join_condition
inner join table3 on join_condition
where clause

Now my question is: how can I load the data incrementally from this query? I should also be able to schedule the pipeline to run every 30 minutes.

Thanks for the help.

Thanks

Venkat

Ajay-Pandey · ‎02-16-2023

Hi @venkat,

You can use a MERGE (upsert) operation in Databricks for the incremental load.
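
As a rough sketch of what such a merge could look like for the query in the question: the helper below only builds the MERGE statement as a string, so it can be inspected without a cluster. The table names (gold.target_table, silver.table1 to silver.table3), the join key id, the selected columns, and the updated_at change-tracking column are all assumptions made for illustration; none of them come from the original post. It also assumes the gold table is a Delta table whose columns match the source query, since UPDATE SET * and INSERT * rely on matching column names.

```python
from datetime import datetime


def build_incremental_merge(last_run: datetime) -> str:
    """Build a Delta Lake MERGE statement that upserts rows changed since last_run.

    All table and column names are hypothetical placeholders.
    """
    watermark = last_run.strftime("%Y-%m-%d %H:%M:%S")
    return f"""
MERGE INTO gold.target_table AS tgt
USING (
    SELECT t1.id, t1.col_a, t2.col_b, t3.col_c
    FROM silver.table1 AS t1
    INNER JOIN silver.table2 AS t2 ON t1.id = t2.id
    INNER JOIN silver.table3 AS t3 ON t1.id = t3.id
    -- pick up a joined row if any of its source rows changed since the last run
    WHERE t1.updated_at > TIMESTAMP '{watermark}'
       OR t2.updated_at > TIMESTAMP '{watermark}'
       OR t3.updated_at > TIMESTAMP '{watermark}'
) AS src
ON tgt.id = src.id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *
""".strip()
```

In a notebook this would typically be run as spark.sql(build_incremental_merge(last_run)), with last_run read from wherever the previous run's timestamp is stored (a small control table is one option). Note that a merge like this does not remove gold rows whose silver rows were deleted; that case would need separate handling.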

Yes, you can schedule the job to run every 30 minutes using Databricks Workflows.
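
For the schedule, a minimal sketch of job settings in the shape the Jobs API accepts might look like the following. The job name, task key, notebook path, and cluster ID are placeholders, not values from this thread. Databricks job schedules use Quartz cron syntax, and "0 0/30 * * * ?" fires at minute 0 and minute 30 of every hour.

```python
import json


def build_job_settings(notebook_path: str, cluster_id: str) -> dict:
    """Return job settings that run one notebook task every 30 minutes.

    The job name and task key are hypothetical; adjust them to your workspace.
    """
    return {
        "name": "gold-incremental-load",
        "tasks": [
            {
                "task_key": "merge_gold",
                "notebook_task": {"notebook_path": notebook_path},
                "existing_cluster_id": cluster_id,
            }
        ],
        "schedule": {
            # Quartz fields: seconds minutes hours day-of-month month day-of-week
            "quartz_cron_expression": "0 0/30 * * * ?",
            "timezone_id": "UTC",
            "pause_status": "UNPAUSED",
        },
        # avoid overlapping runs if one merge takes longer than 30 minutes
        "max_concurrent_runs": 1,
    }


if __name__ == "__main__":
    settings = build_job_settings("/Repos/example/gold_load", "0000-000000-example")
    print(json.dumps(settings, indent=2))
```

The same schedule can be set from the Workflows UI by adding a scheduled trigger to the job, so the JSON is only needed if the job is created through the API or other automation.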

Ajay Kumar Pandey

venkat-bodempud · ‎02-16-2023

Hi @Ajay Pandey,

Thanks for your reply.

I will try and let you know.

Thanks

Venkat

Ajay-Pandey · ‎02-16-2023

Sure

Ajay Kumar Pandey

Anonymous · ‎02-16-2023

Hi @bodempudi venkat

Hope all is well! Just checking in: were you able to resolve your issue? If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.

We'd love to hear from you.

Thanks!