Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Dario Schiraldi : How do I build a data pipeline in Databricks?

darioschiraldi9
New Contributor II

Hey everyone,

I am Dario Schiraldi, working on building a data pipeline in Databricks and would love to get some feedback and suggestions from the community. I want to build a scalable and efficient pipeline that can handle large datasets and possibly integrate with cloud storage like AWS S3 or Azure Blob.

Looking forward to hearing your thoughts and suggestions!

Regards

Dario Schiraldi CEO of Travel Works 

Accepted Solution

ilir_nuredini
Honored Contributor

Hello @darioschiraldi9 ,

Happy to hear that you are exploring Databricks for your work. Here is a detailed example of how to build a scalable data pipeline using DLT, combining the flexibility of Spark Streaming with a configuration-driven approach:

https://community.databricks.com/t5/technical-blog/lakeflow-config-driven-framework-a-guide-to-build...
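
To make the configuration-driven idea a bit more concrete, here is a minimal sketch in Python. It is not the framework from the linked blog, only an illustration under stated assumptions: the `SourceConfig` class, the helper functions, the table names, and the S3/ADLS paths are all hypothetical. The runnable part only builds and validates Auto Loader options from a config list; the DLT part is left as comments because `dlt` and `spark` are only available inside a Databricks pipeline.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class SourceConfig:
    """One ingested dataset. Every name and path used here is hypothetical."""
    name: str                      # target table name
    path: str                      # cloud storage location of the raw files
    file_format: str = "json"      # format of the raw files
    options: Dict[str, str] = field(default_factory=dict)  # extra reader options


def autoloader_options(cfg: SourceConfig) -> Dict[str, str]:
    """Build the reader options for an Auto Loader ("cloudFiles") stream."""
    opts = {"cloudFiles.format": cfg.file_format}
    opts.update(cfg.options)
    return opts


def validate(configs: List[SourceConfig]) -> None:
    """Fail early on duplicate table names or unexpected storage schemes."""
    seen = set()
    for cfg in configs:
        if cfg.name in seen:
            raise ValueError(f"duplicate table name: {cfg.name}")
        if not cfg.path.startswith(("s3://", "abfss://")):
            raise ValueError(f"unsupported storage path: {cfg.path}")
        seen.add(cfg.name)


# Hypothetical sources: one on AWS S3, one on Azure Data Lake Storage.
SOURCES = [
    SourceConfig(
        name="bookings_raw",
        path="s3://example-bucket/bookings/",
        options={"cloudFiles.inferColumnTypes": "true"},
    ),
    SourceConfig(
        name="customers_raw",
        path="abfss://raw@exampleaccount.dfs.core.windows.net/customers/",
        file_format="csv",
    ),
]

validate(SOURCES)

# Inside a Databricks DLT pipeline (not runnable locally), each config entry
# could then drive one streaming table, roughly like this:
#
#   import dlt
#
#   def register(cfg):
#       @dlt.table(name=cfg.name)
#       def _table():
#           return (
#               spark.readStream.format("cloudFiles")
#               .options(**autoloader_options(cfg))
#               .load(cfg.path)
#           )
#
#   for cfg in SOURCES:
#       register(cfg)
```

The point of this shape is that adding a new dataset means adding one entry to `SOURCES` rather than writing a new pipeline function, which is the general appeal of a config-driven design.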

And in this talk, even though it is older, you may find some very useful information at the architectural level:
https://www.youtube.com/watch?v=9sBdD1G34Mg

Hope that helps. If you share more technical details about your project, we can put together a more specific suggestion. Thank you!

Best,
Ilir


Superglue Pipelines, a self-serve platform for data analysts at Intuit, uses a homegrown ETL framework called QuicKETL, a configuration-driven framework to define and execute Spark and Presto ETL workflows. Come join us to hear our journey to learn how we scaled our platform leveraging Databricks ...
