
Seeking Advice on Data Lakehouse Architecture with Databricks

joshbuttler
New Contributor

I'm currently designing a data lakehouse architecture using Databricks and have a few questions. What are the best practices for efficiently ingesting both batch and streaming data into Delta Lake? Any recommended tools or approaches?

1 REPLY

szymon_dybczak
Contributor

Hi @joshbuttler,

I think the best way is to use Auto Loader, which incrementally processes new files as they land in cloud storage and guarantees that each file is processed exactly once.
It supports ingestion in batch mode (Trigger.AvailableNow()), and you can also load data in a streaming manner (under the hood it uses Spark Structured Streaming). It has native support for a variety of file formats, such as JSON, Parquet, CSV, and XML, to name a few. For streaming sources like Kafka, Kinesis, or Event Hubs, Databricks also integrates through the Structured Streaming connectors.
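To make the batch versus streaming choice concrete, here is a minimal PySpark sketch of the Auto Loader pattern described above. It is only an illustration under some assumptions: the function names, paths, and table name are hypothetical, it uses the checkpoint path as the schema location for simplicity, and it expects to run on a Databricks cluster where the `cloudFiles` source is available and `spark` is the active session.

```python
from typing import Dict


def auto_loader_options(file_format: str, schema_location: str) -> Dict[str, str]:
    """Build the options passed to the `cloudFiles` (Auto Loader) source.

    Hypothetical helper: only the two option keys come from Auto Loader itself.
    """
    return {
        "cloudFiles.format": file_format,              # e.g. "json", "csv", "parquet"
        "cloudFiles.schemaLocation": schema_location,  # where the inferred schema is tracked
    }


def ingest_to_delta(spark, source_path, target_table, checkpoint_path,
                    file_format="json", batch=True):
    """Incrementally load new files from `source_path` into a Delta table.

    `spark` is assumed to be a SparkSession on Databricks.
    All paths and the table name are placeholders supplied by the caller.
    """
    df = (
        spark.readStream.format("cloudFiles")
        .options(**auto_loader_options(file_format, checkpoint_path))
        .load(source_path)
    )
    # The checkpoint records which files were already processed.
    writer = df.writeStream.option("checkpointLocation", checkpoint_path)
    if batch:
        # Process everything available now, then stop (batch-style run).
        writer = writer.trigger(availableNow=True)
    # Without an explicit trigger the query keeps running as a stream.
    return writer.toTable(target_table)
```

A scheduled job could call `ingest_to_delta(spark, "/landing/events", "bronze.events", "/checkpoints/events")` for batch-style runs, and the same call with `batch=False` keeps the query running continuously. Both modes share one checkpoint format, so it should be possible to switch between them without reprocessing files.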

What is Auto Loader? - Azure Databricks | Microsoft Learn

 
