Databricks Community

joshbuttler · ‎09-18-2024

I'm currently designing a data lakehouse architecture using Databricks and have a few questions. What are the best practices for efficiently ingesting both batch and streaming data into Delta Lake? Any recommended tools or approaches?

szymon_dybczak · ‎09-18-2024

Hi @joshbuttler,

I think the best way is to use Auto Loader, which incrementally processes new files as they arrive in cloud storage and guarantees that each file is processed exactly once.
It supports batch-style ingestion (Trigger.AvailableNow()) as well as continuous streaming (under the hood it uses Spark Structured Streaming). It natively supports a variety of file formats such as JSON, Parquet, CSV and XML, to name a few. For message-based sources like Kafka, Kinesis or Event Hubs, you can use the corresponding Structured Streaming connectors and write to Delta Lake in the same way.
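
To make that concrete, here is a minimal sketch of how one Auto Loader pipeline could cover both modes. It assumes a Databricks notebook or job where `spark` is already defined; the helper names (`autoloader_options`, `ingest`), the paths and the table name are hypothetical and only for illustration, and reusing the checkpoint path as the schema location is just one possible choice.

```python
def autoloader_options(file_format, schema_location):
    """Build reader options for Auto Loader (the `cloudFiles` source)."""
    return {
        # Format of the incoming files, e.g. "json", "csv", "parquet".
        "cloudFiles.format": file_format,
        # Where Auto Loader stores the inferred schema between runs.
        "cloudFiles.schemaLocation": schema_location,
    }


def ingest(spark, source_path, target_table, checkpoint_path,
           file_format="json", batch=True):
    """Start an Auto Loader stream from source_path into a Delta table.

    batch=True  -> process all files available now, then stop.
    batch=False -> keep running and pick up new files as they arrive.
    """
    stream = (
        spark.readStream.format("cloudFiles")
        .options(**autoloader_options(file_format, checkpoint_path))
        .load(source_path)
    )
    # The checkpoint records which files were already ingested.
    writer = stream.writeStream.option("checkpointLocation", checkpoint_path)
    if batch:
        writer = writer.trigger(availableNow=True)
    return writer.toTable(target_table)


# Example call on Databricks (hypothetical paths and table name):
# ingest(spark, "/Volumes/demo/raw/events", "demo.bronze.events",
#        "/Volumes/demo/checkpoints/events", batch=True)
```

Because both modes share the same checkpoint, you can start with scheduled batch runs and later switch to continuous streaming by passing batch=False, without reprocessing files that were already loaded.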

What is Auto Loader? - Azure Databricks | Microsoft Learn

Seeking Advice on Data Lakehouse Architecture with Databricks