This post will help you simplify your data ingestion using Auto Loader, Delta optimized writes, Databricks Jobs, and Delta Live Tables.
Pre-Req:
- You are working with JSON data and Delta write commands
Step 1: Simplify ingestion with Auto Loader
Delta Lake helps unlock the full capabilities of working with JSON data in Databricks. Auto Loader makes it easy to ingest JSON data and manage semi-structured data in the Databricks Lakehouse.
Get hands on and import this notebook for a walkthrough on continuous and scheduled ingest of JSON data with Auto Loader.
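If you want a feel for the code before opening the notebook, here is a minimal sketch of a JSON ingest with Auto Loader. The paths and table name below are placeholders, not taken from the linked notebook:

```python
# Minimal Auto Loader sketch: stream JSON files from a cloud storage path into a
# Delta table. All paths and the table name are placeholders.
raw_events = (
    spark.readStream
        .format("cloudFiles")                                         # Auto Loader source
        .option("cloudFiles.format", "json")                          # ingest JSON files
        .option("cloudFiles.schemaLocation", "/mnt/schemas/events")   # track and evolve the schema
        .load("/mnt/raw/events")
)

(
    raw_events.writeStream
        .option("checkpointLocation", "/mnt/checkpoints/events")
        .trigger(availableNow=True)   # process available files then stop; omit for continuous ingest
        .toTable("bronze_events")
)
```

Dropping the `trigger(availableNow=True)` line keeps the stream running continuously, which is the other mode the notebook walks through.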
If you want to learn more, check out this overview blog and short video, and come back to this post to follow Steps 2-3.
Step 2: Reduce latency by optimizing your writes to Delta tables
Now that you're using Delta tables, reduce read latency by enabling Auto Optimize, which automatically compacts small files during individual writes.
Set the table properties
delta.autoOptimize.optimizeWrite = true and delta.autoOptimize.autoCompact = true
in the CREATE TABLE command.
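For example, both properties can be set when the table is created, or added to an existing table with ALTER TABLE. This is a sketch; the table name and columns are placeholders:

```python
# Sketch: create a Delta table with Auto Optimize enabled.
# The table name and columns are placeholders.
spark.sql("""
    CREATE TABLE IF NOT EXISTS events (
        id BIGINT,
        payload STRING
    )
    TBLPROPERTIES (
        'delta.autoOptimize.optimizeWrite' = 'true',
        'delta.autoOptimize.autoCompact'   = 'true'
    )
""")

# The same properties can be added to an existing table.
spark.sql("""
    ALTER TABLE events SET TBLPROPERTIES (
        'delta.autoOptimize.optimizeWrite' = 'true',
        'delta.autoOptimize.autoCompact'   = 'true'
    )
""")
```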
Tip: Tables with many active queries and latency requirements on the order of minutes benefit most from Auto Optimize.
Find examples here for enabling Auto Optimize on all tables.
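If you want every new table to pick these properties up automatically, Databricks also exposes session-level defaults. A sketch; double-check the config names against the linked examples:

```python
# Sketch: default Auto Optimize on for all new Delta tables created in this session.
spark.conf.set("spark.databricks.delta.properties.defaults.autoOptimize.optimizeWrite", "true")
spark.conf.set("spark.databricks.delta.properties.defaults.autoOptimize.autoCompact", "true")
```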
Step 3: Set up automated ETL processing
Finally, use Databricks workflows and jobs to author, manage, and orchestrate ingestion of your semi-structured and streaming data.
Here's a quick walkthrough on How to Schedule a Job and Automate a Workload.
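If you prefer defining jobs programmatically rather than through the UI, a job definition along these lines can be submitted to the Jobs API. This is only a sketch assuming the Jobs API 2.1 payload shape; the job name, notebook path, cluster ID, and schedule are all placeholders:

```python
# Sketch of a Jobs API 2.1-style job definition that runs the ingestion notebook hourly.
# Every value here is a placeholder; submit it with the Databricks CLI, SDK, or REST API.
job_config = {
    "name": "ingest-json-with-autoloader",
    "tasks": [
        {
            "task_key": "ingest",
            "notebook_task": {"notebook_path": "/Users/you@example.com/autoloader_ingest"},
            "existing_cluster_id": "<your-cluster-id>",
        }
    ],
    "schedule": {
        "quartz_cron_expression": "0 0 * * * ?",  # top of every hour
        "timezone_id": "UTC",
    },
}
```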
Did you know Databricks also provides powerful ETL capabilities with Delta Live Tables (DLT)? With DLT, treat your data as code and apply software engineering best practices like testing, monitoring and documentation to deploy reliable pipelines at scale.
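As a taste of what "data as code" looks like, here is a minimal DLT sketch that declares a bronze table fed by Auto Loader and a silver table guarded by a data quality expectation. The paths, table names, and columns are placeholders:

```python
import dlt
from pyspark.sql.functions import col

# Sketch of a two-table DLT pipeline. Paths, table names, and columns are placeholders.

@dlt.table(comment="Raw JSON events ingested with Auto Loader")
def bronze_events():
    return (
        spark.readStream
            .format("cloudFiles")
            .option("cloudFiles.format", "json")
            .load("/mnt/raw/events")
    )

@dlt.table(comment="Events with a non-null id")
@dlt.expect_or_drop("valid_id", "id IS NOT NULL")   # drop rows that fail the expectation
def silver_events():
    return dlt.read_stream("bronze_events").select(col("id"), col("payload"))
```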
To learn more about DLT...
- Follow the DLT Getting Started Guide
- Watch a demo
- Download example notebooks
- Join the DLT discussions in the Databricks Community
Congrats, you have now optimized your data ingestion to get the most out of your data!
Drop your questions, feedback, and tips below!