Get Started Discussions
Start your journey with Databricks by joining discussions on getting started guides, tutorials, and introductory topics. Connect with beginners and experts alike to kickstart your Databricks experience.

CDC / Event Driven Data Ingestion

Mey
New Contributor II

Hello Guys,

I am planning to implement event-driven data ingestion across the Bronze -> Silver -> Gold layers in my project. Currently we use a batch processing approach for our data ingestion pipelines, and we have decided to move to an event-driven approach. Can someone guide me on the architectural design and the steps / key factors I have to capture before designing a scalable CDC / event-driven architecture? It would also be great if you could share some examples or documents as a reference for my verification.

Thanks for all the help so far!

 


3 REPLIES

bianca_unifeye
Databricks MVP (Accepted Solution)

Moving from batch → event-driven / CDC on Databricks usually means adopting streaming + incremental processing across the Bronze → Silver → Gold (Medallion) layers.

Key design factors to capture upfront

  • Event source: Kafka / Event Hubs / Kinesis / Debezium / app events

  • CDC strategy: source-side CDC vs Delta Change Data Feed (CDF)

  • Exactly-once & ordering: idempotent writes, keys, watermarking

  • Schema evolution: schema enforcement vs evolution at Bronze

  • Data quality: quarantine bad records early (expectations)

  • Scalability & recovery: checkpoints, replay, backfills

  • Latency vs cost: micro-batch vs continuous triggers

  • Governance: Unity Catalog, lineage, access controls
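Several of the factors above (idempotent writes, per-key ordering, checkpoints for replay and recovery) come together in the common Structured Streaming foreachBatch + MERGE pattern. A minimal sketch, assuming illustrative table and column names (`bronze_events`, `silver_users`, `user_id`, `event_ts`); imports are kept inside the functions so the sketch is self-contained outside a Spark runtime:

```python
# Hedged sketch: idempotent CDC upsert from Bronze to Silver.
# Re-running a micro-batch produces the same Silver state (MERGE is
# keyed), and the checkpoint location makes the stream recoverable.

def upsert_microbatch(batch_df, batch_id):
    """Keep only the latest event per key in this micro-batch, then MERGE."""
    from pyspark.sql import Window, functions as F
    from delta.tables import DeltaTable

    latest = (
        batch_df.withColumn(
            "_rn",
            F.row_number().over(
                Window.partitionBy("user_id").orderBy(F.col("event_ts").desc())
            ),
        )
        .filter("_rn = 1")
        .drop("_rn")
    )
    target = DeltaTable.forName(batch_df.sparkSession, "silver_users")
    (
        target.alias("t")
        .merge(latest.alias("s"), "t.user_id = s.user_id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute()
    )

def start_stream(spark):
    """Wire the Bronze stream into the idempotent upsert."""
    return (
        spark.readStream.table("bronze_events")
        .writeStream.foreachBatch(upsert_microbatch)
        .option("checkpointLocation", "/chk/silver_users")  # replay / recovery
        .trigger(availableNow=True)  # or processingTime="1 minute" for micro-batch
        .start()
    )
```

Deduplicating to the latest `event_ts` per key before the MERGE is what makes late or replayed events safe: reprocessing the same batch converges to the same result.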

Typical Databricks pattern

  • Bronze: Event ingestion using Auto Loader / streaming, raw append

  • Silver: Apply CDC using Delta Live Tables (apply_changes) or Delta CDF

  • Gold: Incremental aggregates / serving tables (streaming or triggered)
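The Bronze and Silver steps of that pattern can be sketched as a Delta Live Tables pipeline. On Databricks you would `import dlt` at the top of the pipeline notebook and call `register_pipeline(dlt, spark)`; the table names (`bronze_events`, `silver_users`), key columns, and JSON source path are illustrative assumptions:

```python
# Hedged sketch: Bronze via Auto Loader, Silver via dlt.apply_changes.
# The dlt module is passed in so the sketch stays importable outside a
# Databricks DLT pipeline, where dlt is normally imported at module level.

def register_pipeline(dlt, spark, raw_path="/mnt/raw/events"):
    @dlt.table(comment="Raw events ingested incrementally with Auto Loader")
    def bronze_events():
        return (
            spark.readStream.format("cloudFiles")               # Auto Loader
            .option("cloudFiles.format", "json")
            .option("cloudFiles.schemaLocation", raw_path + "/_schema")
            .load(raw_path)
        )

    # Silver: apply CDC changes keyed by user_id; latest event_ts wins.
    dlt.create_streaming_table("silver_users")
    dlt.apply_changes(
        target="silver_users",
        source="bronze_events",
        keys=["user_id"],
        sequence_by="event_ts",
        stored_as_scd_type=1,  # type 1 = update in place; use 2 to keep history
    )
```

`apply_changes` handles out-of-order events via `sequence_by`, which is why capturing a reliable ordering column in the source feed is one of the key design factors above.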

Recommended Databricks documentation: the Auto Loader guide, the Delta Live Tables apply_changes (CDC) reference, and the Delta Lake Change Data Feed guide.

This combination gives you event-driven ingestion, scalable CDC, built-in data quality, and recoverability while staying fully aligned with Databricks best practices.
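For the Gold layer, Delta Change Data Feed lets downstream readers consume only the rows that changed in Silver rather than rescanning the table. A minimal sketch, assuming the illustrative `silver_users` table has `delta.enableChangeDataFeed = true` set:

```python
# Hedged sketch: incremental read of Silver changes via Delta CDF.
# The table name and starting version are illustrative assumptions.

def read_silver_changes(spark, since_version=0):
    """Stream only inserted/updated/deleted rows from the Silver table."""
    return (
        spark.readStream
        .option("readChangeFeed", "true")
        .option("startingVersion", since_version)
        .table("silver_users")
    )
```

The resulting stream carries change-metadata columns (such as the change type and commit version), which Gold aggregates can use to apply deltas incrementally instead of recomputing from scratch.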

Mey
New Contributor II

Thanks @bianca_unifeye! Let me collect all these points and come back at design time.

KartikBhatnagar
New Contributor III

Hi Mey,

Please also consider the Databricks file arrival trigger for your event-driven data ingestion journey.

https://docs.databricks.com/aws/en/jobs/file-arrival-triggers
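A file arrival trigger is configured on the job itself rather than in the pipeline code. A minimal sketch of the Jobs API trigger block, where the storage URL is a placeholder:

```json
{
  "trigger": {
    "pause_status": "UNPAUSED",
    "file_arrival": {
      "url": "s3://my-bucket/landing/",
      "min_time_between_triggers_seconds": 60
    }
  }
}
```

This starts the job whenever new files land in the monitored location, which pairs naturally with an Auto Loader Bronze ingest.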

Regards, Kartik 

 
