This guide shows how to stream CockroachDB data to Databricks using changefeeds, Azure Blob Storage, Unity Catalog, and Delta Lake. The result is a single platform for governed ingestion (Unity Catalog), exactly-once streaming (Auto Loader), and ACID tables (Delta Lake), with no custom ETL or credential-heavy ingestion jobs. Credentials are needed only to prepare the data source (the changefeed writing to storage); the ingestion side is governed by Unity Catalog and resolves schema and primary keys from the files in storage, so the ingestion job never touches source database credentials. Use this pattern when replicating operational data into the lakehouse for analytics, reporting, or audit.

CockroachDB changefeeds write the initial snapshot and ongoing CDC events as Parquet files to Azure Blob Storage; Auto Loader reads them into a streaming DataFrame; and the Medallion pattern (bronze → silver) orders, deduplicates, and merges events. This guide goes beyond staging (a raw dump into one table) to the bronze and silver layers real-world scenarios require: correct insert/update/delete semantics and multi-table consistency.
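As a sketch of the source side, a changefeed along these lines emits Parquet files plus periodic RESOLVED markers to an Azure container. The table name, container path, credentials, and option values here are illustrative placeholders, not a complete production configuration:

```sql
-- Stream one table's snapshot + CDC events to Azure Blob Storage as Parquet.
-- <container>, <account>, and <key> are placeholders for your environment.
CREATE CHANGEFEED FOR TABLE orders
  INTO 'azure-blob://<container>/orders?AZURE_ACCOUNT_NAME=<account>&AZURE_ACCOUNT_KEY=<key>'
  WITH format = 'parquet',
       resolved = '10s',        -- emit RESOLVED watermark files every ~10s
       initial_scan = 'yes',    -- start with a full snapshot of the table
       updated;                 -- include the event's commit timestamp
```

The `resolved` option is what produces the per-table watermark files that the multi-table coordination described below depends on.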
Multi-table transactional consistency is achieved by using CockroachDB’s hybrid logical clock (HLC) and RESOLVED timestamp watermarks as authoritative high-water marks. Each table’s RESOLVED file guarantees that all CDC events (including column-family fragments) are complete up to that timestamp. Coordinating ingestion with a shared minimum watermark preserves referential integrity and synchronizes all tables to the same transactional point, avoiding partial transactions and inconsistencies.
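The shared-watermark idea reduces to a min-of-maxes: each table's latest RESOLVED timestamp bounds what is complete for that table, and the minimum across tables bounds what is complete for all of them. A minimal sketch, assuming RESOLVED timestamps arrive as HLC decimal strings (the function names and input shape are illustrative, not from the guide):

```python
from decimal import Decimal


def latest_resolved(timestamps):
    """Latest RESOLVED timestamp for one table.

    HLC timestamps are decimal strings (wall time + logical counter),
    so compare them numerically, not lexicographically.
    """
    return max(timestamps, key=Decimal)


def shared_watermark(resolved_by_table):
    """Minimum of each table's latest RESOLVED timestamp.

    Ingesting only events at or below this value keeps every table at
    the same transactional point -- no table runs ahead of the others.
    """
    return min(
        (latest_resolved(ts) for ts in resolved_by_table.values()),
        key=Decimal,
    )
```

For example, if `orders` has resolved up to `1700000005.0000000000` but `customers` only up to `1700000003.0000000001`, the shared watermark is the `customers` value, and events past it are held back for the next batch.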
The pipeline handles SCD Type 1 (latest state via MERGE) and Type 2 (full history), column families, schema evolution, and multiple changefeed formats (Parquet, JSON, Avro, CSV). This guide focuses on **Parquet format** for native Delta Lake integration.
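The SCD Type 1 semantics the silver layer needs can be stated without any Spark machinery: keep only the latest event per primary key (ordered by MVCC timestamp), then upsert or delete. The following plain-Python sketch mirrors what a Delta `MERGE` applies in the real pipeline; the event shape (`key`, `mvcc`, `value`, with `value is None` meaning a delete) is an assumption for illustration:

```python
def apply_cdc_batch(target, events):
    """Apply a batch of CDC events to a key -> row mapping, SCD Type 1 style.

    1. Deduplicate: keep only the newest event per key by MVCC timestamp,
       so out-of-order arrivals within the batch cannot clobber newer state.
    2. Merge: a non-None value upserts the row; None deletes it.
    """
    latest = {}
    for ev in events:
        k = ev["key"]
        if k not in latest or ev["mvcc"] > latest[k]["mvcc"]:
            latest[k] = ev

    for k, ev in latest.items():
        if ev["value"] is None:      # tombstone: delete the row if present
            target.pop(k, None)
        else:                        # insert or update with the latest state
            target[k] = ev["value"]
    return target
```

In the actual pipeline this two-step shape becomes a window/dedup over the bronze table followed by a `MERGE INTO` the silver Delta table keyed on the primary key.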