Zerobus went GA on February 23rd. Connector ecosystem: empty. I run NiFi for security telemetry, so I built the processor myself. Apache 2.0, source on GitHub. NiFi uses NAR packaging: each archive gets its own classloader. The Zerobus Java SDK is JNI...
Databricks introduces multi-table transactions, allowing operations across multiple Delta tables to execute as a single atomic unit. Delta Lake has provided ACID guarantees at the table level, but ensuring atomicity across multiple tables previously ...
Part 2 of 3: Databricks Streaming Architecture. The instinct after Part 1 was obvious. If running eight queries in one task means one failure can hide while others keep running, split them into multiple tasks. Separate concerns. Give each component it...
Hi everyone, I recently wrote an article on designing an enterprise-scale data platform architecture using Azure and Databricks. The article covers:
• End-to-end architecture for enterprise data platforms
• Data ingestion using Azure Data Factory and Kaf...
Databricks ABAC lets you apply a single schema-level policy across columns of any data type — no more managing one mask function per type. Here's how to use the VARIANT data type to make it work.
If you've implemented column masking in Unity Catalog,...
Part 3 of 3: Databricks Streaming Architecture. By the end of Part 1 & Part 2, we knew what the real answer was. We just hadn’t committed to it yet. Not because it wouldn’t work. We tested it. We documented it. The code was ready. The answer was one clu...
Apache Hudi and Delta Lake are built for different workloads. Hudi is optimised for high-frequency writes; Delta Lake is built for fast, reliable reads. Using one format across the entire data platform forces an unnecessary trade-off: high ingestion c...
As enterprises race toward cloud-native data platforms, modernising legacy ETL pipelines remains one of the most persistent bottlenecks. For organizations that have relied on SQL Server Integration Services (SSIS) for years, rewriting hundreds of pac...
Hi everyone, I just published a new article on Medium. It explores an important topic: designing reliable data pipelines in Databricks. Many pipelines fail not because of code, but because of design decisions made early in development. In ...
Microsoft announced the retirement plan for the Azure Databricks Standard tier. This is vital information for organizations still on the Standard tier. It represents a fundamental architectural realignment that organizations must navigate with precis...
I've created an Azure Resource Graph query that identifies all Standard tier Databricks workspaces in your environment (assuming you have read access): https://github.com/cjpluta/azretirementqueries/blob/main/queries/databricks-standard.kql
Hi all. If you've ever manually promoted resources from dev to prod on Databricks (copying notebooks, updating configs, hoping nothing breaks), this post is for you. I've been building a CI/CD setup for a Speech-to-Text pipeline on Databricks, and I w...
Hi,
Great question! Databricks Asset Bundles (DABs) are the recommended approach for CI/CD on Databricks. Here is a comprehensive walkthrough.
WHAT ARE DATABRICKS ASSET BUNDLES?
DABs let you define your Databricks resources (jobs, pipelines, dashboar...
Works for any event-driven workload: IoT alerts, e-commerce flash sales, financial market close processing.
Goal
In this project, I needed to start Databricks jobs on an irregular basis, driven entirely by timestamps stored in PostgreSQL rather than by...
@PiotrPustola -- The self-rescheduling orchestrator pattern is a really elegant solution for event-driven workloads that depend on externally managed timestamps. A few thoughts and additions that might help you and others who land on this article:
AD...
Databricks Community Fellows February 2026 Recap
The Databricks Community Fellows are internal Brickster experts who volunteer their time to help customers succeed by answering questions in the Databricks Community forums.
This month: 92 customer que...
If you’ve ever needed to maintain historical truth in a data warehouse, you’ve likely bumped into Slowly Changing Dimensions (SCD)—specifically Type 2. In SCD2, we keep every version of a record as it changes over time, so analysis can answer questio...
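To make the SCD2 idea concrete, here is a minimal sketch in plain Python. It is not taken from the post (whose own implementation is not visible in this preview); `DimRow`, `apply_scd2`, and the field names `valid_from`, `valid_to`, and `is_current` are hypothetical names chosen for illustration. The sketch assumes each update carries the full set of tracked attributes for a key.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class DimRow:
    """One version of a dimension record (hypothetical layout)."""
    key: str
    attrs: Tuple[str, ...]      # tracked attribute values
    valid_from: date
    valid_to: Optional[date]    # None while this version is open
    is_current: bool


def apply_scd2(dim: List[DimRow],
               updates: Dict[str, Tuple[str, ...]],
               as_of: date) -> List[DimRow]:
    """Return a new dimension with `updates` applied as SCD Type 2 changes."""
    result: List[DimRow] = []
    keys_with_current = set()
    for row in dim:
        if row.is_current:
            keys_with_current.add(row.key)
        new_attrs = updates.get(row.key)
        if row.is_current and new_attrs is not None and new_attrs != row.attrs:
            # Changed record: close the old version, then open a new one.
            result.append(replace(row, valid_to=as_of, is_current=False))
            result.append(DimRow(row.key, new_attrs, as_of, None, True))
        else:
            # Historical or unchanged rows are kept exactly as they are.
            result.append(row)
    for key, attrs in updates.items():
        if key not in keys_with_current:
            # Key with no open version yet: insert its first current row.
            result.append(DimRow(key, attrs, as_of, None, True))
    return result
```

For example, applying `{"c1": ("Paris",)}` to a dimension whose current `c1` row holds `("Berlin",)` leaves two rows for `c1`: the closed Berlin version and a new open Paris version. On a real table the same close-and-insert logic would normally be expressed as a merge rather than a Python loop.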
I discussed eliminating the BI & Metrics Tax using Databricks Metric Views here. With Metric Views, the semantic layer is a core component of the lakehouse. The modern stack is moving toward AI data experiences where organizations ask questions instead of build...