Organizations have solved the challenge of collecting, cleaning, and governing structured data at scale with Delta Lake and Unity Catalog in the Lakehouse. You have world-class lineage, permissions, RBAC, ABAC, and schemas as the nervous system. The nervous system...
A Data & AI–Driven Decision Engine for Modern Retail Networks. Introduction: In modern retail, supply chains are no longer static networks; they are living, adaptive systems that must continuously respond to customer demand, fulfillment speed expectatio...
We need to stop treating AI as a tool. It's time to treat it as a peer. I've been building a library of reusable skills for Claude: structured instructions that let AI agents handle complex, repetitive development workflows on Databricks and Azure AI...
Most construction teams don’t really have a data problem, at least not in the way we usually think about it. They already have dashboards everywhere. Finance has reports, project managers have schedule views, field teams have inspection logs. Everyon...
Combining the SIGNAL statement with ATOMIC transactions in Databricks saves us from managing commits and rollbacks ourselves while handling custom validations seamlessly, something that modern big data ETL frameworks struggle to deliver cleanly. They give the ...
I've spent years migrating SOC operations from traditional SIEM to Databricks. Not because it's trendy, but because SIEM has fundamental problems that no vendor update will fix: proprietary query languages that lock you in, no version control or test...
Most Databricks streaming failures don't look dramatic. No cluster termination. No red wall of errors. The UI says RUNNING, and your customers start reporting nonsense. I wrote about the incident that changed how we think about streaming jobs on share...
Completely agree, production war stories are worth more than any documentation. I’ve eaten enough teeth on production data lake issues to write my own chapter on what can go wrong, whether that’s deploying Databricks in financial institutions or bein...
Zerobus went GA on February 23rd. Connector ecosystem: empty. I run NiFi for security telemetry so I built the processor myself. Apache 2.0, source on GitHub. NiFi uses NAR packaging: each archive gets its own classloader. The Zerobus Java SDK is JNI...
Databricks introduces multi-table transactions, allowing operations across multiple Delta tables to execute as a single atomic unit. Delta Lake has provided ACID guarantees at the table level, but ensuring atomicity across multiple tables previously ...
Part 2 of 3: Databricks Streaming Architecture. The instinct after Part 1 was obvious. If running eight queries in one task means one failure can hide while others keep running, split them into multiple tasks. Separate concerns. Give each component it...
Hi everyone, I recently wrote an article on designing an enterprise-scale data platform architecture using Azure and Databricks. The article covers: • End-to-end architecture for enterprise data platforms • Data ingestion using Azure Data Factory and Kaf...
Databricks ABAC lets you apply a single schema-level policy across columns of any data type — no more managing one mask function per type. Here's how to use the VARIANT data type to make it work.
If you've implemented column masking in Unity Catalog,...
Part 3 of 3: Databricks Streaming Architecture. By the end of Part 1 & Part 2, we knew what the real answer was. We just hadn’t committed to it yet. Not because it wouldn’t work. We tested it. We documented it. The code was ready. The answer was one clu...
Apache Hudi and Delta Lake are built for different workloads. Hudi is optimised for high-frequency writes; Delta Lake is built for fast, reliable reads. Using one format across the entire data platform forces an unnecessary trade-off: high ingestion c...
As enterprises race toward cloud-native data platforms, modernising legacy ETL pipelines remains one of the most persistent bottlenecks. For organizations that have relied on SQL Server Integration Services (SSIS) for years, rewriting hundreds of pac...