We needed job_id and run_id in a custom metrics Delta table so we could join to `system.lakeflow.job_run_timeline`. Tried four approaches before finding the one that works on serverless compute. What doesn't work: spark.conf.get("spark.databricks.job.id...
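The preview above is truncated, so the author's working approach isn't shown. As a hedged sketch: Databricks documents dynamic value references such as `{{job.id}}` and `{{job.run_id}}` that can be passed as job parameters and read with `dbutils.widgets.get()`, which is one commonly cited way to get these IDs on serverless compute where the `spark.conf` keys are unavailable. The helper below wraps any parameter getter and falls back to `None` when a key is missing, so the same code runs in a job task or locally; the parameter names `job_id`/`run_id` are assumptions, not taken from the post.

```python
def get_job_context(get_param):
    """Fetch job_id/run_id via a parameter getter (e.g. dbutils.widgets.get).

    Returns None for any key the getter cannot resolve, so the same code
    works in a Databricks job task and in local testing.
    """
    ctx = {}
    for key in ("job_id", "run_id"):
        try:
            ctx[key] = get_param(key)
        except Exception:
            ctx[key] = None
    return ctx


# In a Databricks job task you would configure task parameters like
#   job_id  -> {{job.id}}
#   run_id  -> {{job.run_id}}
# and then call: get_job_context(dbutils.widgets.get)

# Local simulation with hypothetical values:
params = {"job_id": "123", "run_id": "456"}
print(get_job_context(params.__getitem__))
```

Joining the resulting columns to `system.lakeflow.job_run_timeline` is then an ordinary equality join on job_id and run_id.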
Overview
Prompted by a customer question, I wanted to see what was possible in terms of MCP integration into Genie Code. To try this out, I decided to look at Azure DevOps, since it's a common workflow to want to see your tickets alongside the ...
PostgreSQL to Databricks made simpler with Lakeflow Connect (Public Preview). Databricks has introduced a PostgreSQL connector in Lakeflow Connect (Public Preview), enabling ingestion of PostgreSQL data into the Lakehouse using logical replication. Ins...
Organizations solved the challenge of collecting, cleaning, and governing structured data at scale with Delta Lake and Unity Catalog in the Lakehouse. You have world-class lineage, permissions, RBAC, ABAC, and schemas as the nervous system. The nervous system...
A Data & AI-Driven Decision Engine for Modern Retail Networks. Introduction: In modern retail, supply chains are no longer static networks; they are living, adaptive systems that must continuously respond to customer demand, fulfillment speed expectatio...
We need to stop treating AI as a tool. It's time to treat it as a peer. I've been building a library of reusable skills for Claude: structured instructions that let AI agents handle complex, repetitive development workflows on Databricks and Azure AI...
Most construction teams don’t really have a data problem, at least not in the way we usually think about it. They already have dashboards everywhere. Finance has reports, project managers have schedule views, field teams have inspection logs. Everyon...
Combining the SIGNAL statement with ATOMIC transactions in Databricks saves us from managing commits and rollbacks by hand while handling custom validations seamlessly, something modern big data ETL frameworks struggle to deliver cleanly. They give the ...
How Digital Payment Lending Platforms Can Collaborate with Banks Without Exposing Sensitive Data. 1. Business Context & Regulatory Reality: In 2020, large Indian fintech platforms faced a unique regulatory constraint: NBFC-led digital platforms were not ...
This is a solid breakdown of how secure data collaboration can be done without exposing sensitive information. The Clean Room approach really stands out because it shifts the model from data sharing to controlled computation, which is exactly what re...
I've spent years migrating SOC operations from traditional SIEM to Databricks. Not because it's trendy, but because SIEM has fundamental problems that no vendor update will fix: proprietary query languages that lock you in, no version control or test...
Most Databricks streaming failures don't look dramatic. No cluster termination. No red wall of errors. The UI says RUNNING, and your customers start reporting nonsense. I wrote about the incident that changed how we think about streaming jobs on share...
Completely agree, production war stories are worth more than any documentation. I've cut my teeth on enough production data lake issues to write my own chapter on what can go wrong, whether that's deploying Databricks in financial institutions or bein...
Zerobus went GA on February 23rd. Connector ecosystem: empty. I run NiFi for security telemetry, so I built the processor myself. Apache 2.0, source on GitHub. NiFi uses NAR packaging: each archive gets its own classloader. The Zerobus Java SDK is JNI...
Databricks introduces multi-table transactions, allowing operations across multiple Delta tables to execute as a single atomic unit. Delta Lake has provided ACID guarantees at the table level, but ensuring atomicity across multiple tables previously ...
Part 2 of 3: Databricks Streaming Architecture. The instinct after Part 1 was obvious. If running eight queries in one task means one failure can hide while others keep running, split them into multiple tasks. Separate concerns. Give each component it...
Hi everyone, I recently wrote an article on designing an enterprise-scale data platform architecture using Azure and Databricks. The article covers:
• End-to-end architecture for enterprise data platforms
• Data ingestion using Azure Data Factory and Kaf...