Part 2 of my series on building an enterprise data platform on Databricks; this one's about Silver. Part 1 covered why we ran two ingestion paths in parallel (GoldenGate CDC + JDBC batch) and kept them as separate bronze tables. If you missed it: http...
One of my learnings from a project. After migrating from an on-premises environment to the cloud, the data engineering team began noticing seemingly random failures in workflow-scheduled Databricks jobs. The failures appeared intermittent and often succ...
Every day, data platforms generate thousands of audit events. But here's the problem: security teams are drowning in noise. Critical risks hide in plain sight. Manual investigations take hours. Compliance gaps surface too late. And there's no int...
Building an Incremental Customer Data Migration Workflow in Databricks. By Naveen Ayalla. Introduction: In many enterprise environments, customer data is spread across legacy systems that were originally designed for operational processing rather than mode...
Great write-up, Naveen. Very practical and clear. I really like how you focused not just on migration, but on building a reliable incremental workflow with proper duplicate handling and governance. That’s where real value comes from. Also, connecting D...
How to extract Tableau calculated fields, dimensions, and measures from a .twbx workbook and re-express them as a production-grade Databricks Metric View YAML — with the Sample Superstore dataset as a complete worked example, accelerated by AI coding...
Hi all, Tired of paying the data movement tax or wrestling with complex manual pipeline configs? I just published a new Medium article and open-sourced a framework that fully automates Databricks Lakeflow Connect pipelines for CDC-enabled databases usi...
Over the years, I have helped organizations design and deliver large-scale data platforms, and one recurring lesson has remained constant: CDC failures are rarely caused by technology alone. They are usually the result of unclear ownership, missing o...
The Problem: Living in Japan means getting handed receipts everywhere: convenience stores, pharmacies, restaurants. Most end up in a pocket or trash, never tracked, and the coupons go unused. The Solution: Sysl is a PWA that scans any Japanese receipt au...
Hey everyone! For the DAIS 2026 Community Virtual Challenge, I built a LEGO Value Engine using Databricks Free Edition. This is a passion project that combines my interests in both LEGOs and data engineering. When a new LEGO set releases, it can be hard...
Hi Databricks Community, I built a retail sales forecasting system on Databricks Free Edition using the Rossmann Store Sales dataset: about 1,115 stores with daily sales over two and a half years. The goal was a 48-day forecast, the same horizon as t...
I recently built an end-to-end data pipeline architecture in the transportation domain, focusing on city and trip data. The pipeline follows the Bronze–Silver–Gold layered approach, where raw data is ingested into the Bronze layer, cleaned and standa...
Hey everyone, Have you ever opened Databricks Catalog Explorer to audit a table, only to find the downstream job listed as "Untitled"? Databricks Unity Catalog is incredibly powerful for automated lineage, but it quietly breaks the moment you orchestr...
Hey everyone, We’ve all been there: a Delta Lake MERGE job that should take 20 minutes drags on for 90 minutes, while a full overwrite of the same table finishes in under 20. When an overwrite outpaces a selective merge, it's a massive red flag that y...
Today, we’re celebrating a very special milestone: the São Paulo Databricks User Group has reached 500 members.More than just a number, these are 500 professionals united by a shared passion for data, AI, analytics, data engineering, and innovation. ...
Hello everyone, As a Data & Analytics Engineer with experience spanning ETL, data engineering, solution design, and data platform engineering, I currently work in the Azure data ecosystem with Azure Databricks, Terraform, and CI/CD pipelines, building...