If you’ve ever needed to maintain historical truth in a data warehouse, you’ve likely bumped into Slowly Changing Dimensions (SCD)—specifically Type 2. In SCD2, we keep every version of a record as it changes over time, so analysis can answer questio...
Discussed the BI & Metrics Tax elimination using Databricks Metric Views here. The semantic layer is a core component of the lakehouse with Metric Views. The modern stack is moving toward AI data experiences where organizations ask questions instead of build...
In today's data-driven world, trust is currency, and that trust starts with quality data governed by strong principles. For one of our clients, where we're on a mission to build intelligent enterprises with AI, data isn't just an asset; it's a responsib...
Why Legacy BI Is Reaching Its Limits, And What Comes Next
I have always believed that the original goal of digitalization was to make data available and then find better ways to analyze it. For the past two decades, Business Intelligence has followed ...
This article continues a technical deep dive into building large-scale Lakehouse architectures. The original platform processed billions of records across multiple markets and operated under PCI-DSS compliance requirements, a significant engineering ...
Discussed the BI & Metrics Tax elimination using Databricks Metric Views here. Organizations also face an older, more persistent tax: the Ingestion Tax. To ingest data from a source like Salesforce or SQL Server into your Lakehouse, you typically stit...
The Hidden Cost of Scaling the Lakehouse
Over the past few years, many organizations have successfully migrated to Databricks to modernize their data platforms. The Lakehouse architecture has enabled them to unify data engineering, analytics, and AI o...
@Saurabh2406 this is such a rich article and has so many practical takeaways! Congrats! I faced similar challenges in one of my last projects, and I could spend some time building a nice dashboard (using the system.billing tables) that helped us trac...
Recently, I have been creating some "self-reminder" videos to help my poor long-term human memory and maybe to help others. Understand the internals of DataFrames, how partitions relate to jobs, stages, shuffles, and tasks, and how transformations or a...
LDP Tax Pipeline: Spark Declarative Pipelines on macOS (Without Databricks)
Excited to share my latest hands-on implementation of a LakeFlow Declarative Pipeline (LDP) built locally using Apache Spark 4.1 Declarative Pipelines, running entirely on ...
Between 2019 and 2021, we built a large-scale lakehouse on Databricks supporting multi-market payments processing (7B+ transactions/year). If ingestion was complex (covered in Part 1), the Silver layer was even more interesting. Implementing SCD Type 1...
Between 2019 and 2021, we built a multi-market payments data platform on Databricks that now processes more than 7 billion transactions per year across seven markets. Ingestion was by far the most operationally complex layer. To support MongoDB CDC str...
High-performing data organizations succeed when all systems, teams, and processes are aligned toward a shared strategy. Fragmentation — separate tools for storage, governance, analytics, and AI, siloed ownership, redundant pipelines, or inconsistent ...
One of the recent additions to the Databricks ecosystem that caught my attention is Lakebridge, a migration accelerator aimed at legacy ETL and data warehouse workloads. Migration projects are always interesting to discuss because, in practice, they a...
Did you know the Databricks Assistant now supports Agent Skills? If your team has common, repeatable workflows, this feature is one you need to explore.
Skills provide the Databricks Assistant with a specific set of instructions to handle tasks custo...
A key challenge for organizations is ensuring that data metrics mean the same thing for all teams. If BI logic is scattered across various tools, SQL, and notebooks, a metrics tax is levied (multiple dashboards showing different revenue figures). Databricks M...