Data lakes have notoriously struggled with managing incremental data processing at scale without integrating open table storage format frameworks (e.g., Delta Lake, Apache Iceberg, Apache Hudi). I...
This article describes an example use case where events from multiple games stream through Kafka and terminate in Delta tables. The example illustrates how to use Delta Live Tables (DLT) to:
Stream fr...
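To give a feel for the pattern before the full walkthrough, here is a minimal sketch of a DLT streaming table that ingests raw game events from Kafka. The broker address, topic, and table names are hypothetical placeholders rather than values from the article, and the spark session is the one provided by the DLT pipeline runtime.

import dlt
from pyspark.sql.functions import col

# Hypothetical bronze table that ingests raw game events from a Kafka topic.
# Broker, topic, and table names are placeholders, not values from the article.
@dlt.table(name="game_events_bronze", comment="Raw game events streamed from Kafka")
def game_events_bronze():
    return (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker-1:9092")
        .option("subscribe", "game-events")
        .load()
        .select(
            col("key").cast("string").alias("event_key"),
            col("value").cast("string").alias("event_payload"),
            "timestamp",
        )
    )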
Motivation
Note: You can find all examples to run here.
In past posts, we discussed parameter markers that you can use to templatize queries.
Given a simple example table:
CREATE OR REPLACE TABLE resi...
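The CREATE TABLE statement above is truncated; independent of it, a rough sketch of named parameter markers passed through the PySpark sql() API might look like the following, assuming a recent Databricks Runtime that supports them and using a hypothetical table and columns rather than the ones from the post.

# Hypothetical table and column names, shown only to illustrate named
# parameter markers (:city, :min_year) resolved at execution time.
df = spark.sql(
    """
    SELECT *
    FROM customers
    WHERE city = :city
      AND signup_year >= :min_year
    """,
    args={"city": "Amsterdam", "min_year": 2020},
)
df.show()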
Welcome to the MLOps Gym, where we guide you through the essential steps of implementing MLOps practices on Databricks, ensuring that your machine learning projects move from ad hoc experimentation t...
It’s not often that a DBMS surprises me when it comes to SQL; I kind of think I have seen it all. However, there is this one feature in Spark SQL that made me go: “Huh! Now that’s cool!” when I first e...
If you are using Databricks to manage your data and haven't fully upgraded to Unity Catalog, you are likely dealing with legacy datasets in the Hive Metastore. While Unity Catalog and Delta Sharing m...
Authors: Andrey Mirskiy (@AndreyMirskiy) and Marco Scagliola (@MarcoScagliola)
Welcome to part III of our blog series on “Why Databricks SQL Serverless is the best fit for BI workloads”. In part I of ...
Authors: Liping Huang (@Liphuan) and Marius Panga (@mariuspc)
Introduction
Effective cost management is a critical consideration for any cloud data platform. Historically, achieving cost control and i...
Motivation
In Databricks, you have many ways to compose and execute queries. You can:
Incrementally build a query and execute it using the DataFrame API (sketched below)
Use Python, Scala, or some other supported lan...
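As a minimal sketch of the first option, this is roughly what incrementally composing a query with the PySpark DataFrame API looks like; the table and column names are hypothetical, not taken from the post.

# Hypothetical table and column names, used only to illustrate incremental composition.
df = spark.table("sales")                   # start from an existing table
df = df.filter(df.amount > 100)             # add a filter step
df = df.groupBy("region").sum("amount")     # add an aggregation step
df.show()                                   # execute the composed query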
By Hari Selvarajan & Sourav Gulati
Welcome to the third installment of our blog series exploring Databricks Workflows, a powerful product for orchestrating data processing, machine learning, and analy...