Databricks Advent Calendar 2025 #20
As Unity Catalog becomes an enterprise catalog, bring-your-own lineage is one of my favorite features.
In 2025, Metrics Views are becoming the standard way to define business logic once and reuse it everywhere. Instead of repeating complex SQL, teams can work with clean, consistent metrics.
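A hedged sketch of what "define once, reuse everywhere" can look like as a metric view. All object names below are placeholders, and the `WITH METRICS LANGUAGE YAML` DDL shape and YAML fields (source, dimensions, measures) should be checked against the current metric view reference for your runtime:

```python
# Hedged sketch: defining business logic once as a metric view.
# Assumes a running Spark session on Databricks; all names are placeholders.
spark.sql("""
CREATE VIEW main.gold.sales_metrics
WITH METRICS
LANGUAGE YAML
AS $$
version: 0.1
source: main.gold.orders
dimensions:
  - name: order_date
    expr: order_date
measures:
  - name: total_revenue
    expr: SUM(amount)
$$
""")

# Consumers then reference the measure instead of repeating the SQL:
spark.sql("""
    SELECT order_date, MEASURE(total_revenue)
    FROM main.gold.sales_metrics
    GROUP BY order_date
""").show()
```

The payoff is that `total_revenue` is defined exactly once; every dashboard and notebook that queries the view gets the same aggregation logic.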
Automatic file retention in Auto Loader is one of my favourite new features of 2025. It can automatically move processed cloud files to cold storage or simply delete them.
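A minimal configuration sketch of what this can look like with the `cloudFiles.cleanSource` option family. Paths, table names, and the retention window are placeholder assumptions; verify the exact option names against the Auto Loader documentation for your runtime:

```python
# Hedged sketch: Auto Loader with automatic source-file retention.
# Assumes a running Spark session on Databricks; paths are placeholders.
(spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    # Archive processed files instead of leaving them in the landing zone
    # ("DELETE" removes them outright instead of moving them).
    .option("cloudFiles.cleanSource", "MOVE")
    .option("cloudFiles.cleanSource.moveDestination", "s3://bucket/archive/")
    # Files are only cleaned up after this retention period has elapsed.
    .option("cloudFiles.cleanSource.retentionDuration", "30 days")
    .load("s3://bucket/landing/")
    .writeStream
    .option("checkpointLocation", "s3://bucket/_chk/events/")
    .toTable("main.bronze.events"))
```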
Thanks for sharing @Hubert-Dudek ! That's a really great feature. It simplified a lot of the data maintenance process at one of my clients.
I teach Databricks to all sorts of folks: coders, managers, everyone. What's wild is how two companies with the same setup can have totally different experiences. Usually, it's not the tech itself that's the issue, but how people see it. Databricks tr...
As someone who benefited from Louis' training, I can attest that it makes a difference to constantly keep up to date and work on a foundation of understanding. Especially when things move as fast as they do in our industry, the time to reflect and improve pa...
AI/BI dashboards now support cross-filtering, which lets you click an element in one chart to filter and update related data in other charts. Cross-filtering allows users to interactively explore relationships and patterns across multiple visu...
There now appears to be a list of capsules along the top of Databricks AI/BI dashboards indicating the filters applied. The capsules include both filter selectors and cross-filters added by clicking charts. Also, there is now a "Reset t...
Replacing records for the entire date with newly arriving data for the given date is a typical design pattern. Now, thanks to simple REPLACE USING in Databricks, it is easier than ever!
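The semantics are easy to picture: for every date present in the incoming batch, drop all existing rows for that date, then append the batch. A pure-Python sketch of that replacement logic (plain dicts stand in for Delta rows here; this is not the actual engine implementation):

```python
def replace_using(table: list[dict], batch: list[dict], key: str = "date") -> list[dict]:
    """Simulate REPLACE USING semantics: any row whose `key` value appears
    in the incoming batch is replaced wholesale by the batch's rows."""
    incoming_keys = {row[key] for row in batch}
    # Keep only rows whose key is NOT being replaced by the new batch.
    kept = [row for row in table if row[key] not in incoming_keys]
    return kept + batch

table = [
    {"date": "2025-12-19", "sales": 10},
    {"date": "2025-12-20", "sales": 99},  # stale rows for the 20th
    {"date": "2025-12-20", "sales": 1},
]
batch = [{"date": "2025-12-20", "sales": 42}]
print(replace_using(table, batch))
# → [{'date': '2025-12-19', 'sales': 10}, {'date': '2025-12-20', 'sales': 42}]
```

Note how 2025-12-19 is untouched: only dates present in the batch are rewritten, which is exactly why this pattern is safe for late-arriving daily reloads.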
Real-time mode is a breakthrough that lets Spark utilize all available CPUs to process records with single-millisecond latency, while decoupling checkpointing from per-record processing.
For many data engineers who love PySpark, the most significant improvement of 2025 was the addition of merge to the DataFrame API, so the Delta library or raw SQL is no longer needed to perform MERGE. P.S. I still prefer SQL MERGE inside spark.sql()
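Whichever API you use, a MERGE with matched-update and not-matched-insert clauses boils down to an upsert on the join key. A pure-Python sketch of those semantics (dicts keyed by the merge condition stand in for Delta tables; the DataFrame API call itself is not shown):

```python
def merge_upsert(target: dict, source: dict) -> dict:
    """Simulate MERGE semantics:
    WHEN MATCHED THEN UPDATE, WHEN NOT MATCHED THEN INSERT.
    Dict keys play the role of the merge condition; values are row payloads."""
    merged = dict(target)   # start from the existing target rows
    merged.update(source)   # matched keys are updated, new keys are inserted
    return merged

target = {1: {"name": "a"}, 2: {"name": "b"}}
source = {2: {"name": "B"}, 3: {"name": "c"}}
print(merge_upsert(target, source))
# → {1: {'name': 'a'}, 2: {'name': 'B'}, 3: {'name': 'c'}}
```

Key 2 was matched and updated, key 3 was inserted, and key 1 was left alone, which is the whole contract of an upsert.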
The new Lakebase experience is a game-changer for transactional databases. That functionality is fantastic. Autoscaling to zero makes it really cost-effective. Do you need to deploy to prod? Just branch the production database to the release branch, an...
I've been working with Unity Catalog's lineage capabilities for a while now, and I have to say—this is what lineage should have always been. Not a separate tool to configure. Not a manual process to maintain. Just automatic, real-time visibility into...
I have been using and implementing UC in various workspaces across industries, and BYOL is the one feature I am really looking forward to implementing next. Thanks @AbhaySingh for consolidating it here.
Ingestion from SharePoint is now available directly in PySpark. Just define a connection and use spark.read or, even better, spark.readStream with Auto Loader. Just specify the file type and options for that file (PDF, CSV, Excel, etc.)
Excel: The big news this week is native importing of Excel files. Write operations are also possible, and you can choose a data range. It also works with the streaming Auto Loader, currently in beta.
GPT 5.2: The same day...
ZeroBus changes the game: you can now push event data directly into Databricks, even from on-prem. No extra event layer needed. Every Unity Catalog table can act as an endpoint.
All leading LLMs are available natively in Databricks:
- ChatGPT 5.2 from the day of the premiere!
- The system catalog's AI schema in Unity Catalog has multiple LLMs ready to serve!
- OpenAI, Gemini, and Anthropic are available side by side!
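A hedged sketch of calling one of those served models straight from SQL via the `ai_query()` function. The endpoint name below is a placeholder assumption; check what is actually available in your workspace (for example under the system.ai schema or the Serving UI), and note that table and column names here are invented for illustration:

```python
# Hedged sketch: invoking a served foundation model from SQL with ai_query().
# Assumes a running Spark session on Databricks; endpoint, table, and column
# names are placeholders, not real objects.
result = spark.sql("""
    SELECT ai_query(
        'databricks-claude-sonnet',        -- placeholder endpoint name
        'Summarize in one sentence: ' || review_text
    ) AS summary
    FROM main.bronze.reviews
    LIMIT 5
""")
result.show(truncate=False)
```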
Databricks goes native on Excel. You can now ingest and query .xls/.xlsx files directly in Databricks (SQL and PySpark, batch and streaming), with automatic schema/type inference, sheet and cell-range targeting, and evaluated formulas. No extra libraries needed anymore.