Dive into a collaborative space where members like YOU can exchange knowledge, tips, and best practices. Join the conversation today and unlock a wealth of collective wisdom to enhance your experience and drive success.
I just logged in to the community edition for the last time and spun up the cluster one final time. Today is the last day, but it's still there. I hadn't logged in for a while, as the free edition offers much more, but it is a place where man...
This article provides an overview of key Databricks features and best practices that protect Gold tables from accidental deletion. It also covers the implications if both the Gold and Landing layers are deleted without active retention or backup. Cor...
I recently saw a business case in which an external orchestrator accounted for nearly 30% of their total Databricks job costs. That's when it hit me: we're often paying a premium for complexity we don't need. Besides FinOps, I tried to gather all the...
There is a new direct mode in Databricks Asset Bundles: the main difference is that Terraform is gone, replaced by a simple JSON state. It offers a few significant benefits:
- No requirement to download Terraform and terraform-provider-databr...
See the comments below for a runnable notebook. Throughout my career I have worked at several companies that handle sensitive data, including PII, PHI, EMR, HIPAA, Class I/II/III FOMC - Internal (FR). One entity I worked at even required a Department O...
We just failed a HIPAA audit: the auditors asked why our pipelines carried patient names when the pipelines didn't need that information, and they recommended the data be encrypted. We thought S3 encryption was good enough. We implemented row-level encryption by extending ...
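As a minimal sketch of the idea, here is a single-process stand-in for the row-level protection described above, using deterministic HMAC tokenization instead of real encryption (HMAC is one-way, so use it only where the raw value never needs to be recovered). The key handling, column name, and sizing are all assumptions; in practice the key would come from a secret scope or KMS.

```python
import hmac
import hashlib

# Assumption: in a real pipeline this key is fetched from a secret
# scope / KMS, never hardcoded.
SECRET_KEY = b"replace-with-a-kms-managed-key"

def tokenize_pii(value: str) -> str:
    """Return a stable, non-reversible token for a PII value."""
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()

# Hypothetical rows: mask the PII column before it enters the pipeline,
# leaving non-sensitive fields untouched.
rows = [{"patient_name": "Jane Doe", "visit_count": 3}]
masked = [{**r, "patient_name": tokenize_pii(r["patient_name"])} for r in rows]
```

Because the tokenization is deterministic, joins and deduplication on the masked column still work, which is usually why teams reach for this pattern over random salting.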
Stream–stream joins are one of the most powerful features in Databricks Structured Streaming – and also one of the easiest to misconfigure. As soon as you move from simple append-only pipelines to real-time correlations across multiple streams (order...
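To make the misconfiguration risk concrete, here is a toy, single-process sketch of the core mechanic: each side of a stream–stream join buffers rows in state, and the watermark bounds how long a row waits for its match. This is an illustration of the concept, not Spark code; the event shape and field names are hypothetical.

```python
def watermarked_join(orders, payments, max_delay):
    """Join order events to payment events by id, evicting state older
    than the watermark (event-time high mark minus max_delay)."""
    state = {}          # order id -> buffered order event
    joined, watermark = [], 0
    for e in sorted(orders + payments, key=lambda e: e["ts"]):
        watermark = max(watermark, e["ts"] - max_delay)
        # Evict buffered orders older than the watermark: this is what
        # keeps state bounded, and why late matches silently drop.
        for oid in [k for k, v in state.items() if v["ts"] < watermark]:
            del state[oid]
        if e["kind"] == "order":
            state[e["id"]] = e
        elif e["id"] in state:
            joined.append((state[e["id"]]["ts"], e["ts"]))
    return joined

orders = [{"kind": "order", "id": 1, "ts": 0}]
payments = [{"kind": "payment", "id": 1, "ts": 5}]
result = watermarked_join(orders, payments, max_delay=10)
# result: [(0, 5)] — matched, because the payment arrived within the window
```

Shrink `max_delay` below the actual arrival gap and the order is evicted before its payment arrives, producing an empty result: that is the silent data loss a misconfigured watermark causes.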
Databricks is a popular unified data analytics platform known for its powerful data processing capabilities and seamless integration with Apache Spark. However, managing and optimizing costs in Databricks can be challenging, especially when it comes ...
@Second Reply You’re right, just printing out selected_pool isn’t enough to actually leverage dynamic cluster sizing at runtime. In practice, the value of selected_pool would feed directly into your Databricks cluster creation API or workflow automati...
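A hedged sketch of what "feeding it in" could look like: building a cluster spec dict that a Jobs API call would accept as `new_cluster`. The sizing rule and pool name are hypothetical assumptions from this thread; `instance_pool_id`, `num_workers`, and `spark_version` are standard cluster-spec keys.

```python
def build_cluster_spec(selected_pool: str, expected_rows: int) -> dict:
    """Turn a chosen pool plus a workload estimate into a cluster spec.
    The row-count threshold is a made-up sizing rule for illustration."""
    num_workers = 2 if expected_rows < 1_000_000 else 8
    return {
        "instance_pool_id": selected_pool,   # the dynamically selected pool
        "num_workers": num_workers,
        "spark_version": "15.4.x-scala2.12", # assumption: pin to your DBR version
    }

spec = build_cluster_spec("pool-small-0123", expected_rows=500_000)
# spec is ready to pass as `new_cluster` when creating or updating a job task
```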
Meta Ads is now a native data source in Databricks. Databricks just announced a Meta Ads connector (Beta) powered by Lakeflow Connect, making it easy to ingest advertising data directly into Databricks: no custom APIs, no CSV exports, no brittle scripts...
Recently it has become difficult not only to get a quota in some regions; even if you have one, it doesn't mean VMs are actually available. You may also need to move your bundles to a different subscription when different ...
Thanks for sharing, @Hubert-Dudek. This is a common challenge users face, and flexible node types can significantly improve compute launch reliability.
From hardcoded IDs, through lookups, to finally referencing resources. I think almost everyone, including me, wants to go through such a journey with Databricks Asset Bundles. #databricks
In the article below, I am looking at how to reference a resou...
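As a quick taste of the end state of that journey, here is a minimal bundle fragment using DABs variable interpolation so one resource references another instead of a hardcoded ID. The job and pipeline names are hypothetical; the `${resources...}` substitution syntax comes from the Asset Bundles documentation.

```yaml
resources:
  jobs:
    nightly_load:
      name: nightly-load
      tasks:
        - task_key: refresh
          pipeline_task:
            # Reference the pipeline defined in this same bundle,
            # instead of pasting in a deployed pipeline's ID.
            pipeline_id: ${resources.pipelines.bronze_pipeline.id}
  pipelines:
    bronze_pipeline:
      name: bronze-pipeline
```

The reference is resolved at deploy time, so the binding survives redeployments across workspaces and targets.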
Incrementally upload data from Confluence. I remember a few times in my life when I spent weeks on this. Now it is incredible how simple it is to implement with Lakeflow Connect. Additionally, I love the DABs-first approach for connectors,...
Celebrating platform capabilities, community impact, and responsible adoption. In 2025, my Databricks journey evolved from mastering features to empowering outcomes. What became clear this year is that Databricks isn’t just a powerful platform; it’s a ...
Nice write-up, @Lakshmipriya_Na.
I really like this framing. The “Builder” vs “Strategist” distinction maps almost perfectly to how Databricks shows up in the real world. You can move fast and iterate in notebooks, but the same Lakehouse naturally n...
Our calendar is coming to an end. One of the most significant innovations of last year is Agent Bricks. We received a few ready-made solutions for deploying agents. As the Agents ecosystem becomes more complex, one of my favourites is the Multi-Agent...
During the last two weeks, five new Lakeflow Connect connectors were announced, enabling easy incremental ingestion of data. In the coming weeks, there will be more announcements about Lakeflow Connect, and we can expect Databricks to ...
Your stream can have state, and now, with transformWithStateInPandas, it’s easy to manage: with the 2025 improvements you can handle things like initial state, deduplication, and recovery.
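A toy, single-process sketch of the stateful-deduplication pattern that transformWithStateInPandas enables: a per-key state store that remembers seen event IDs and can be seeded with initial state. This illustrates the pattern only; it is not Spark code, and the event IDs are hypothetical.

```python
class DedupState:
    """Minimal stand-in for a per-key state store in a stateful stream."""

    def __init__(self, initial_state=None):
        # Initial state, e.g. IDs restored from a batch backfill.
        self.seen = set(initial_state or [])

    def process(self, event_id):
        """Return True (keep) if the event is new; record it in state."""
        if event_id in self.seen:
            return False
        self.seen.add(event_id)
        return True

state = DedupState(initial_state=["e1"])          # "e1" already processed upstream
out = [e for e in ["e1", "e2", "e2", "e3"] if state.process(e)]
# out keeps only first-seen events: ["e2", "e3"]
```

In the real API the state lives per key inside the streaming engine and survives restarts via checkpointing, which is what makes recovery after failure work.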