Community Articles
Dive into a collaborative space where members like YOU can exchange knowledge, tips, and best practices. Join the conversation today and unlock a wealth of collective wisdom to enhance your experience and drive success.

Forum Posts

RiyazAliM
by Honored Contributor
  • 899 Views
  • 1 reply
  • 4 kudos

The Databricks Python SDK

The Databricks SDK is a library (written in Python, in our case) that lets you control and automate actions on Databricks using the methods available in the WorkspaceClient (more about this below). Why do we need the Databricks SDK? - Automation: You can d...
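A minimal sketch of the pattern the post describes, assuming the `databricks-sdk` package is installed and credentials are configured (e.g. `DATABRICKS_HOST` / `DATABRICKS_TOKEN` environment variables):

```python
# Hypothetical sketch of using the Databricks Python SDK's WorkspaceClient.
# Assumes `pip install databricks-sdk` and environment-based auth; the import
# is deferred into the function so the sketch stays self-contained outside a
# configured environment.

def running_cluster_names():
    from databricks.sdk import WorkspaceClient

    w = WorkspaceClient()  # picks up credentials from the environment
    # w.clusters.list() yields cluster details; keep only RUNNING ones.
    return [
        c.cluster_name
        for c in w.clusters.list()
        if c.state is not None and c.state.value == "RUNNING"
    ]
```

The same `WorkspaceClient` object exposes jobs, workspace files, and many other services, which is what makes it useful for automation.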

Latest Reply
sridharplv
Valued Contributor II
  • 4 kudos

Good article, @RiyazAliM.

ilir_nuredini
by Honored Contributor
  • 2224 Views
  • 2 replies
  • 4 kudos

Apache Spark 4.0

Missed the Apache Spark 4.0 release? It is not just a version bump, it is a whole new level for big data processing. Some of the highlights that really stood out to me: 1. SQL just got way more powerful: reusable UDFs, scripting, session variables, an...

Latest Reply
Advika
Databricks Employee
  • 4 kudos

Yeah, Spark 4.0 brings powerful enhancements while staying compatible with existing workloads. Thank you for putting this together and highlighting the key updates, @ilir_nuredini.

1 More Replies
Harun
by Honored Contributor
  • 8380 Views
  • 3 replies
  • 2 kudos

Optimizing Costs in Databricks by Dynamically Choosing Cluster Sizes

Databricks is a popular unified data analytics platform known for its powerful data processing capabilities and seamless integration with Apache Spark. However, managing and optimizing costs in Databricks can be challenging, especially when it comes ...

Latest Reply
kmacgregor
New Contributor II
  • 2 kudos

How can this actually be used to choose a cluster pool for a Databricks workflow dynamically, that is, at run time? In other words, what can you actually do with the value of `selected_pool` other than printing it out?
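One hedged answer to this question (all names illustrative): resolve the pool choice to an instance pool ID before the run starts, then pass it into the cluster spec you submit, e.g. as `instance_pool_id` in a Jobs API call, rather than just printing it. The selection step itself is plain Python:

```python
# Illustrative sketch: pick an instance pool ID at run time based on an
# estimated input size. Pool IDs are made up; the returned value would be
# set as `instance_pool_id` in the new_cluster spec of a job run, not
# merely printed.

POOLS = {
    "small": "pool-small-0001",    # hypothetical pool IDs
    "medium": "pool-medium-0001",
    "large": "pool-large-0001",
}

def select_pool(input_size_gb: float) -> str:
    # Thresholds are arbitrary examples; tune them to your workloads.
    if input_size_gb < 10:
        return POOLS["small"]
    if input_size_gb < 100:
        return POOLS["medium"]
    return POOLS["large"]

# e.g. select_pool(5) returns "pool-small-0001"
```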

2 More Replies
nathanielcooley
by New Contributor II
  • 2901 Views
  • 4 replies
  • 0 kudos

Data Modeling

Just got out of a session on Data Modeling using the Data Vault paradigm. Highly recommended to help think through complex data design. Look out for Data Modeling 101 for Data Lakehouse Demystified by Luan Medeiros. 

Latest Reply
sridharplv
Valued Contributor II
  • 0 kudos

Hi @BS_THE_ANALYST, please use this link with code for reference: https://www.databricks.com/blog/data-vault-best-practice-implementation-lakehouse

3 More Replies
ilir_nuredini
by Honored Contributor
  • 1038 Views
  • 0 replies
  • 1 kudos

Databricks Asset Bundles

Why Should You Use Databricks Asset Bundles (DABs)? Without proper tooling, Data Engineering and Machine Learning projects can quickly become messy. That is why we recommend leveraging DABs to solve these common challenges: 1. Collaboration: Without stru...
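For readers new to DABs, a bundle is driven by a `databricks.yml` file at the project root. A minimal, hypothetical example (bundle, job, and host names are all placeholders; see the DABs docs for the full schema):

```yaml
# Minimal, made-up Databricks Asset Bundle configuration.
bundle:
  name: my_project                 # placeholder bundle name

targets:
  dev:
    mode: development
    workspace:
      host: https://example.cloud.databricks.com   # placeholder host

resources:
  jobs:
    nightly_job:                   # placeholder job key
      name: nightly-etl
      tasks:
        - task_key: main
          notebook_task:
            notebook_path: ./notebooks/main.py
```

With this in place, `databricks bundle validate` and `databricks bundle deploy -t dev` version and ship the project as one unit.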

SumitSingh
by Contributor II
  • 8427 Views
  • 11 replies
  • 41 kudos

From Associate to Professional: My Learning Plan to ace all Databricks Data Engineer Certifications

In today’s data-driven world, the role of a data engineer is critical in designing and maintaining the infrastructure that allows for the efficient collection, storage, and analysis of large volumes of data. Databricks certifications hold significan...

Latest Reply
BS_THE_ANALYST
Esteemed Contributor III
  • 41 kudos

@SumitSingh this is getting put in the favourites. Thanks a bunch for this. All the best, BS

10 More Replies
Brahmareddy
by Esteemed Contributor
  • 8185 Views
  • 8 replies
  • 7 kudos

My Journey with Schema Management in Databricks

When I first started handling schema management in Databricks, I realized that a little bit of planning could save me a lot of headaches down the road. Here’s what I’ve learned and some simple tips that helped me manage schema changes effectively. On...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 7 kudos

Haha, glad it made sense, Joao. Try it out, and if you run into any issues, just let me know. Always happy to help! And best friends? You got it!

7 More Replies
CURIOUS_DE
by Contributor III
  • 1139 Views
  • 2 replies
  • 6 kudos

🔐 How Do I Prevent Users from Accidentally Deleting Tables in Unity Catalog? 🔐

Question: I have a role called dev-dataengineer with the following privileges on the catalog dap_catalog_dev: APPLY TAG, CREATE FUNCTION, CREATE MATERIALIZED VIEW, CREATE TABLE, CREATE VOLUME, EXECUTE, READ VOLUME, REFRESH, SELECT, USE SCHEMA, WRITE VOLUME. Despite this, u...

Latest Reply
nayan_wylde
Esteemed Contributor
  • 6 kudos

Managing assets in UC always carries maintenance overhead. We have these access controls in Terraform code, and it is always hard to see what level of access is given to different personas in the org. We are building an audit dashboard for it.

1 More Replies
shraddha_09
by New Contributor II
  • 1153 Views
  • 1 reply
  • 1 kudos

Databricks Optimization Tips – What’s Your Secret?

When I first started working with Databricks, I was genuinely impressed by its potential. The seamless integration with Delta Lake, the power of PySpark, and the ability to process massive datasets at incredible speeds. It was truly impactful. Over tim...

Latest Reply
chanukya-pekala
Contributor III
  • 1 kudos

1. Try to remove cache() and persist() from the DataFrame operations in the code base. 2. Fully avoid driver operations like collect() and take(): they bring data from the executors back to the driver, which incurs heavy network I/O overhead. 3. Av...
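A small PySpark-flavored sketch of the first two tips (table and column names are hypothetical; pyspark is imported inside the function so the snippet stands alone outside a Spark environment):

```python
# Illustrative sketch of tips 1 and 2 above. "sales", "region", and
# "amount" are made-up names; pass in an existing SparkSession.

def regional_totals(spark):
    from pyspark.sql import functions as F

    df = spark.read.table("sales")  # hypothetical source table

    # Tip 1: no cache()/persist() here. Caching only pays off when the
    # same DataFrame is reused several times; otherwise it wastes memory.

    # Tip 2: aggregate on the executors instead of pulling raw rows to
    # the driver with collect()/take().
    return df.groupBy("region").agg(F.sum("amount").alias("total"))
```

The returned DataFrame stays distributed; only a final, already-small result should ever cross back to the driver.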

prasannac
by New Contributor
  • 637 Views
  • 0 replies
  • 0 kudos

Request for a guest post

Hi, I hope you're doing well. My name is Prasanna. C, Digital Marketing Strategist at Express Analytics, a company that understands consumer behavior and provides analytics solutions and services to businesses. Express Analytics primarily offers...

mai_luca
by New Contributor III
  • 1243 Views
  • 2 replies
  • 1 kudos

Automatic Liquid Clustering and PO

I spent some time understanding how to use automatic liquid clustering with DLT pipelines. Hope this can help you as well. Enable Predictive Optimization, then use this code: # Enabling Automatic Liquid Clustering on a new table @dlt.table(cluster_by_auto=Tr...
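Restated as a fuller, hypothetical sketch of the snippet above. The `dlt` module only resolves inside a Databricks pipeline, so the import is deferred; "raw_events" is a made-up source table:

```python
# Hypothetical sketch of a DLT table with automatic liquid clustering.
# `dlt` exists only inside a Databricks pipeline runtime, hence the
# deferred import; "raw_events" is a placeholder source table.

def register_table(spark):
    import dlt  # available only within a Databricks pipeline

    # cluster_by_auto=True asks Predictive Optimization to pick and
    # maintain the table's clustering keys automatically.
    @dlt.table(cluster_by_auto=True)
    def events_clustered():
        return spark.read.table("raw_events")

    return events_clustered
```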

Latest Reply
mai_luca
New Contributor III
  • 1 kudos

Hi @Addy0_, thanks for sharing how to set it for an existing table. Unfortunately, I think ALTER cannot be used with materialized views and streaming tables defined in DLT pipelines. I was looking for something similar to @dlt.table(cluster_by_auto=True, ...

1 More Replies
thedatanerd
by New Contributor III
  • 643 Views
  • 0 replies
  • 1 kudos

Databricks Data Classification

I encourage you to try out a new beta feature in Databricks called Data Classification. It automatically classifies the data in your catalog and applies tags to it. Docs: https://docs.databricks.com/aws/en/lakehouse-monitoring/data-classification

xdx001
by New Contributor III
  • 714 Views
  • 0 replies
  • 1 kudos

Strong Databricks Fundamental - Gen Z

Why Databricks is the Future of Data Analytics for Gen Z: In the fast-paced world of data analytics, staying ahead of the curve is crucial. For Gen Z, who are digital natives and always on the lookout for the latest tech trends, understanding the diffe...

MichTalebzadeh
by Valued Contributor
  • 3258 Views
  • 3 replies
  • 0 kudos

Financial Crime detection with the help of Apache Spark, Data Mesh and Data Lake

For those interested in Data Mesh and Data Lakes for FinCrime detection: Data mesh is a relatively new architectural concept for data management that emphasizes domain-driven data ownership and self-service data availability. It promotes the decentral...

Community Articles
data lakes
Data Mesh
financial crime
spark
Latest Reply
carrolbeau
New Contributor II
  • 0 kudos

It's great that you're focusing on financial crime detection with advanced technologies like Apache Spark, Data Mesh, and Data Lake. For those looking to dive deeper into criminal records and related data, tools like KY criminal lookup can provide es...

2 More Replies
ThomazRossito
by Contributor
  • 3418 Views
  • 1 reply
  • 1 kudos

Post: Lakehouse Federation - Databricks

In the world of data, innovation is constant. And the most recent revolution comes with Lakehouse Federation, a fusion between data lakes and data warehouses, taking data manipulation to a new level. This advancement...

Community Articles
data engineer
Lakehouse
SQL Analytics
Latest Reply
Freshman
New Contributor III
  • 1 kudos

Hey, quick question: can we use it for the production version? Our application server is SQL Server, and we are planning to use Lakehouse Federation so we can bypass creating and maintaining hundreds of workflows. As we have a small dataset, I am not too sure o...

