Community Articles
Dive into a collaborative space where members like YOU can exchange knowledge, tips, and best practices. Join the conversation today and unlock a wealth of collective wisdom to enhance your experience and drive success.

Forum Posts

RiyazAliM
by Honored Contributor
  • 633 Views
  • 1 replies
  • 4 kudos

The Databricks Python SDK

The Databricks SDK is a library (written in Python, in our case) that lets you control and automate actions on Databricks using the methods available on the WorkspaceClient (more about this below). Why do we need the Databricks SDK? Automation: You can d...

Latest Reply
sridharplv
Valued Contributor II
  • 4 kudos

Good Article @RiyazAliM.

ilir_nuredini
by Honored Contributor
  • 1904 Views
  • 2 replies
  • 4 kudos

Apache 4.0

Missed the Apache Spark 4.0 release? It is not just a version bump; it is a whole new level for big data processing. Some of the highlights that really stood out to me: 1. SQL just got way more powerful: reusable UDFs, scripting, session variables, an...

Latest Reply
Advika
Databricks Employee
  • 4 kudos

Yeah, Spark 4.0 brings powerful enhancements while staying compatible with existing workloads. Thank you for putting this together and highlighting the key updates, @ilir_nuredini.

1 More Replies
Harun
by Honored Contributor
  • 7403 Views
  • 3 replies
  • 2 kudos

Optimizing Costs in Databricks by Dynamically Choosing Cluster Sizes

Databricks is a popular unified data analytics platform known for its powerful data processing capabilities and seamless integration with Apache Spark. However, managing and optimizing costs in Databricks can be challenging, especially when it comes ...

Latest Reply
kmacgregor
New Contributor II
  • 2 kudos

How can this actually be used to choose a cluster pool for a Databricks workflow dynamically, that is, at run time? In other words, what can you actually do with the value of `selected_pool` other than printing it out?
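
One possible way to act on a value like `selected_pool` at run time is to place it in the `instance_pool_id` field of the cluster specification a job run is submitted with. The sketch below is not the article author's code: the pool IDs, size thresholds, runtime version, and both function names are hypothetical and only illustrate the idea.

```python
from typing import Dict, List, Tuple

# Hypothetical (max input size in GB, instance pool ID) pairs, smallest first.
# Replace these with pool IDs and thresholds from your own workspace.
POOLS_BY_MAX_GB: List[Tuple[float, str]] = [
    (10.0, "pool-small-id"),
    (100.0, "pool-medium-id"),
    (float("inf"), "pool-large-id"),
]


def select_pool(input_size_gb: float) -> str:
    """Return the ID of the smallest pool whose threshold covers the input size."""
    if input_size_gb < 0:
        raise ValueError("input_size_gb must be non-negative")
    for max_gb, pool_id in POOLS_BY_MAX_GB:
        if input_size_gb <= max_gb:
            return pool_id
    raise ValueError("no pool configured for this input size")


def build_new_cluster(selected_pool: str, num_workers: int = 2) -> Dict[str, object]:
    """Build a job cluster specification that draws its nodes from the chosen pool."""
    return {
        "spark_version": "15.4.x-scala2.12",  # example runtime, an assumption
        "instance_pool_id": selected_pool,
        "num_workers": num_workers,
    }


selected_pool = select_pool(input_size_gb=42.0)
new_cluster = build_new_cluster(selected_pool)
```

The resulting dictionary would then be passed as the `new_cluster` of a task when the run is submitted from an orchestrating notebook or script, rather than being fixed in the workflow definition ahead of time.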

2 More Replies
OU_Professor
by New Contributor II
  • 1527 Views
  • 2 replies
  • 1 kudos

Resolved! Community (Legacy) Edition Question

Hello, I have been teaching my Data Warehousing class using the Databricks Community Edition. With the change to the Databricks Free Edition, there are many aspects of my Community Edition notebooks that no longer work in the Free Edition. Is there ...

Latest Reply
Advika
Databricks Employee
  • 1 kudos

Hello @OU_Professor! Access to existing Community Edition accounts will remain available for the rest of the year. However, please note that new users attempting to sign up for Community Edition are now redirected to the Free Edition instead.

1 More Replies
nathanielcooley
by New Contributor II
  • 2684 Views
  • 4 replies
  • 0 kudos

Data Modeling

Just got out of a session on Data Modeling using the Data Vault paradigm. Highly recommended to help think through complex data design. Look out for Data Modeling 101 for Data Lakehouse Demystified by Luan Medeiros. 

Latest Reply
sridharplv
Valued Contributor II
  • 0 kudos

Hi @BS_THE_ANALYST, please use this link, which includes code, for reference: https://www.databricks.com/blog/data-vault-best-practice-implementation-lakehouse

3 More Replies
ilir_nuredini
by Honored Contributor
  • 818 Views
  • 0 replies
  • 1 kudos

Databricks Asset Bundles

Why Should You Use Databricks Asset Bundles (DABs)? Without proper tooling, Data Engineering and Machine Learning projects can quickly become messy. That is why we recommend leveraging DABs to solve these common challenges: 1. Collaboration: Without stru...

SumitSingh
by Contributor II
  • 7155 Views
  • 11 replies
  • 41 kudos

From Associate to Professional: My Learning Plan to ace all Databricks Data Engineer Certifications

In today’s data-driven world, the role of a data engineer is critical in designing and maintaining the infrastructure that allows for the efficient collection, storage, and analysis of large volumes of data. Databricks certifications hold significan...

Latest Reply
BS_THE_ANALYST
Esteemed Contributor
  • 41 kudos

@SumitSingh this is getting put in the favourites. Thanks a bunch for this. All the best, BS

10 More Replies
Brahmareddy
by Esteemed Contributor
  • 6913 Views
  • 8 replies
  • 7 kudos

My Journey with Schema Management in Databricks

When I first started handling schema management in Databricks, I realized that a little bit of planning could save me a lot of headaches down the road. Here’s what I’ve learned and some simple tips that helped me manage schema changes effectively. On...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 7 kudos

Haha, glad it made sense, Joao! Try it out, and if you run into any issues, just let me know. Always happy to help! And best friends? You got it!

7 More Replies
charlie_wei
by New Contributor III
  • 1498 Views
  • 0 replies
  • 4 kudos

Connecting VS Code and GitHub Copilot to the Databricks Managed MCP Server

Recently, Databricks released a preview version of the Managed MCP Server. Upon seeing this, I immediately wanted to integrate Databricks Genie with VS Code and GitHub Copilot agent mode. Below, I will briefly share the setup process: Step 1: Prepare ...

Community Articles
Generative AI
Genie
MCP
CURIOUS_DE
by Contributor III
  • 770 Views
  • 2 replies
  • 6 kudos

🔐 How Do I Prevent Users from Accidentally Deleting Tables in Unity Catalog? 🔐

Question: I have a role called dev-dataengineer with the following privileges on the catalog dap_catalog_dev: APPLY TAG, CREATE FUNCTION, CREATE MATERIALIZED VIEW, CREATE TABLE, CREATE VOLUME, EXECUTE, READ VOLUME, REFRESH, SELECT, USE SCHEMA, WRITE VOLUME. Despite this, u...

Latest Reply
nayan_wylde
Honored Contributor
  • 6 kudos

Managing assets in UC is always a maintenance overhead. We keep these access controls in Terraform code, and it is always hard to see what level of access is given to different personas in the org. We are building an audit dashboard for it.
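
A per-persona view like the one this reply describes could start from a flat list of grants, grouped by principal and securable. This is only a minimal sketch under stated assumptions: the `summarize_grants` helper, the tuple layout, and the `analysts` principal are hypothetical, and the sample rows reuse just a few of the privileges named in the question. In practice the rows would be exported from wherever grants are tracked, such as the Terraform code mentioned above.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# One grant record: (principal, securable, privilege). The layout is an assumption.
Grant = Tuple[str, str, str]


def summarize_grants(grants: Iterable[Grant]) -> Dict[str, Dict[str, List[str]]]:
    """Group privileges by principal, then by securable, with sorted privilege lists."""
    grouped: Dict[str, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for principal, securable, privilege in grants:
        grouped[principal][securable].add(privilege)
    return {
        principal: {sec: sorted(privs) for sec, privs in securables.items()}
        for principal, securables in grouped.items()
    }


# Illustrative rows only; "analysts" is a made-up principal.
EXAMPLE_GRANTS: List[Grant] = [
    ("dev-dataengineer", "dap_catalog_dev", "CREATE TABLE"),
    ("dev-dataengineer", "dap_catalog_dev", "SELECT"),
    ("dev-dataengineer", "dap_catalog_dev", "USE SCHEMA"),
    ("analysts", "dap_catalog_dev", "SELECT"),
]

summary = summarize_grants(EXAMPLE_GRANTS)
```

A summary shaped like this is easy to load into a table or dashboard, with one row per principal and securable.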

1 More Replies
shraddha_09
by New Contributor II
  • 921 Views
  • 1 replies
  • 1 kudos

Databricks Optimization Tips – What’s Your Secret?

When I first started working with Databricks, I was genuinely impressed by its potential. The seamless integration with Delta Lake, the power of PySpark, and the ability to process massive datasets at incredible speeds were truly impactful. Over tim...

Latest Reply
chanukya-pekala
Contributor II
  • 1 kudos

1. Try to remove cache() and persist() from the DataFrame operations in the code base. 2. Avoid driver operations like collect() and take() entirely: the data from the executors is brought back to the driver, which incurs high network I/O overhead. 3. Av...

prasannac
by New Contributor
  • 531 Views
  • 0 replies
  • 0 kudos

Request for a guest post

Hi,   I hope you're doing well. My name is Prasanna. C, Digital Marketing Strategist at Express Analytics, a company that understands consumer behavior and provides analytics solutions and services to businesses.   Express Analytics primarily offers...

mai_luca
by New Contributor III
  • 913 Views
  • 2 replies
  • 1 kudos

Automatic Liquid Clustering and PO

I spent some time understanding how to use automatic liquid clustering with DLT pipelines. Hope this can help you as well. Enable Predictive Optimization. Use this code: # Enabling Automatic Liquid Clustering on a new table @dlt.table(cluster_by_auto=Tr...

Latest Reply
mai_luca
New Contributor III
  • 1 kudos

Hi @Addy0_, thanks for sharing how to set it for an existing table. Unfortunately, I think ALTER cannot be used with materialized views and streaming tables defined in DLT pipelines. I was looking for something similar to @dlt.table(cluster_by_auto=True, ...

1 More Replies
thedatanerd
by New Contributor II
  • 462 Views
  • 0 replies
  • 1 kudos

Databricks Data Classification

I encourage you to try out a new beta feature in Databricks called Data Classification. It automatically classifies your catalog data and applies tags to it. Docs: https://docs.databricks.com/aws/en/lakehouse-monitoring/data-classification

DavidOBrien
by New Contributor
  • 10152 Views
  • 5 replies
  • 2 kudos

Editing value of widget parameter within notebook code

I have a notebook with a text widget where I want to be able to edit the value of the widget within the notebook and then reference it in SQL code. For example, assuming there is a text widget named Var1 that has input value "Hello", I would want to ...

Latest Reply
Ville_Leinonen
New Contributor II
  • 2 kudos

It seems that the only way to use parameters in a SQL code block is to use dbutils.widgets, and you cannot change those parameters without removing the widget and setting it up again in code.

4 More Replies
