Community Articles
Dive into a collaborative space where members like YOU can exchange knowledge, tips, and best practices. Join the conversation today and unlock a wealth of collective wisdom to enhance your experience and drive success.

Forum Posts

ilir_nuredini
by Honored Contributor
  • 2708 Views
  • 2 replies
  • 4 kudos

Apache 4.0

Missed the Apache Spark 4.0 release? It is not just a version bump; it is a whole new level for big data processing. Some of the highlights that really stood out to me: 1. SQL just got way more powerful: reusable UDFs, scripting, session variables, an...

Latest Reply
Advika
Community Manager
  • 4 kudos

Yeah, Spark 4.0 brings powerful enhancements while staying compatible with existing workloads. Thank you for putting this together and highlighting the key updates, @ilir_nuredini.

1 More Replies
nathanielcooley
by New Contributor II
  • 3497 Views
  • 4 replies
  • 0 kudos

Data Modeling

Just got out of a session on Data Modeling using the Data Vault paradigm. Highly recommended to help think through complex data design. Look out for Data Modeling 101 for Data Lakehouse Demystified by Luan Medeiros. 

Latest Reply
sridharplv
Valued Contributor II
  • 0 kudos

Hi @BS_THE_ANALYST, please use this link, which includes code, for reference: https://www.databricks.com/blog/data-vault-best-practice-implementation-lakehouse

3 More Replies
ilir_nuredini
by Honored Contributor
  • 1707 Views
  • 0 replies
  • 1 kudos

Databricks Asset Bundles

Why Should You Use Databricks Asset Bundles (DABs)? Without proper tooling, Data Engineering and Machine Learning projects can quickly become messy. That is why we recommend leveraging DABs to solve these common challenges: 1. Collaboration: Without stru...

Brahmareddy
by Esteemed Contributor
  • 12858 Views
  • 8 replies
  • 8 kudos

My Journey with Schema Management in Databricks

When I first started handling schema management in Databricks, I realized that a little bit of planning could save me a lot of headaches down the road. Here’s what I’ve learned and some simple tips that helped me manage schema changes effectively. On...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 8 kudos

Haha, glad it made sense, Joao! Try it out, and if you run into any issues, just let me know. Always happy to help! And best friends? You got it!

7 More Replies
CURIOUS_DE
by Valued Contributor
  • 1953 Views
  • 2 replies
  • 6 kudos

🔐 How Do I Prevent Users from Accidentally Deleting Tables in Unity Catalog? 🔐

Question: I have a role called dev-dataengineer with the following privileges on the catalog dap_catalog_dev: APPLY TAG, CREATE FUNCTION, CREATE MATERIALIZED VIEW, CREATE TABLE, CREATE VOLUME, EXECUTE, READ VOLUME, REFRESH, SELECT, USE SCHEMA, WRITE VOLUME. Despite this, u...

Latest Reply
nayan_wylde
Esteemed Contributor II
  • 6 kudos

Managing assets in UC always carries maintenance overhead. We keep these access controls in Terraform code, and it is always hard to see what level of access is given to different personas in the org. We are building an audit dashboard for it.

1 More Replies
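
For the Unity Catalog question above, it can help to see the listed privileges written out as explicit grants. The following is a minimal sketch, not the poster's setup: the helper `grant_statements` is hypothetical, and it only renders `GRANT ... ON CATALOG ... TO ...` strings from the privilege list quoted in the question so they can be reviewed or audited. It does not by itself explain the behaviour being asked about, since the rest of the question is truncated.

```python
# Privileges quoted in the question for the role dev-dataengineer.
PRIVILEGES = [
    "APPLY TAG", "CREATE FUNCTION", "CREATE MATERIALIZED VIEW",
    "CREATE TABLE", "CREATE VOLUME", "EXECUTE", "READ VOLUME",
    "REFRESH", "SELECT", "USE SCHEMA", "WRITE VOLUME",
]


def grant_statements(catalog, principal, privileges):
    """Hypothetical helper: render one GRANT statement per privilege."""
    # Backticks quote a principal name that contains characters such as '-'.
    return [
        f"GRANT {privilege} ON CATALOG {catalog} TO `{principal}`"
        for privilege in privileges
    ]


statements = grant_statements("dap_catalog_dev", "dev-dataengineer", PRIVILEGES)
for statement in statements:
    print(statement)
```

In a notebook, each string could be passed to `spark.sql(...)`, assuming a session that is allowed to grant on the catalog.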
shraddha_09
by New Contributor II
  • 2075 Views
  • 1 replies
  • 1 kudos

Databricks Optimization Tips – What’s Your Secret?

When I first started working with Databricks, I was genuinely impressed by its potential. The seamless integration with Delta Lake, the power of PySpark, and the ability to process massive datasets at incredible speeds: it was truly impactful. Over tim...

Latest Reply
chanukya-pekala
Contributor III
  • 1 kudos

1. Try to remove cache() and persist() from the DataFrame operations in the code base.
2. Avoid driver operations like collect() and take() entirely: the data from the executors is brought back to the driver, which adds heavy network I/O overhead.
3. Av...

prasannac
by New Contributor
  • 811 Views
  • 0 replies
  • 0 kudos

Request for a guest post

Hi, I hope you're doing well. My name is Prasanna. C, Digital Marketing Strategist at Express Analytics, a company that understands consumer behavior and provides analytics solutions and services to businesses. Express Analytics primarily offers...

lucami
by Contributor
  • 2041 Views
  • 2 replies
  • 1 kudos

Automatic Liquid Clustering and PO

I spent some time working out how to use automatic liquid clustering with DLT pipelines. Hope this can help you as well. Enable Predictive Optimization. Use this code: # Enabling Automatic Liquid Clustering on a new table @dlt.table(cluster_by_auto=Tr...

Latest Reply
lucami
Contributor
  • 1 kudos

Hi @Addy0_, thanks for sharing how to set it for an existing table. Unfortunately, I think ALTER cannot be used with materialized views and streaming tables defined in DLT pipelines. I was looking for something similar to @dlt.table(cluster_by_auto=True, ...

1 More Replies
thedatanerd
by New Contributor III
  • 1037 Views
  • 0 replies
  • 1 kudos

Databricks Data Classification

I encourage you to try out a new beta feature in Databricks called Data Classification. It automatically classifies your catalog data and tags it. Docs: https://docs.databricks.com/aws/en/lakehouse-monitoring/data-classification

xdx001
by New Contributor III
  • 1019 Views
  • 0 replies
  • 1 kudos

Strong Databricks Fundamental - Gen Z

Why Databricks is the Future of Data Analytics for Gen Z
In the fast-paced world of data analytics, staying ahead of the curve is crucial. For Gen Z, who are digital natives and always on the lookout for the latest tech trends, understanding the diffe...

ThomazRossito
by Contributor
  • 3952 Views
  • 1 replies
  • 1 kudos

Post: Lakehouse Federation - Databricks

In the world of data, innovation is constant. And the most recent revolution comes with Lakehouse Federation, a fusion between data lakes and data warehouses, taking data manipulation to a new level. This advancement...

Community Articles
data engineer
Lakehouse
SQL Analytics
Latest Reply
Freshman
New Contributor III
  • 1 kudos

Hey, quick question: can we use it in production? Our application server is SQL Server, and we are planning to use Lakehouse Federation so we can bypass creating and maintaining 100s of workflows. As we have a small dataset, I am not too sure o...

Shahram
by New Contributor II
  • 1194 Views
  • 0 replies
  • 1 kudos

Hub Star Modeling 2.0 for Medallion Architecture

Excited to share my latest publication on arXiv! “Hub Star Modeling 2.0 for Medallion Architecture”: https://arxiv.org/abs/2504.08788 This new version builds on the original Hub Star Modeling approach, published last year, and is now tailored for the Meda...

genevive_mdonça
by Databricks Employee
  • 3959 Views
  • 1 replies
  • 6 kudos

Handling Complex Nested JSON in Databricks Using schemaHints

When I first got into managing schemas in Databricks, it took me a while to realize that putting in a little planning up front could save me a ton of headaches later on. I was working with these deeply nested, constantly changing JSON files. At first,...

Latest Reply
Advika
Community Manager
  • 6 kudos

Great tip @genevive_mdonça! schemaHints help avoid issues with evolving JSON data, making data processing more reliable and easier to maintain. Thanks for sharing.
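
As a rough illustration of the schemaHints idea discussed above, the sketch below builds the option map for an Auto Loader read. The option keys (`cloudFiles.format`, `cloudFiles.schemaLocation`, `cloudFiles.schemaHints`) are Auto Loader options; the helper `build_schema_hints`, the column names, the types, and the path are made up for illustration and are not taken from the article.

```python
def build_schema_hints(hints):
    """Hypothetical helper: join {column: SQL type} pairs into a hints string."""
    return ", ".join(f"{column} {sql_type}" for column, sql_type in hints.items())


# Illustrative columns only: pin the types of a few fields (nested fields use
# dot notation) and leave the rest of the JSON to schema inference.
hints = {
    "order.id": "BIGINT",
    "order.total": "DECIMAL(18,2)",
    "event_ts": "TIMESTAMP",
}

autoloader_options = {
    "cloudFiles.format": "json",
    "cloudFiles.schemaLocation": "/tmp/example/_schema",  # assumed path
    "cloudFiles.schemaHints": build_schema_hints(hints),
}

print(autoloader_options["cloudFiles.schemaHints"])
```

On Databricks, a map like this would typically be passed to the reader with `spark.readStream.format("cloudFiles").options(**autoloader_options)`, followed by `.load(...)` on the source directory.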

techgeorge
by New Contributor III
  • 3102 Views
  • 1 replies
  • 0 kudos

Understanding Coalesce, Skewed Joins, and Why AQE Doesn't Always Intervene

In Spark, data skew can be the silent killer of performance. One wide partition pulling in 90% of the data? But even with AQE (Adaptive Query Execution) turned on in Databricks, skewness isn't always automatically identified, and here’s why. What Is co...

Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

@mark_ott , this question seems right up your alley. Care to comment?
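
One documented reason AQE may leave a skewed join alone is its detection rule: a partition is treated as skewed only when it is larger than both a multiple of the median partition size and an absolute byte threshold (`spark.sql.adaptive.skewJoin.skewedPartitionFactor` and `spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes`). The sketch below restates that rule in plain Python. The function name and sample sizes are illustrative, the defaults of 5 and 256 MB are assumed from open-source Spark and may differ on a given runtime, and this is not necessarily the explanation the article itself gives.

```python
from statistics import median

MB = 1024 * 1024


def skewed_partitions(sizes, factor=5.0, threshold_bytes=256 * MB):
    """Return the indexes of partitions that pass both skew checks."""
    med = median(sizes)
    return [
        index
        for index, size in enumerate(sizes)
        if size > factor * med and size > threshold_bytes
    ]


# One partition holds 90% of a small dataset: heavily skewed in relative
# terms, but below the absolute threshold, so the rule does not flag it.
small = [180 * MB, 5 * MB, 5 * MB, 5 * MB, 5 * MB]

# Same shape at ten times the volume: now both conditions hold.
large = [1800 * MB, 50 * MB, 50 * MB, 50 * MB, 50 * MB]

print(skewed_partitions(small))
print(skewed_partitions(large))
```

Under these assumed defaults, a 90% partition can go unflagged simply because it is small in absolute terms, which is one way skew can slip past AQE.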

Yuki
by Contributor
  • 2584 Views
  • 0 replies
  • 1 kudos

One solution for [FAILED_READ_FILE.NO_HINT] Error while reading file, when using display() or SELECT

I got stuck with the above error when using `spark.read.table().display()` or directly querying the table using %sql. While the display method is just one...
