Community Articles
Dive into a collaborative space where members like YOU can exchange knowledge, tips, and best practices. Join the conversation today and unlock a wealth of collective wisdom to enhance your experience and drive success.

Forum Posts

Shahram
by New Contributor II
  • 666 Views
  • 0 replies
  • 1 kudos

Hub Star Modeling 2.0 for Medallion Architecture

Excited to share my latest publication on arXiv! “Hub Star Modeling 2.0 for Medallion Architecture” https://arxiv.org/abs/2504.08788 This new version builds on the original Hub Star Modeling approach, published last year, and is now tailored for the Meda...

genevive_mdonça
by Databricks Employee
  • 1877 Views
  • 1 reply
  • 6 kudos

Handling Complex Nested JSON in Databricks Using schemaHints

When I first got into managing schemas in Databricks, it took me a while to realize that putting in a little planning up front could save me a ton of headaches later on. I was working with these deeply nested, constantly changing JSON files. At first,...

Latest Reply
Advika
Databricks Employee
  • 6 kudos

Great tip @genevive_mdonça! schemaHints help avoid issues with evolving JSON data, making data processing more reliable and easier to maintain. Thanks for sharing.

techgeorge
by New Contributor III
  • 1269 Views
  • 1 reply
  • 0 kudos

Understanding Coalesce, Skewed Joins, and Why AQE Doesn't Always Intervene

In Spark, data skew can be the silent killer of performance. One wide partition pulling in 90% of the data? But even with AQE (Adaptive Query Execution) turned on in Databricks, skewness isn't always automatically identified, and here’s why. What Is co...

Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

@mark_ott , this question seems right up your alley. Care to comment?

Yuki
by Contributor
  • 1535 Views
  • 0 replies
  • 1 kudos

One solution to the [FAILED_READ_FILE.NO_HINT] "Error while reading file" error when using display() or SELECT

I got stuck with the above error when using `spark.read.table().display()` or when querying the table directly with %sql. While the display method is just one...

techgeorge
by New Contributor III
  • 711 Views
  • 0 replies
  • 0 kudos

How to train a Convolutional Neural Network on Databricks with TensorFlow and Keras

Here is how I trained a lightweight Convolutional Neural Network (CNN) to detect pneumonia from chest X-ray pictures on Azure Databricks. I promise no LLMs, no hype, just real-world deep learning: 1. Built it with TensorFlow & Keras on Databricks 2...

shubham_meshram
by New Contributor II
  • 1320 Views
  • 0 replies
  • 0 kudos

When Did the Data Go Wrong? Using Delta Lake Time Travel for Investigation in Databricks

I. Introduction: Data pipelines are the lifeblood of modern data-driven organizations. However, even the most robust pipelines can experience unexpected issues: data corruption, erroneous updates, or sudden data drops. When these problems occur, quickl...

Brahmareddy
by Esteemed Contributor
  • 1572 Views
  • 0 replies
  • 1 kudos

Use Query Patterns to Suggest Indexes Dynamically

Hey folks, ever notice how a query that used to run super fast suddenly starts dragging? We’ve all been there. As data grows, those little inefficiencies in your SQL start showing up, and they show up hard. That’s where something cool comes in: using...

DataDarvish
by New Contributor II
  • 1546 Views
  • 0 replies
  • 1 kudos

Unit Testing for Data Engineering: How to Ensure Production-Ready Data Pipelines

In today’s data-driven world, the success of any business use case relies heavily on trust in the data. This trust is built upon key pillars such as data accuracy, consistency, freshness, and overall quality. When organizations release data into prod...

SashankKotta
by Databricks Employee
  • 7415 Views
  • 8 replies
  • 6 kudos

Library Management via Custom Compute Policies and ADF Job Triggering

This guide is intended for those looking to install libraries on a cluster using a Custom Compute Policy and trigger Databricks jobs from an Azure Data Factory (ADF) linked service. While many users rely on init scripts for library installation, it i...

Latest Reply
Wojciech_BUK
Valued Contributor III
  • 6 kudos

Hi @hassan2, I had the same issue and found a solution. When I created the pool, I created it as On-demand (not spot), and the policy only worked once I removed the entire "azure_attributes.spot_bid_max_price" section from the policy. Looks like "azure_attributes.spot_bi...

7 More Replies
