Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

rajanchaturvedi
by New Contributor
  • 1982 Views
  • 1 replies
  • 0 kudos

Executors getting killed while Scaling Spark jobs on GPU using RAPIDS(NVIDIA)

Hi team, I want to take advantage of Spark distribution over GPU clusters using RAPIDS (NVIDIA). Everything is set up: 1. The JAR is loaded correctly via an init script; the JAR is downloaded and uploaded to a volume (the workspace is Unity Catalog enabled) and via In...

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

Hey @rajanchaturvedi, executor termination, especially when scaling a GPU-accelerated job, is almost always due to memory over-allocation (out of memory, or OOM) on the worker nodes, which causes the cluster manager to kill the process. This is exa...

toproximahk
by New Contributor
  • 149 Views
  • 4 replies
  • 0 kudos

Inquiry on GraphFrame Library Upgrade Timeline for Databricks Runtime for Machine Learning

Thanks to the Databricks community for maintaining such a valuable platform. I would like to ask whether there is a planned timeline for upgrading the GraphFrames library. We've noticed that the latest release on GitHub is v0.9.3, while the Databricks ...

Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

Greetings @toproximahk, thanks for the kind words and for the detailed pointers. What's in Databricks Runtime 17.3 LTS ML today: the preinstalled GraphFrames JAR in Databricks Runtime 17.3 LTS for Machine Learning is org.graphframes:graphframes_2.1...

3 More Replies
gudurusreddy99
by New Contributor II
  • 194 Views
  • 3 replies
  • 1 kudos

Resolved! Databricks DLT Joins: Streaming table join with Delta table is reading 2 Billion records per batch

Databricks DLT joins: a streaming table joined with a Delta table reads 2 billion records from the Delta table for every micro-batch. How can we avoid reading 2 billion records for every micro-batch? Your suggestions and feedback w...

Latest Reply
ManojkMohan
Honored Contributor
  • 1 kudos

@gudurusreddy99 Any update here? Did you try the solutions above?

2 More Replies
Dhruv-22
by Contributor
  • 142 Views
  • 4 replies
  • 2 kudos

Reading empty json file in serverless gives error

I ran a Databricks notebook to do incremental loads from files in the raw layer to bronze-layer tables. Today, I encountered a case where the delta file was empty. I tried running it manually on serverless compute and encountered an error. df = spark....

Latest Reply
K_Anudeep
Databricks Employee
  • 2 kudos

Hello @Dhruv-22, can you share the schema of the df? Do you have a _corrupt_record column in your DataFrame? If yes, where are you getting it from, given that you said it's an empty file? By design, Spark blocks queries that only referen...

3 More Replies
bhawana-pandey
by New Contributor III
  • 34 Views
  • 0 replies
  • 0 kudos

Looking for reference DABs bundle yaml and resources for Databricks app deployment (FastAPI redirect

Looking for example databricks.yml and bundle resources for deploying a FastAPI Databricks app using DABs from one environment to another. Deployment works but FastAPI redirects to localhost after deployment, though the homepage loads fine. Need refe...

adam_mich
by New Contributor II
  • 3337 Views
  • 16 replies
  • 0 kudos

How to Pass Data to a Databricks App?

I am developing a Databricks application using the Streamlit package. I was able to deploy a "hello world" app successfully, but now I am trying to pass in data that exists in DBFS on the same instance. I try to read a CSV saved to DBFS bu...

Latest Reply
old_student
New Contributor II
  • 0 kudos

I used Azure Blob Storage, and this resolved the issue. Our Databricks app now contains Python files that access Azure Blob Storage using Azure credentials.

15 More Replies
VikasSinha
by New Contributor
  • 6265 Views
  • 4 replies
  • 0 kudos

Which is better - Azure Databricks or GCP Databricks?

Which cloud hosting environment is best to use for Databricks? My question boils down to the fact that there must be some difference in latency, throughput, result consistency, and reproducibility between the different cloud hosting environments of ...

Latest Reply
Riyakh
New Contributor II
  • 0 kudos

Both Azure Databricks and GCP Databricks offer powerful capabilities, but Azure Databricks is generally preferred for tighter enterprise integration, while GCP Databricks excels in flexibility and cost-efficiency. The best choice depends on your orga...

3 More Replies
Dhruv-22
by Contributor
  • 73 Views
  • 6 replies
  • 0 kudos

BUG - withColumns in pyspark doesn't handle empty dictionary

Today, while reading a delta load, my notebook failed, and I want to report a bug. The withColumns command does not tolerate an empty dictionary and gives the following error in PySpark. flat_tuple = namedtuple("flat_tuple", ["old_col", "new_col", "lo...

Latest Reply
K_Anudeep
Databricks Employee
  • 0 kudos

Hello @Dhruv-22, I have tested this internally, and this seems to be a bug with the new serverless environment version 4. As a workaround, you can try switching the environment version to 3 and re-running the above code; it should work.
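
If switching the environment version is not an option, one defensive pattern is to skip the call when there is nothing to add. The following is a minimal sketch, assuming only that the DataFrame object exposes a `withColumns(dict)` method as PySpark does. The helper name `with_columns_safe` and the `FakeFrame` stand-in are hypothetical; they exist only so the example runs without a Spark session.

```python
from typing import Any, Dict


def with_columns_safe(df: Any, cols_map: Dict[str, Any]) -> Any:
    """Call df.withColumns only when there is at least one column to add.

    Hypothetical helper: returns df unchanged for an empty mapping, so the
    underlying withColumns call never receives an empty dictionary.
    """
    if not cols_map:
        return df
    return df.withColumns(cols_map)


class FakeFrame:
    """Stand-in for a DataFrame (illustration only, not a PySpark class)."""

    def __init__(self, columns):
        self.columns = list(columns)

    def withColumns(self, cols_map):
        if not cols_map:
            # Stand-in behavior: fail on an empty mapping, loosely mimicking
            # the failure described in the thread (the real error is not shown).
            raise ValueError("empty mapping")
        new_cols = [c for c in cols_map if c not in self.columns]
        return FakeFrame(self.columns + new_cols)


frame = FakeFrame(["id"])
unchanged = with_columns_safe(frame, {})  # no call to withColumns is made
extended = with_columns_safe(frame, {"loaded_at": "2024-01-01"})
print(unchanged.columns)
print(extended.columns)
```

With a real PySpark DataFrame the same guard would apply as `with_columns_safe(df, mapping)`, where `mapping` maps column names to `Column` expressions.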

5 More Replies
a_user12
by New Contributor III
  • 79 Views
  • 1 replies
  • 0 kudos

Resolved! Drop Delta Log seems not to be working

I have a Delta table where I set the following property: logRetentionDuration: "interval 1 days". I was doing some table operations and see files in the _delta_log folder such as 00000000000000000000.json 00000000000000000001.json 00000000000000000002.js...

Latest Reply
K_Anudeep
Databricks Employee
  • 0 kudos

Hello @a_user12, deltaLogRetentionDuration is the interval after which log files are removed from the Delta log. Delta Lake follows a set of internal rules to clean up the Delta log once the retention duration is exceeded. Setting de...

adhi_databricks
by Contributor
  • 3003 Views
  • 1 replies
  • 0 kudos

Size of output data increased 4 times average size.

Hey guys, we have a Databricks job that dumps data to S3 daily. The average file size is 60 GB and the file format is ORC. One inner join operation was taking more than 3 hours; when we debugged it, the join was not auto-broadcasted and it was d...

Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

Hey @adhi_databricks , I did some digging and have come up with some helpful tips.   The significant increase in file size from 60GB to 200GB after implementing broadcast join, despite having identical data, is most likely caused by poor compression ...

crami
by New Contributor II
  • 58 Views
  • 2 replies
  • 1 kudos

Quota Limit Exhausted Error when Creating declarative pipeline

I am trying to develop a declarative pipeline. As per platform policy, I cannot use serverless; for that reason, I am using an asset bundle to create the declarative pipeline. In the bundle, I am trying to specify compute for the pipeline. However, I am constantly f...

Latest Reply
Khaja_Zaffer
Contributor III
  • 1 kudos

Hello @crami, good day! As the error says, you need to increase the VM size. I know you have enough in place, but the combination of spot fallback, Photon, and autoscaling triggers the failure. Go to Azure Portal → Subscriptions → Usage + quotas. Filter: Provide...

1 More Replies
Nidhig
by Contributor
  • 339 Views
  • 2 replies
  • 1 kudos

Resolved! Conversational Agent App integration with genie in Databricks

Hi, I recently explored the conversational agent app from the Marketplace and its integration with a Genie space. The connection setup went well, but I found a sync issue between the app and the Genie space. Even after multiple deployments I couldn't see...

Latest Reply
HariSankar
Contributor III
  • 1 kudos

Hi @Nidhig, this isn't expected behavior. It usually happens when the app's service principal lacks permissions to access the SQL warehouse, Genie Space, or underlying Unity Catalog tables. Try these fixes: --> SQL Warehouse: Go to Compute -> SQL Warehou...

1 More Replies
Dhruv-22
by Contributor
  • 63 Views
  • 1 replies
  • 0 kudos

Reading empty json file in serverless gives error

I have a pipeline which puts JSON files in a storage location after reading a daily delta load. Today I encountered a case where the file was empty. I tried running the notebook manually using a serverless cluster (Environment version 4) and encountered...

Latest Reply
K_Anudeep
Databricks Employee
  • 0 kudos

Solution provided here:  https://community.databricks.com/t5/data-engineering/reading-empty-json-file-in-serverless-gives-error/m-p/137022#M50682

dipanjannet
by New Contributor II
  • 2728 Views
  • 3 replies
  • 0 kudos

Anyone using Databricks Query Federation for ETL purpose ?

Hello all, we have a use case to fetch data from a SQL Server that has some tables we need to consume. This is a typical OLTP setup in which the data comes in at regular intervals. Now, as we have Unity Catalog enabled, we are interested in exploring Databr...

Latest Reply
dipanjannet
New Contributor II
  • 0 kudos

Hello @nikhilj0421 - Thank you for responding. The question is not about DLT. The question is: what is the use case for Databricks Query Federation? If we plug in Query Federation, what are the implications? What does Databricks suggest for that?

2 More Replies
RakeshRakesh_De
by New Contributor III
  • 2301 Views
  • 3 replies
  • 1 kudos

Databricks Free Edition - sql server connector not working-

I am trying to explore the new Databricks Free Edition, but I am not able to set up the SQL Server connector ingestion pipeline through the UI. It shows an error that serverless compute must be enabled for the workspace, but Free Edition only has the serverless option ...

Data Engineering
FreeEdition
LakeFlow
Latest Reply
Saf4Databricks
New Contributor III
  • 1 kudos

Hi @RakeshRakesh_De, the error is misleading. As mentioned in the second row of the table here, the gateway runs on classic compute, and the ingestion pipeline runs on serverless compute (mentioned in the third row of the same table). Hop...

2 More Replies
