Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

aonurdemir
by Contributor
  • 88 Views
  • 2 replies
  • 2 kudos

Broken s3 file paths in File Notifications for auto loader

At "2025-10-23T14:12:48.409+00:00", file paths coming from the file notification queue suddenly started arriving URL-encoded, so our pipeline now fails with a file-not-found exception. I think something changed and broke the notification system. Here are th...

Latest Reply
K_Anudeep
Databricks Employee
  • 2 kudos

Hello @aonurdemir, could you please re-run your pipeline and check? This issue should now be mitigated. It was caused by a recent internal bug that led to unexpected handling of file paths with special characters. You should set ignoreMissingFile...
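
For anyone who wants to check whether a path from the notification queue was affected, here is a minimal sketch using only the Python standard library. It assumes the breakage is plain percent-encoding, as the question describes. The bucket and file names are hypothetical and not taken from the thread.

```python
from urllib.parse import unquote


def looks_url_encoded(path: str) -> bool:
    """Return True if decoding percent-escapes changes the path."""
    return unquote(path) != path


def decode_path(path: str) -> str:
    """Decode percent-escapes such as %3D back to '='."""
    return unquote(path)


# Hypothetical example path, not taken from the thread.
encoded = "s3://my-bucket/events/date%3D2025-10-23/part-0000.json"
print(looks_url_encoded(encoded))
print(decode_path(encoded))
```

This only helps with diagnosis. It does not change how Auto Loader itself resolves paths.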

1 More Replies
kfoster
by Contributor
  • 5913 Views
  • 8 replies
  • 7 kudos

Azure DevOps Repo - Invalid Git Credentials

I have a Repo in Databricks connected to Azure DevOps Repositories. The repo had been working fine for almost a month, until last week. Now when I try to open the Git settings in Databricks, I get "Invalid Git Credentials". Nothing has change...

Latest Reply
klaas
New Contributor II
  • 7 kudos

I had a similar problem. I fixed it by following these steps:
  • In the Azure DevOps repository: User Settings -> Personal access tokens -> + New token
  • In Databricks: Settings -> User -> Linked accounts -> Azure DevOps (Personal access token)
You could also...

7 More Replies
Hsn
by Visitor
  • 11 Views
  • 0 replies
  • 0 kudos

Suggest about data engineer

Hey, I'm Hasan Sayyed, currently pursuing SYBCA. I want to become a Data Engineer, but as a beginner, I’ve wasted some time learning other languages and technologies due to a lack of proper knowledge about this field. If someone could guide and teach...

Dhruv-22
by Contributor II
  • 176 Views
  • 5 replies
  • 2 kudos

Reading empty json file in serverless gives error

I ran a Databricks notebook to do incremental loads from files in the raw layer to bronze-layer tables. Today, I encountered a case where the delta file was empty. I tried running it manually on serverless compute and encountered an error. df = spark....

Latest Reply
K_Anudeep
Databricks Employee
  • 2 kudos

Hello @Dhruv-22, can you share the schema of the df? Do you have a _corrupt_record column in your DataFrame? If so, where is it coming from, given that you said it's an empty file? By design, Spark blocks queries that only referen...

4 More Replies
whatever
by New Contributor
  • 702 Views
  • 1 replies
  • 0 kudos

broken file API and inconsistent behavior

Since there is no way to file a bug, I'll post it here. Honestly, I have never seen such a broken and inconsistent API in a production system. What is worse, the same issue is in the 'os' module. And their UI (despite actually showing the f...

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

Hi @whatever, thanks for sharing this. I will test this and report it internally. Meanwhile, you can also submit a new idea, request, or bug through this portal: https://docs.databricks.com/en/resources/ideas.html#create-an-idea-in-the-ideas-port...

Rainier_dw
by New Contributor III
  • 2743 Views
  • 2 replies
  • 0 kudos

Help Needed: Obtaining and Applying Blade Bridge License for SSIS-to-DB SQL Conversion

Hello everyone, I’m in the process of using Blade Bridge to convert my SSIS .dtsx packages into Databricks SQL, but I’ve run into a licensing issue and could use some guidance. What I’m doing: Installed Blade Bridge and followed the required folder stru...

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

Hi @Rainier_dw @Eric_Kieft, https://github.com/databrickslabs/lakebridge/issues/1819 is now tracked under https://github.com/databrickslabs/lakebridge/issues/1836 as a product enhancement, and https://github.com/databrickslabs/lakebridge/pull/1...

1 More Replies
rajanchaturvedi
by New Contributor
  • 1993 Views
  • 2 replies
  • 0 kudos

Executors getting killed while Scaling Spark jobs on GPU using RAPIDS(NVIDIA)

Hi team, I want to take advantage of distributed Spark on GPU clusters using RAPIDS (NVIDIA). Everything is set up: 1. The JAR is loaded correctly via an init script; the JAR is downloaded and uploaded to a volume (the workspace is Unity-enabled) and via In...

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

Also try to gradually reduce spark.executor.memory. You need to allocate less memory to the JVM heap because the GPU needs a large chunk of the node's off-heap (system) memory. The GPU memory is allocated outside the JVM heap. If the heap is too large...
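
To make the budgeting idea concrete, here is a rough sketch that derives a spark.executor.memory value by first setting aside off-heap memory. The helper name, the 64 GB node size, and the 24 GB reserve are all hypothetical values chosen for illustration, not recommendations from the thread. Only the spark.executor.memory key is taken from the reply.

```python
def executor_heap_gb(node_memory_gb, off_heap_reserve_gb, executors_per_node=1):
    """Hypothetical helper: JVM heap per executor after reserving off-heap memory."""
    if executors_per_node < 1:
        raise ValueError("executors_per_node must be at least 1")
    usable = node_memory_gb - off_heap_reserve_gb
    heap = int(usable // executors_per_node)
    if heap < 1:
        raise ValueError("reserve leaves no room for the JVM heap")
    return heap


# Assumed numbers: a 64 GB node with 24 GB kept off-heap for GPU-related buffers.
heap = executor_heap_gb(64, 24)
conf = {"spark.executor.memory": f"{heap}g"}
print(conf)
```

In practice the reserve would be found by lowering the heap step by step, as the reply suggests, until executors stop being killed.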

1 More Replies
toproximahk
by New Contributor II
  • 164 Views
  • 4 replies
  • 1 kudos

Inquiry on GraphFrame Library Upgrade Timeline for Databricks Runtime for Machine Learning

Thanks to the Databricks community for maintaining such a valuable platform. I would like to ask whether there is a planned timeline for upgrading the GraphFrames library. We’ve noticed that the latest release on GitHub is v0.9.3, while the Databricks ...

Latest Reply
Louis_Frolio
Databricks Employee
  • 1 kudos

Greetings @toproximahk, thanks for the kind words and for the detailed pointers. What’s in Databricks Runtime 17.3 LTS ML today: the preinstalled GraphFrames JAR in Databricks Runtime 17.3 LTS for Machine Learning is org.graphframes:graphframes_2.1...

3 More Replies
gudurusreddy99
by New Contributor II
  • 197 Views
  • 4 replies
  • 1 kudos

Resolved! Databricks DLT Joins: Streaming table join with Delta table is reading 2 Billion records per batch

Databricks DLT joins: a streaming table join with a Delta table reads 2 billion records from the Delta table for every micro-batch. How can we avoid reading 2 billion records for every micro-batch? Your suggestions and feedback w...

Latest Reply
ManojkMohan
Honored Contributor
  • 1 kudos

@gudurusreddy99 Any update here? Did you try the above solutions?

3 More Replies
adam_mich
by New Contributor II
  • 3344 Views
  • 16 replies
  • 0 kudos

How to Pass Data to a Databricks App?

I am developing a Databricks application using the Streamlit package. I was able to get a "hello world" app deployed successfully, but now I am trying to pass in data that exists in DBFS on the same instance. I try to read a CSV saved to DBFS bu...

Latest Reply
old_student
New Contributor II
  • 0 kudos

I used Azure Blob Storage, and this resolved the issue. Our Databricks app now contains Python files that access Azure Blob Storage using Azure credentials.

15 More Replies
VikasSinha
by New Contributor
  • 6270 Views
  • 4 replies
  • 0 kudos

Which is better - Azure Databricks or GCP Databricks?

Which cloud hosting environment is best to use for Databricks? My question comes down to the fact that there must be some difference in latency, throughput, result consistency & reproducibility between different cloud hosting environments of ...

Latest Reply
Riyakh
New Contributor II
  • 0 kudos

Both Azure Databricks and GCP Databricks offer powerful capabilities, but Azure Databricks is generally preferred for tighter enterprise integration, while GCP Databricks excels in flexibility and cost-efficiency. The best choice depends on your orga...

3 More Replies
Dhruv-22
by Contributor II
  • 80 Views
  • 6 replies
  • 0 kudos

BUG - withColumns in pyspark doesn't handle empty dictionary

Today, while reading a delta load, my notebook failed, and I want to report a bug. The withColumns command does not tolerate an empty dictionary and gives the following error in PySpark. flat_tuple = namedtuple("flat_tuple", ["old_col", "new_col", "lo...

Latest Reply
K_Anudeep
Databricks Employee
  • 0 kudos

Hello @Dhruv-22, I have tested this internally, and this seems to be a bug with the new Serverless environment version 4. As a workaround, you can try switching the version to 3 and re-running the above code; it should work.
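
If switching the environment version is not an option, a defensive guard in the notebook may also sidestep the failure. This is a minimal sketch, not a fix confirmed in the thread. The helper name with_columns_safe is hypothetical, and it only assumes that the DataFrame exposes the withColumns method mentioned in the question.

```python
def with_columns_safe(df, cols_map):
    """Call df.withColumns only when there is at least one column to add.

    Hypothetical helper: an empty (or None) mapping returns df unchanged,
    so withColumns never receives the empty dictionary.
    """
    if not cols_map:
        return df
    return df.withColumns(cols_map)
```

Usage would look like df = with_columns_safe(df, new_cols), where new_cols is whatever dictionary the notebook builds and may sometimes be empty.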

5 More Replies
a_user12
by New Contributor III
  • 88 Views
  • 1 replies
  • 1 kudos

Resolved! Drop Delta Log seems not to be working

I have a Delta table where I set the following property: logRetentionDuration: "interval 1 days". I was doing some table operations and see files in the _delta_log folder such as 00000000000000000000.json 00000000000000000001.json 00000000000000000002.js...

Latest Reply
K_Anudeep
Databricks Employee
  • 1 kudos

Hello @a_user12, deltaLogRetentionDuration is the interval after which log files are removed from the Delta log. Delta Lake follows a set of internal rules to clean up the Delta log once the retention duration is exceeded. Setting de...
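
To see which commit files are still present in _delta_log, something like the following sketch could help. It assumes commit files follow the 20-digit zero-padded naming shown in the question. The helper name and the checkpoint entry in the sample list are hypothetical, and the code works on a plain list of file names rather than reading storage.

```python
import re

# Commit files are named as a 20-digit zero-padded version plus ".json".
_COMMIT_FILE = re.compile(r"^(\d{20})\.json$")


def commit_versions(file_names):
    """Return the sorted table versions of the commit files in file_names."""
    versions = []
    for name in file_names:
        match = _COMMIT_FILE.match(name)
        if match:
            versions.append(int(match.group(1)))
    return sorted(versions)


# Sample listing; the checkpoint file is a made-up entry that gets skipped.
names = [
    "00000000000000000001.json",
    "00000000000000000000.json",
    "00000000000000000010.checkpoint.parquet",
]
print(commit_versions(names))
```

Comparing the lowest remaining version over time is one way to tell whether any cleanup has actually run.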

