Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

aonurdemir
by Contributor
  • 125 Views
  • 2 replies
  • 4 kudos

Resolved! Broken S3 file paths in File Notifications for Auto Loader

Suddenly, at 2025-10-23T14:12:48.409+00:00, file paths coming from the file notification queue started arriving URL-encoded, so our pipeline gets a file-not-found exception. I think something changed suddenly and broke the notification system. Here are th...

Latest Reply
K_Anudeep
Databricks Employee
  • 4 kudos

Hello @aonurdemir, could you please re-run your pipeline now and check? This issue should now be mitigated; it was due to a recent internal bug that led to unexpected handling of file paths with special characters. You should set ignoreMissingFile...
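For anyone who hit the window before the mitigation, the failure mode is easy to reproduce outside Databricks: the notification payload carried percent-encoded object keys, so the consumer looked up keys that don't exist. A minimal plain-Python sketch (the key name is hypothetical):

```python
from urllib.parse import quote, unquote

# Hypothetical S3 key containing characters that percent-encoding rewrites.
original_key = "landing/2025-10-23/report v1+final.json"

# What the broken notifications delivered: the percent-encoded form.
encoded_key = quote(original_key)
print(encoded_key)  # landing/2025-10-23/report%20v1%2Bfinal.json

# Decoding restores the real key -- which is why the encoded paths produced
# file-not-found errors until the service-side fix landed.
assert unquote(encoded_key) == original_key
```

Until the fix is confirmed in your workspace, the ignoreMissingFiles option suggested above lets the stream skip the phantom paths instead of failing.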

1 More Replies
kfoster
by Contributor
  • 5929 Views
  • 8 replies
  • 7 kudos

Azure DevOps Repo - Invalid Git Credentials

I have a Repo in Databricks connected to Azure DevOps Repositories. The repo has been working fine for almost a month, until last week. Now when I try to open the Git settings in Databricks, I am getting "Invalid Git Credentials". Nothing has change...

Latest Reply
klaas
New Contributor II
  • 7 kudos

I had a similar problem. I could fix it by following these steps: in the Azure DevOps repository, User Settings -> Personal access tokens -> + New token; in Databricks, Settings -> User -> Linked accounts -> Azure DevOps (Personal access token). You could also...
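The same re-link can be scripted against the workspace Git credentials REST API. A sketch, assuming the POST /api/2.0/git-credentials endpoint and its field names (verify against your workspace's API docs; all tokens and the workspace URL below are placeholders):

```python
import json
from urllib.request import Request

def build_git_credential_request(host: str, databricks_token: str,
                                 azdo_pat: str, username: str) -> Request:
    """Build the HTTP request that registers a fresh Azure DevOps PAT.

    Endpoint and field names follow the Databricks Git credentials API
    as I understand it -- treat them as assumptions, not gospel.
    """
    payload = {
        "git_provider": "azureDevOpsServices",
        "git_username": username,
        "personal_access_token": azdo_pat,
    }
    return Request(
        url=f"{host}/api/2.0/git-credentials",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {databricks_token}",
                 "Content-Type": "application/json"},
        method="POST",
    )

req = build_git_credential_request(
    "https://adb-1234567890123456.7.azuredatabricks.net",  # hypothetical workspace
    "dapi-placeholder",      # Databricks PAT (placeholder)
    "azdo-pat-placeholder",  # new Azure DevOps PAT (placeholder)
    "user@example.com",
)
# urllib.request.urlopen(req) would actually send it; not executed here.
```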

7 More Replies
whatever
by New Contributor
  • 719 Views
  • 1 reply
  • 0 kudos

broken file API and inconsistent behavior

Since there is no way to file a bug, I'll post it here. Honestly, I haven't seen such a broken and inconsistent API from a production system in my life. What is worse, this same issue is in the 'os' module. And their UI (despite actually showing the f...

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

Hi @whatever, thanks for sharing this. I will test this and report it internally; meanwhile, you can also submit a new idea/request/bug from your end using this portal: https://docs.databricks.com/en/resources/ideas.html#create-an-idea-in-the-ideas-port...

Rainier_dw
by New Contributor III
  • 2774 Views
  • 2 replies
  • 0 kudos

Help Needed: Obtaining and Applying Blade Bridge License for SSIS-to-DB SQL Conversion

Hello everyone, I'm in the process of using Blade Bridge to convert my SSIS .dtsx packages into Databricks SQL, but I've run into a licensing issue and could use some guidance. What I'm doing: installed Blade Bridge and followed the required folder stru...

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

Hi @Rainier_dw @Eric_Kieft ,  https://github.com/databrickslabs/lakebridge/issues/1819 is now tracked under https://github.com/databrickslabs/lakebridge/issues/1836 as an enhancement for product and https://github.com/databrickslabs/lakebridge/pull/1...

1 More Replies
rajanchaturvedi
by New Contributor
  • 2016 Views
  • 2 replies
  • 0 kudos

Executors getting killed while Scaling Spark jobs on GPU using RAPIDS(NVIDIA)

Hi team, I want to take advantage of Spark distribution over GPU clusters using RAPIDS (NVIDIA), and everything is set up: 1. The JAR is loaded correctly via init script; the JAR is downloaded and uploaded to a volume (the workspace is Unity-enabled) and via In...

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

Also try to gradually reduce spark.executor.memory. You need to allocate less memory to the JVM heap because the GPU needs a large chunk of the node's off-heap (system) memory. The GPU memory is allocated outside the JVM heap. If the heap is too large...
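The trade-off in that advice can be made concrete with back-of-the-envelope arithmetic. The numbers below are purely illustrative, not a recommendation for any specific node type:

```python
def max_executor_heap_gb(node_mem_gb: float, os_reserved_gb: float,
                         pinned_pool_gb: float,
                         overhead_fraction: float = 0.10) -> float:
    """Rough upper bound for spark.executor.memory on a GPU node.

    The JVM heap must leave room for the OS, for the RAPIDS pinned/off-heap
    pool used for host<->GPU transfers, and for Spark's executor memory
    overhead (~10% of the heap by default). If the heap claims too much,
    the OS kills the executor -- the symptom described in this thread.
    """
    available = node_mem_gb - os_reserved_gb - pinned_pool_gb
    # heap + heap * overhead_fraction <= available  =>  heap <= available / (1 + f)
    return available / (1 + overhead_fraction)

# Hypothetical 128 GB node, 8 GB reserved for the OS, 16 GB pinned pool:
heap = max_executor_heap_gb(128, 8, 16)
print(f"{heap:.1f} GB")  # 94.5 GB
```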

1 More Replies
toproximahk
by New Contributor II
  • 176 Views
  • 4 replies
  • 1 kudos

Inquiry on GraphFrame Library Upgrade Timeline for Databricks Runtime for Machine Learning

Thanks to the Databricks community for maintaining such a valuable platform. I would like to inquire whether there is a planned timeline for upgrading the GraphFrame library. We've noticed that the latest release on GitHub is v0.9.3, while the Databricks ...

Latest Reply
Louis_Frolio
Databricks Employee
  • 1 kudos

Greetings @toproximahk, thanks for the kind words and for the detailed pointers. What's in Databricks Runtime 17.3 LTS ML today: the preinstalled GraphFrames JAR in Databricks Runtime 17.3 LTS for Machine Learning is org.graphframes:graphframes_2.1...

3 More Replies
gudurusreddy99
by New Contributor II
  • 218 Views
  • 4 replies
  • 1 kudos

Resolved! Databricks DLT Joins: Streaming table join with Delta table is reading 2 Billion records per batch

Our DLT streaming table join with a Delta table is reading 2 billion records from the Delta table for each and every micro-batch. How can we avoid reading 2 billion records on every micro-batch? Your suggestions and feedback w...

Latest Reply
ManojkMohan
Honored Contributor
  • 1 kudos

@gudurusreddy99 Any update here? Did you try the above solutions?

3 More Replies
adam_mich
by New Contributor II
  • 3354 Views
  • 16 replies
  • 0 kudos

How to Pass Data to a Databricks App?

I am developing a Databricks application using the Streamlit package. I was able to get a "hello world" app deployed successfully, but now I am trying to pass data that exists in DBFS on the same instance. I try to read a CSV saved to DBFS bu...

Latest Reply
old_student
New Contributor II
  • 0 kudos

I used Azure Blob Storage, and this resolved the issue. Our app's Python files in the Databricks environment now have access to Azure Blob Storage using Azure credentials.

15 More Replies
Dhruv-22
by Contributor II
  • 94 Views
  • 6 replies
  • 3 kudos

BUG - withColumns in pyspark doesn't handle empty dictionary

Today, while reading a delta load, my notebook failed and I wanted to report a bug. The withColumns method does not tolerate an empty dictionary and gives the following error in PySpark: flat_tuple = namedtuple("flat_tuple", ["old_col", "new_col", "lo...

Latest Reply
K_Anudeep
Databricks Employee
  • 3 kudos

Hello @Dhruv-22, I have tested this internally, and this seems to be a bug with the new Serverless environment version 4. As a workaround, you can switch the version to 3 as shown below, re-run the above code, and it should work.
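Until the fix lands, a small guard sidesteps the failure on any environment version. The `FakeDF` stub below only simulates the DataFrame call for illustration; the guard itself is what you would wrap around real PySpark code:

```python
def with_columns_safe(df, cols: dict):
    """Apply df.withColumns(cols) only when there is something to add.

    With an empty mapping the input DataFrame is returned unchanged,
    which is the no-op behavior the failing environment should have had.
    """
    return df.withColumns(cols) if cols else df

# Minimal stand-in for a DataFrame, mimicking the buggy behavior:
class FakeDF:
    def withColumns(self, cols):
        if not cols:
            raise ValueError("withColumns({}) not tolerated")  # the reported bug
        return FakeDF()

df = FakeDF()
assert with_columns_safe(df, {}) is df            # empty dict: no call, no error
assert with_columns_safe(df, {"x": 1}) is not df  # non-empty: delegates normally
```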

5 More Replies
a_user12
by New Contributor III
  • 107 Views
  • 1 reply
  • 2 kudos

Resolved! Drop Delta Log seems not to be working

I have a Delta table where I set the following property: logRetentionDuration: "interval 1 days". I was doing some table operations and see in the _delta_log folder files such as 00000000000000000000.json 00000000000000000001.json 00000000000000000002.js...

Latest Reply
K_Anudeep
Databricks Employee
  • 2 kudos

Hello @a_user12, delta.logRetentionDuration is the interval after which delta log files will be removed from the _delta_log. Delta Lake adheres to a set of internal rules to clean up the delta log when the retention duration is exceeded. Setting de...
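The interaction between retention and cleanup timing can be sketched in plain Python. This is a simplified model with hypothetical timestamps: files older than the retention window become *eligible*, but Delta only physically removes them when it writes a new checkpoint, so eligible files can legitimately linger past logRetentionDuration:

```python
from datetime import datetime, timedelta

def expired_log_files(files, now, retention=timedelta(days=1)):
    """Return _delta_log JSON files old enough to be cleaned up.

    Eligibility only -- actual deletion happens when Delta writes the
    next checkpoint, and never removes history still needed to
    reconstruct the current table state.
    """
    return [name for name, modified in files if now - modified > retention]

now = datetime(2025, 11, 10, 12, 0)
files = [
    ("00000000000000000000.json", now - timedelta(days=3)),
    ("00000000000000000001.json", now - timedelta(days=2)),
    ("00000000000000000002.json", now - timedelta(hours=2)),
]
print(expired_log_files(files, now))
# ['00000000000000000000.json', '00000000000000000001.json']
```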

adhi_databricks
by Contributor
  • 3022 Views
  • 1 reply
  • 1 kudos

Size of output data increased 4 times average size.

Hey guys, we have a Databricks job that dumps data into S3 at a daily level; the average file size is 60 GB and the file format is ORC. One inner join operation was taking more than 3 hrs; when debugged, the join was not auto-broadcasted and it was d...

Latest Reply
Louis_Frolio
Databricks Employee
  • 1 kudos

Hey @adhi_databricks, I did some digging and have come up with some helpful tips. The significant increase in file size from 60 GB to 200 GB after implementing the broadcast join, despite identical data, is most likely caused by poor compression ...
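The compression point is easy to demonstrate in miniature: the same values compress far better when similar ones sit next to each other, and a change in join strategy can silently change that on-disk ordering. zlib stands in for ORC's codec here, and the data is synthetic:

```python
import random
import zlib

random.seed(0)
values = sorted(random.choices(range(100), k=50_000))

# Clustered layout: long runs of equal values, as in well-sorted output files.
clustered = bytes(values)

# Scattered layout: identical values, ordering destroyed -- roughly what a
# different join/shuffle plan can hand the ORC writer.
shuffled = values[:]
random.shuffle(shuffled)
scattered = bytes(shuffled)

c_clustered = len(zlib.compress(clustered))
c_scattered = len(zlib.compress(scattered))
assert c_clustered < c_scattered  # same data, very different compressed size
```

In Spark terms this is why adding a sort or cluster-by before the write (restoring the ordering the pre-broadcast plan happened to produce) often brings the file sizes back down.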

crami
by New Contributor II
  • 105 Views
  • 2 replies
  • 1 kudos

Quota Limit Exhausted Error when Creating declarative pipeline

I am trying to develop a declarative pipeline. As per platform policy, I cannot use serverless, which is why I am using an asset bundle to create the declarative pipeline. In the bundle, I am trying to specify compute for the pipeline. However, I am constantly f...

Latest Reply
Khaja_Zaffer
Contributor III
  • 1 kudos

Hello @crami, good day! As the error says, you need to increase the VM quota; I know you have enough in place, but spot fallback + Photon + autoscale triggers the failure. Go to Azure Portal → Subscriptions → Usage + quotas, Filter: Provide...

1 More Replies
Nidhig
by Contributor
  • 369 Views
  • 2 replies
  • 1 kudos

Resolved! Conversational Agent App integration with genie in Databricks

Hi, I have recently explored the conversational agent app from the Marketplace integrated with a Genie Space. The connection setup went well, but I found a sync issue between the app and the Genie space. Even after multiple deployments I couldn't see...

Latest Reply
HariSankar
Contributor III
  • 1 kudos

Hi @Nidhig, this isn't expected behavior; it usually happens when the app's service principal lacks permissions on the SQL warehouse, Genie Space, or underlying Unity Catalog tables. Try these fixes: --> SQL Warehouse: Go to Compute -> SQL Warehou...

1 More Replies
Dhruv-22
by Contributor II
  • 108 Views
  • 1 reply
  • 2 kudos

Resolved! Reading empty json file in serverless gives error

I have a pipeline which puts JSON files in a storage location after reading a daily delta load. Today I encountered a case where the file was empty. I tried running the notebook manually using a serverless cluster (Environment version 4) and encountered...

Latest Reply
K_Anudeep
Databricks Employee
  • 2 kudos

Solution provided here:  https://community.databricks.com/t5/data-engineering/reading-empty-json-file-in-serverless-gives-error/m-p/137022#M50682
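Independent of the runtime-side fix, a defensive pattern is to drop zero-byte files before they reach the reader, since an empty file is a normal outcome of a delta load with no changes. Plain-Python sketch (the Spark call is left commented; the filtering is the point):

```python
import os
import tempfile

def nonempty_json_paths(paths):
    """Filter out zero-byte files so the JSON reader never sees them."""
    return [p for p in paths if os.path.getsize(p) > 0]

with tempfile.TemporaryDirectory() as d:
    empty = os.path.join(d, "empty.json")
    full = os.path.join(d, "full.json")
    open(empty, "w").close()               # zero-byte file
    with open(full, "w") as f:
        f.write('{"id": 1}\n')

    good = nonempty_json_paths([empty, full])
    assert good == [full]
    # df = spark.read.json(good)  # only non-empty files would reach Spark
```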

dipanjannet
by New Contributor II
  • 2751 Views
  • 3 replies
  • 1 kudos

Anyone using Databricks Query Federation for ETL purpose ?

Hello all, we have a use case to fetch data from a SQL Server where we have some tables to consume. This is a typical OLTP setup where data comes in at a regular interval. Now, as we have Unity Catalog enabled, we are interested in exploring Databr...

Latest Reply
dipanjannet
New Contributor II
  • 1 kudos

Hello @nikhilj0421, thank you for responding. The question is not about DLT. The question is: what is the use case of Databricks Query Federation? If we plug in Query Federation, what are the implications? What does Databricks suggest for that?

2 More Replies
