Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Forum Posts

kenmyers-8451
by Contributor II
  • 606 Views
  • 3 replies
  • 1 kudos

Resolved! mode: development not working as expected

Hey, I'm trying to add mode: development to my "Development" target (which is the default), but it does not seem to be working as I expected. Here is what my targets file looks like: I'm deploying with this command: databricks272 bundle deploy -p dev3 -t De...

Latest Reply
SteveOstrowski
Databricks Employee
  • 1 kudos

Hi @kenmyers-8451, Glad you tracked this down. This is a common gotcha with Databricks Asset Bundles (DABs) when splitting configuration across multiple files: if the file containing your target definition (with mode: development) is not listed in th...
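The gotcha described above can be sketched in DAB configuration. This is an illustration only; the bundle, file, and target names are assumed, not taken from the thread:

```yaml
# databricks.yml (sketch; file and bundle names are assumed for illustration)
bundle:
  name: my_bundle

include:
  - resources/*.yml
  - targets.yml   # the file defining the Development target must be listed here,
                  # or its settings (including mode: development) are never loaded

# targets.yml
targets:
  Development:
    default: true
    mode: development
```

If the target-definition file is missing from `include`, the deploy can still appear to succeed while silently using defaults, which matches the symptom in the original post.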

2 More Replies
abhijit007
by Databricks Partner
  • 567 Views
  • 2 replies
  • 2 kudos

Resolved! Databricks App Issue: “socket hang up / ECONNRESET” when API call runs > 30 seconds

Problem Statement:We are running a Data App on Databricks that uses Next.js (frontend) and FastAPI (backend). The backend calls a Databricks Agent (AgentBricks) via a serving endpoint, which typically needs ~1 minute to return a response. However, an...

Latest Reply
SteveOstrowski
Databricks Employee
  • 2 kudos

Hi @abhijit007, Your debugging was thorough and you correctly isolated the issue: the timeout is happening upstream of your application code. Databricks Apps run behind a managed ingress/request router that enforces request-level timeouts (typically ...
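When a gateway enforces a hard request timeout, the usual workaround is to switch the slow endpoint to a "submit then poll" pattern: the initial request returns immediately with a job id, and the frontend polls for the result. A minimal sketch in plain Python (the function and variable names, and the in-memory job store, are illustrative stand-ins, not Databricks APIs; in a real app the slow call would be the serving-endpoint request and the two functions would be FastAPI routes):

```python
import threading
import time
import uuid

jobs = {}  # job_id -> {"status": ..., "result": ...} (in-memory store for the sketch)

def run_agent_call(prompt):
    """Hypothetical stand-in for the ~1 minute serving-endpoint call."""
    time.sleep(0.1)  # shortened so the sketch runs quickly
    return f"answer for: {prompt}"

def submit(prompt):
    """Return a job id immediately; do the slow work in a background thread."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "running", "result": None}

    def worker():
        jobs[job_id]["result"] = run_agent_call(prompt)
        jobs[job_id]["status"] = "done"

    threading.Thread(target=worker, daemon=True).start()
    return job_id

def poll(job_id):
    """Cheap status check the frontend can call repeatedly, well under any timeout."""
    return jobs[job_id]

job = submit("summarize sales")
while poll(job)["status"] != "done":
    time.sleep(0.05)
```

Each individual HTTP round trip then stays far below the gateway's limit, regardless of how long the agent takes.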

1 More Replies
neerajaN
by New Contributor II
  • 512 Views
  • 4 replies
  • 2 kudos

Resolved! count function

Hi, as per Spark internals, once the count function is executed on the worker nodes, does one of the worker nodes collect all the record counts and do the summation? Or are the counts from all worker nodes passed to the driver node, with the summation done on the driver side...

Latest Reply
SteveOstrowski
Databricks Employee
  • 2 kudos

Hi @neerajaN, You are correct that the count() operation follows a two-phase aggregation pattern in Spark. Here is how it works in detail: PHASE 1: PARTIAL AGGREGATION (EXECUTORS) Each executor computes a local partial count for the partitions assign...
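The two-phase pattern described in the reply can be illustrated without Spark at all. In this toy sketch, each inner list stands in for the partitions handled by one executor; only the small partial counts travel to the "driver", never the rows themselves:

```python
from functools import reduce

# Rows spread across three hypothetical executors (illustrative data)
partitions = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]

# Phase 1: partial aggregation - each executor counts its own partitions locally
partial_counts = [len(p) for p in partitions]

# Phase 2: final aggregation - the driver sums the small per-executor counts
total = reduce(lambda a, b: a + b, partial_counts)
print(total)  # 9
```

This is why count() is cheap on the network: the shuffle to the driver moves one integer per task, not the data.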

3 More Replies
Malthe
by Valued Contributor II
  • 434 Views
  • 3 replies
  • 0 kudos

Resolved! Python segmentation fault in serverless job

We're getting a Python segmentation fault in a serverless job that uses Delta Table merge inside a foreachBatch step in structured streaming (trigger once). /databricks/python/lib/python3.12/site-packages/pyspark/sql/connect/streaming/query.py:479: Us...

Latest Reply
SteveOstrowski
Databricks Employee
  • 0 kudos

Hi @Malthe, Since you have confirmed this is vanilla PySpark with no external libraries on serverless runtime environment version 5, this narrows things down considerably. Here are some additional observations and recommendations beyond what Louis sh...

2 More Replies
NW1000
by New Contributor III
  • 500 Views
  • 3 replies
  • 1 kudos

Resolved! Unable to access files using a classic cluster

I used the same code with a classic cluster (Runtime 17.3 LTS ML, with Spark config: "spark.databricks.workspace.fileSystem.enabled true"), but I am not able to access files in the workspace with the following Python code: import os # Check if source exists and w...

Latest Reply
SteveOstrowski
Databricks Employee
  • 1 kudos

Hi @NW1000, This behavior comes down to how workspace file access and identity work differently between serverless compute and classic clusters. SERVERLESS COMPUTE Serverless interactive compute runs under your own identity. It inherits your workspac...

2 More Replies
Seunghyun
by Contributor
  • 601 Views
  • 3 replies
  • 2 kudos

Resolved! Conditional Logic in Databricks Asset Bundles using Go Templates

I am defining a job using Databricks Asset Bundles (DABs) as follows:YAML resources: jobs: job_name: ... schedule: {{ if eq ${var.env} "prd" }} pause_status: "UNPAUSED" {{ else }} pause_status: "PAUSE...

Latest Reply
SteveOstrowski
Databricks Employee
  • 2 kudos

Hi @Seunghyun, Go template syntax ({{if}}, {{eq}}, etc.) is only supported in bundle project templates, which are the .tmpl files used during "databricks bundle init" to scaffold new projects. It is not supported inside your regular databricks.yml co...
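The DABs-native alternative to a template conditional is a per-target override, which the reply points toward. A sketch, with the job name, cron expression, and target name assumed for illustration:

```yaml
# Sketch only: job, schedule, and target names are assumed, not from the thread.
resources:
  jobs:
    job_name:
      schedule:
        quartz_cron_expression: "0 0 6 * * ?"
        timezone_id: "UTC"
        pause_status: "PAUSED"        # default for every non-prd target

targets:
  prd:
    resources:
      jobs:
        job_name:
          schedule:
            pause_status: "UNPAUSED"  # prd override - no Go templating needed
```

Target-level overrides are merged over the base job definition at deploy time, so the conditional logic lives in plain YAML rather than in template syntax that databricks.yml does not evaluate.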

2 More Replies
Seunghyun
by Contributor
  • 556 Views
  • 2 replies
  • 2 kudos

Resolved! Managing dashboard refresh schedules in DABs

I am currently using Databricks Asset Bundles (DABs) to deploy and manage dashboard resources. While I can manually add a schedule to a dashboard via the Databricks console, I would like to reflect this same configuration in the dashboard YAML file. ...

Latest Reply
SteveOstrowski
Databricks Employee
  • 2 kudos

Hi @Seunghyun, You are correct that the dashboard resource definition in Databricks Asset Bundles does not currently include schedule-related properties. The dashboard resource supports properties like display_name, file_path, warehouse_id, embed_cre...

1 More Replies
FAHADURREHMAN
by New Contributor III
  • 458 Views
  • 3 replies
  • 2 kudos

Optimizing Large Materialized View to expedite query execution

Hi All, I have a DLT pipeline set up which reads Parquet files from an S3 bucket and creates a materialized view. The created view is quite big, containing billions of records and around a few TB of data. Predictive Optimization is already enabled. automat...

Latest Reply
SteveOstrowski
Databricks Employee
  • 2 kudos

Hi @FAHADURREHMAN, There are several layers to optimizing query performance on a multi-TB materialized view, and the other replies here cover the ingestion/refresh side well. Let me add some guidance on the query-side tuning and help you decide betwe...

2 More Replies
yit337
by Contributor
  • 511 Views
  • 2 replies
  • 1 kudos

Resolved! Stream to static join - late arriving records

I have a stream to static join, but some of the rows in the static table arrive later than the linked rows in the stream. What is the default behaviour if a record in the stream hasn't joined a record in the static table? Is it lost forever? How is thi...

Latest Reply
SteveOstrowski
Databricks Employee
  • 1 kudos

Hi @yit337, This is an important topic to understand, so let me walk through the mechanics in detail. HOW STREAM-STATIC JOINS WORK In a stream-static join, each micro-batch of streaming data is joined against the static DataFrame. The key behavior de...
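The core behavior the question asks about can be simulated in a few lines. In this toy sketch (plain Python, no Spark; the dict plays the role of the static table), each micro-batch joins against the static table's contents at that moment, so a stream row whose match arrives later is dropped in its batch and never retried:

```python
# Mutable "static" table: key -> value
static = {1: "a"}

def process_batch(batch):
    """Inner-join one micro-batch of stream keys against the current static table."""
    return [(k, static[k]) for k in batch if k in static]

out1 = process_batch([1, 2])  # key 2 has no static match yet -> silently dropped
static[2] = "b"               # the matching static row arrives late
out2 = process_batch([3])     # key 2 is NOT revisited in later batches
```

With an inner join this is data loss for late static rows; mitigations typically involve a left join plus downstream reprocessing of the unmatched rows, or making both sides streams so watermarked state handles lateness.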

1 More Replies
Innuendo84
by Databricks Partner
  • 480 Views
  • 3 replies
  • 1 kudos

Resolved! Fatal error: The Python kernel is unresponsive.

I'm having problems running Databricks with cv2. Every time I try to import cv2 I get this error. If I comment it out, the error disappears.

Latest Reply
SteveOstrowski
Databricks Employee
  • 1 kudos

Hi @Innuendo84, Glad you got this resolved. For anyone else who runs into this, here is some additional context on why this happens and how to avoid it. THE ROOT CAUSE The standard opencv-python package includes GUI components (highgui, GTK/Qt bindin...

2 More Replies
FAHADURREHMAN
by New Contributor III
  • 366 Views
  • 3 replies
  • 1 kudos

Resolved! DLT Auto Loader Reading from Parent S3 Folder not Sub Folders

Hi All, I am trying to read CSV files from one folder of an S3 bucket. For this particular use case, I do not intend to read from subfolders. I am using the below code, however it's reading all CSVs in subfolders as well. How can I avoid that? I used many ...

Latest Reply
SteveOstrowski
Databricks Employee
  • 1 kudos

Hi @FAHADURREHMAN, This is expected behavior with Auto Loader. By default, when you point it at a directory path like s3://bucket/folder/, it will recursively traverse all subdirectories and pick up matching files. The pathGlobFilter option only filt...
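The distinction the reply draws can be illustrated without Spark. In this sketch (paths are made up for the example), a pathGlobFilter-style check matches only the file name after a recursive listing, so subfolder files still pass; restricting the path itself to one level, which is the effect commonly attributed to putting a glob in the load path such as `s3://bucket/folder/*.csv`, is what actually excludes them:

```python
from fnmatch import fnmatch

base = "s3://bucket/folder/"
files = [
    "s3://bucket/folder/a.csv",
    "s3://bucket/folder/sub/b.csv",   # lives in a subfolder
    "s3://bucket/folder/notes.txt",
]

# pathGlobFilter-style check: applied to the file NAME only, after the
# (recursive-by-default) listing, so the subfolder csv still matches.
glob_filtered = [f for f in files if fnmatch(f.rsplit("/", 1)[-1], "*.csv")]

# Path-level restriction: keep only csv files directly under the base folder.
top_level_only = [
    f for f in files
    if f.endswith(".csv") and "/" not in f[len(base):]
]
```

So the fix is to constrain the path being listed, not just the name filter applied afterwards.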

2 More Replies
dplatform_user
by New Contributor II
  • 373 Views
  • 2 replies
  • 1 kudos

Resolved! DEEP CLONE fails with [UNRESOLVED_ROUTINE] Cannot resolve routine isNotNull on DBR 16.4

Hi Databricks Community, I'm encountering an issue when attempting to DEEP CLONE a Delta table on DBR 16.4 that works fine on DBR 13.3. Error Message: [UNRESOLVED_ROUTINE] Cannot resolve routine `isNotNull` on search path [`system`.`builtin`, `system`...

Latest Reply
SteveOstrowski
Databricks Employee
  • 1 kudos

Hi @dplatform_user, This error occurs because of how NOT NULL constraints are internally represented in Delta table metadata. When a Delta table has NOT NULL columns, the Delta protocol stores these as CHECK constraints using expressions like isNotNu...

1 More Replies
Kirankumarbs
by Contributor
  • 437 Views
  • 2 replies
  • 0 kudos

Resolved! Databricks Spark UI showing -1 Executors

Hi Community, This might be a basic question, but I’m asking for educational purposes. I noticed that in one of my jobs, the Spark UI shows -1 executors. Initially, I thought this might indicate that executors are idle, but that doesn’t seem to explain...

Latest Reply
SteveOstrowski
Databricks Employee
  • 0 kudos

Hi @Kirankumarbs, The -1 value you are seeing for executors in the Spark UI depends on which type of compute your job is running on, so let me cover both scenarios. SERVERLESS COMPUTE If your job is running on serverless compute, this is the expected...

1 More Replies
Kirankumarbs
by Contributor
  • 526 Views
  • 3 replies
  • 2 kudos

Python logger.info() not showing inside applyInPandas (but print() works) — why?

Problem: In Databricks, logs from an external binary (via os.system) show up, but Python logger.info() inside groupBy(...).applyInPandas(...) does not. print(..., flush=True) does show up.Why: applyInPandas runs your function as a pandas UDF.  That c...

Latest Reply
SteveOstrowski
Databricks Employee
  • 2 kudos

Hi @Kirankumarbs, You have correctly identified the root cause here. When you use applyInPandas, your function runs inside a separate Python worker process on each executor, not in the driver process. The logging configuration you set up on the drive...
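The standard fix that follows from this is to configure logging inside the function passed to applyInPandas, since the worker process does not inherit the driver's handlers. A minimal sketch (shown as a plain function so it runs anywhere; inside Spark this body would be the pandas UDF, and its stderr lands in the executor logs rather than the notebook output):

```python
import logging
import sys

def process_group(pdf):
    # Per-worker setup: this runs in the worker process, not the driver,
    # so the handler must be attached here. Guard against re-adding it,
    # because worker processes are reused across tasks.
    logger = logging.getLogger("group_worker")
    if not logger.handlers:
        handler = logging.StreamHandler(sys.stderr)
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    logger.info("processing %d rows", len(pdf))
    return pdf

result = process_group([1, 2, 3])  # stand-in for a pandas DataFrame group
```

print(..., flush=True) works without any of this because it writes straight to the worker's stdout, which is why it shows up while the driver-configured logger stays silent.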

2 More Replies
smoortema
by Contributor
  • 483 Views
  • 2 replies
  • 2 kudos

Resolved! check statistics of clustering columns per file to see how liquid clustering works

I have a Delta table on which I set up liquid clustering using three columns. I would like to check file statistics to see how the clustering column values are distributed along the files. How can I write a query that shows min and max values, etc. o...

Latest Reply
SteveOstrowski
Databricks Employee
  • 2 kudos

Hi @smoortema, There are several approaches for inspecting per-file column statistics on a liquid-clustered Delta table. Here is a walkthrough from simplest to most detailed. APPROACH 1: CONFIRM CLUSTERING CONFIGURATION First, verify that clustering ...
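For context on what the per-file statistics look like at the lowest level: each "add" action in the Delta transaction log (`_delta_log/*.json`) carries a JSON `stats` string with numRecords plus minValues/maxValues for the leading columns, which is exactly the min/max-per-file information clustering quality is judged by. A sketch parsing one synthetic add action (the file name, column names, and values below are invented for illustration):

```python
import json

# Synthetic add action, shaped like a Delta log entry's "add" payload
add_action = {
    "path": "part-00000.parquet",
    "stats": json.dumps({
        "numRecords": 1500,
        "minValues": {"cluster_col1": "2024-01-01", "cluster_col2": 10},
        "maxValues": {"cluster_col1": "2024-03-31", "cluster_col2": 85},
    }),
}

# The stats field is a JSON string nested inside the JSON action, so it
# needs a second parse.
stats = json.loads(add_action["stats"])
print(add_action["path"], stats["minValues"], stats["maxValues"])
```

Well-clustered files show narrow, largely non-overlapping min/max ranges on the clustering columns; wide overlapping ranges across files mean data skipping will be ineffective.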

1 More Replies