Data Engineering

Forum Posts

by Techtic_kush • New Contributor

Friday

86 Views
2 replies
2 kudos

Can’t save results to target table – out-of-memory error

Hi team, I’m processing ~5,000 EMR notes with a Databricks notebook. The job reads from `crc_lakehouse.bronze.emr_notes`, runs SciSpaCy UMLS entity extraction plus a fine-tuned BERT sentiment model per partition, and builds a DataFrame (`df_entities`...

Latest Reply

bianca_unifeye
New Contributor III

Monday

2 kudos

You’re right that the behaviour is weird at first glance (“5k rows on a 64 GB cluster and I blow up on write”), but your stack trace is actually very revealing: this isn’t a classic Delta write / shuffle OOM – it’s SciSpaCy/UMLS falling over when loa...

by mplang • New Contributor

10-14-2024 8:59:10 AM

4071 Views
3 replies
2 kudos

DLT x UC x Auto Loader

Now that the Directory Listing Mode of Auto Loader is officially deprecated, is there a solution for using File Notification Mode in a DLT pipeline writing to a UC-managed table? My understanding is that File Notification Mode is only available on si...

autoloader

dlt

Latest Reply

Raman_Unifeye
Contributor III

Tuesday

2 kudos

Databricks introduced Managed File Events, which completely bypasses the need for the cluster's identity to provision cloud resources, resolving the conflict with the Shared cluster mode.

Steps to Implement in DLT

Enable File Events on the External Loca...

by Sainath368 • Contributor

Monday

81 Views
3 replies
2 kudos

Migrating from directory-listing to Autoloader Managed File events

Hi everyone,

We are currently migrating from a directory listing-based streaming approach to managed file events in Databricks Auto Loader for processing our data in Structured Streaming. We have a function that handles structured streaming where we ar...

Latest Reply

Raman_Unifeye
Contributor III

Monday

2 kudos

Yes, for your setup, Databricks Auto Loader will create a separate event queue for each independent stream running with the cloudFiles.useManagedFileEvents = true option. As you are running 1 stream per table, 1 unique directory per stream and 1 uni...
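The layout described in this reply (one stream per table, each reading its own directory) could be sketched roughly as below. Only the `cloudFiles.useManagedFileEvents` option comes from the thread. The table names, bucket paths, file format, and the helpers `StreamSpec`, `build_stream_options`, and `assert_unique_dirs` are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class StreamSpec:
    table: str       # hypothetical target table name
    source_dir: str  # hypothetical source directory, unique to this stream


def build_stream_options(file_format: str = "json") -> Dict[str, str]:
    """Reader options for one independent Auto Loader stream."""
    return {
        "cloudFiles.format": file_format,
        # Option named in the reply above.
        "cloudFiles.useManagedFileEvents": "true",
    }


def assert_unique_dirs(specs: Iterable[StreamSpec]) -> None:
    """Raise if two streams would read from the same directory."""
    seen = set()
    for spec in specs:
        key = spec.source_dir.rstrip("/")
        if key in seen:
            raise ValueError(f"directory shared by more than one stream: {key}")
        seen.add(key)


# Assumed example layout: one directory per table.
specs = [
    StreamSpec("bronze.orders", "s3://example-bucket/raw/orders/"),
    StreamSpec("bronze.customers", "s3://example-bucket/raw/customers/"),
]
assert_unique_dirs(specs)
options = build_stream_options()

# In a Databricks notebook, each spec would then get its own stream, e.g.:
# df = (spark.readStream.format("cloudFiles")
#       .options(**options)
#       .load(spec.source_dir))
```

Keeping the options in a plain dictionary makes it easy to confirm that every stream sets the same flag before any stream is started.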

by StephenDsouza • New Contributor II

05-23-2024 4:52:29 AM

3093 Views
3 replies
0 kudos

Error during build process for serving model caused by detectron2

Hi All,

Introduction: I am trying to register my model on Databricks so that I can serve it as an endpoint. The packages that I need are "torch", "mlflow", "torchvision", "numpy" and "git+https://github.com/facebookresearch/detectron2.git". For this, ...

Latest Reply

StephenDsouza
New Contributor II

05-24-2024 2:54:54 AM

0 kudos

Found an answer! Basically, pip was somehow installing the dependencies from the git repo first and was not following the given order, so to solve this I added the libraries for conda to install. ``` conda_env = { "channels": [ "defa...

by shashankB • New Contributor III

Friday

142 Views
5 replies
0 kudos

Lakebridge analyzer not able to determine DDL.

Databricks analyzer does not show any DDL statement count. I've also tested with just a simple SELECT * query (SELECT * FROM SCHEMA_NAME.TABLE_NAME;). Is there any solution for this? My target was to get a detailed analysis of SnowSQL code. Any h...

Latest Reply

Thompson2345
New Contributor II

Sunday

0 kudos

The Lakebridge analyzer counts DDL statements, not regular queries. A simple SELECT * is DML, not DDL, so it won’t show up in the DDL count.

To get meaningful results for SnowSQL code analysis:

Include actual DDL statements like CREATE TABLE, ALTER TAB...
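To make the DDL versus non-DDL distinction concrete, here is a small sketch that counts statements by their leading keyword. It is not how Lakebridge classifies SQL. The keyword list and the `count_ddl` helper are assumptions for illustration. The naive split on `;` also ignores semicolons inside string literals and statements that begin with a comment.

```python
from typing import Dict, List

# Assumed set of leading keywords treated as DDL for this illustration.
DDL_KEYWORDS = {"CREATE", "ALTER", "DROP", "TRUNCATE"}


def split_statements(sql: str) -> List[str]:
    """Split a script on semicolons and drop empty fragments."""
    return [part.strip() for part in sql.split(";") if part.strip()]


def count_ddl(sql: str) -> Dict[str, int]:
    """Count statements whose first keyword is in DDL_KEYWORDS."""
    counts = {"ddl": 0, "other": 0}
    for statement in split_statements(sql):
        first_word = statement.split(None, 1)[0].upper()
        if first_word in DDL_KEYWORDS:
            counts["ddl"] += 1
        else:
            counts["other"] += 1
    return counts


script = """
CREATE TABLE demo_schema.events (id INT, name STRING);
ALTER TABLE demo_schema.events ADD COLUMN ts TIMESTAMP;
SELECT * FROM demo_schema.events;
"""
result = count_ddl(script)
print(result)
```

A script containing only `SELECT * FROM SCHEMA_NAME.TABLE_NAME;` yields a DDL count of zero under this rule, which matches the behaviour described in the reply.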

by EDDatabricks • Contributor

09-12-2024 5:31:33 AM

4122 Views
1 reply
1 kudos

Schema Registry certificate auth with Unity Catalog volumes.

Greetings.

We currently have a Spark Structured Streaming job (Scala) retrieving Avro data from an Azure Event Hub with a Confluent Schema Registry endpoint (using an Azure API Management gateway with certificate authentication). Until now the .jks file...

Latest Reply

stbjelcevic
Databricks Employee

Monday

1 kudos

Thanks for the detailed context. Here’s a concise, actionable troubleshooting plan tailored to Databricks with Unity Catalog volumes and Avro + Confluent Schema Registry over APIM with mTLS.

What’s likely going wrong

Based on your description, the ini...

by Sega2 • New Contributor III

09-26-2024 1:50:41 AM

4680 Views
2 replies
0 kudos

Adding a message to azure service bus

I am trying to send a message to a Service Bus in Azure, but I get the following error: ServiceBusError: Handler failed: DefaultAzureCredential failed to retrieve a token from the included credentials. This is the line that fails: credential = DefaultAzure...

Latest Reply

stbjelcevic
Databricks Employee

Monday

0 kudos

It looks like the issue is with the Azure credential chain rather than Service Bus itself; in Databricks notebooks, DefaultAzureCredential won’t succeed unless there’s a valid identity available (env vars, CLI login, managed identity, or a Databricks...
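One quick check for the "env vars" item in that list is to see whether the service principal variables read by `EnvironmentCredential` in `azure-identity` are set on the cluster. The helper below is a diagnostic sketch, not part of the Azure SDK. `missing_env_credentials` is a hypothetical name, and it covers only the client-secret form of environment-based authentication.

```python
import os
from typing import List, Mapping, Optional

# Variables read by azure-identity's EnvironmentCredential for a
# service principal that authenticates with a client secret.
REQUIRED_VARS = ("AZURE_TENANT_ID", "AZURE_CLIENT_ID", "AZURE_CLIENT_SECRET")


def missing_env_credentials(environ: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names of required variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_env_credentials()
if missing:
    print("Environment-based auth is not configured; missing:", ", ".join(missing))
else:
    print("Service principal variables are present.")
```

If any variable is reported missing, the environment-based step of the credential chain cannot succeed, and one of the other identity sources mentioned in the reply has to be available instead.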

by Miguel_Salas • New Contributor II

10-21-2024 12:05:48 PM

4955 Views
2 replies
0 kudos

How Install Pyrfc into AWS Databrick using Volumes

I'm trying to install PyRFC in a Databricks cluster (already tried on r5.xlarge, m5.xlarge, and c6gd.xlarge). I'm following this link: https://community.databricks.com/t5/data-engineering/how-can-i-cluster-install-a-c-python-library-pyrfc/td-p/8118 Bu...

Latest Reply

stbjelcevic
Databricks Employee

Monday

0 kudos

Thanks for the details. The PyRFC package is a Python binding around the SAP NetWeaver RFC SDK and requires the SAP NW RFC SDK to be present at build/run time; it does not work as a pure Python wheel on Linux without the SDK. The project is archived ...

by HoussemBL • New Contributor III

05-16-2025 3:30:14 AM

2790 Views
2 replies
1 kudos

how to add Microsoft Entra ID managed service principal to aws databricks

Hi,

I would like to add a Microsoft Entra ID managed service principal to AWS Databricks, but I have noticed that this option does not appear to be available; I am only able to create managed service principals directly within Databricks. For comparison...

Latest Reply

stbjelcevic
Databricks Employee

Monday

1 kudos

You cannot add a Microsoft Entra ID–managed service principal to Databricks on AWS today; AWS workspaces only support Databricks‑managed service principals that you create in the Databricks account/workspace, not service principals federated from Ent...

by nchittampelly • New Contributor II

05-05-2025 8:27:44 AM

3097 Views
3 replies
0 kudos

What is the best way to connect Oracle CRM cloud from databricks?

Latest Reply

nchittampelly
New Contributor II

07-29-2025 8:45:50 AM

0 kudos

Oracle CRM On Demand is a cloud platform, not a relational database. Is there any proven solution for this requirement?

by ManojkMohan • Honored Contributor II

Monday

89 Views
5 replies
4 kudos

Resolved! Accessing Databricks data in Salesforce via zero copy

I have uploaded clickstream data as shown below. Do I have to share via Delta Sharing for values to be exposed in Salesforce? At the Salesforce end I have confirmed that I have a working connector where I am able to see sample data, but u...

Latest Reply

szymon_dybczak
Esteemed Contributor III

Monday

4 kudos

So, for instance, I have a catalog called project in Databricks Free Edition. If I would like to assign proper permissions to my Service Principal (so that it can see the tables within the catalog and can query them), first I need to set 2 prerequisite permis...

by prasadvaze • Valued Contributor II

05-11-2022 1:37:26 PM

9559 Views
5 replies
6 kudos

Resolved! Limit on number of result rows displayed on databricks SQL UI

Databricks SQL UI currently limits the query results display to 64000 rows. When will this limit go away? Using SSMS I get 40MM-row results in the UI, and my users won't switch to Databricks SQL for this reason.

Latest Reply

vsrmerc
New Contributor

Monday

6 kudos

I want to understand the reason behind it. Retrieving 500k records is not a problem; is it the rendering over HTTP that's problematic?

by Nidhig • Contributor

Monday

59 Views
4 replies
3 kudos

Lakeflow jobs

Hi, I am currently working on migrating all ADF jobs to LakeFlow jobs. I have a few questions:

Pipeline cost: What is the cost model for running LakeFlow pipelines? Any documentation available? ADF vs Lakeflow Job?

Job reuse: Do LakeFlow jobs reuse the...

Latest Reply

szymon_dybczak
Esteemed Contributor III

Monday

3 kudos

Hi @Nidhig,

1. Regarding pipeline cost: here you're mostly paying for compute usage, so the exact price depends on which plan you are on and which cloud provider you are using. For instance, for the Azure Premium plan and the US East region you have followi...

by RIDBX • New Contributor III

Sunday

51 Views
2 replies
1 kudos

How to make streaming files?

Thanks for reviewing my threads. I am trying to test streaming tables/files in Databricks Free Edition.

-- Create test streaming table
CREATE OR REFRESH STREAMING TABLE user.demo.test_bronze_st AS
SELECT * FROM STREAM read_files('/Volumes/xxx_ws/demo/raw...

Latest Reply

RIDBX
New Contributor III

Monday

1 kudos

Thanks for weighing in. Are you saying CREATE OR REFRESH STREAMING TABLE user.demo.test_bronze_st cannot be used in Free Edition? If we can use it, how do I create STREAM read_files('/Volumes/xxx_ws/demo/raw_files/test.csv'), where the .csv is sitting on lo...

by William_Scardua • Valued Contributor

Monday

61 Views
1 reply
1 kudos

What the best Framework/Package for data quality

Hi everyone,

I’m currently looking for a data-quality solution for my environment. I don’t have DLT tables or a Unity Catalog in place. In your opinion, what is the best framework or package to implement reliable data-quality checks under these conditi...

Latest Reply

nayan_wylde
Esteemed Contributor

Monday

1 kudos

Here are a few DQ packages for DLT or LDP that you can try.

1. Databricks Labs DQX
Purpose-built for Spark and Databricks.
Rule-based checks on DataFrames (batch & streaming).
Supports quarantine and profiling.
Lightweight and easy to integrate.

2. Great Exp...

Databricks Community
