Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

EDDatabricks
by Contributor
  • 4102 Views
  • 1 reply
  • 0 kudos

Schema Registry certificate auth with Unity Catalog volumes.

Greetings. We currently have a Spark structured streaming job (Scala) retrieving Avro data from an Azure Event Hub with a Confluent schema registry endpoint (using an Azure API Management gateway with certificate authentication). Until now the .jks file...

Latest Reply
stbjelcevic
Databricks Employee
  • 0 kudos

Thanks for the detailed context. Here's a concise, actionable troubleshooting plan tailored to Databricks with Unity Catalog volumes and Avro + Confluent Schema Registry over APIM with mTLS. What's likely going wrong: based on your description, the ini...
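
A minimal sketch of the wiring (assumptions: DBR's Schema Registry integration with Confluent-style SSL option names; volume paths and secret scope names are placeholders):

# Keystore/truststore .jks files kept in a Unity Catalog volume (FUSE path);
# passwords pulled from a secret scope rather than hard-coded.
schema_registry_options = {
    "confluent.schema.registry.ssl.truststore.location": "/Volumes/main/certs/kafka/truststore.jks",
    "confluent.schema.registry.ssl.truststore.password": dbutils.secrets.get("kafka", "truststore-pwd"),
    "confluent.schema.registry.ssl.keystore.location": "/Volumes/main/certs/kafka/keystore.jks",
    "confluent.schema.registry.ssl.keystore.password": dbutils.secrets.get("kafka", "keystore-pwd"),
}
# Pass these to from_avro(...) along with the schema registry (APIM) address;
# with mTLS terminated at APIM, the client certificate must match what the
# gateway expects, not the backend registry.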

Sega2
by New Contributor III
  • 4662 Views
  • 2 replies
  • 0 kudos

Adding a message to azure service bus

I am trying to send a message to a Service Bus in Azure, but I get the following error: ServiceBusError: Handler failed: DefaultAzureCredential failed to retrieve a token from the included credentials. This is the line that fails: credential = DefaultAzure...

Latest Reply
stbjelcevic
Databricks Employee
  • 0 kudos

It looks like the issue is with the Azure credential chain rather than Service Bus itself; in Databricks notebooks, DefaultAzureCredential won’t succeed unless there’s a valid identity available (env vars, CLI login, managed identity, or a Databricks...
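
For example, a minimal sketch with an explicit service principal credential instead of the default chain (secret scope and names are placeholders; the SP needs an Azure Service Bus data-plane role such as Sender):

from azure.identity import ClientSecretCredential
from azure.servicebus import ServiceBusClient, ServiceBusMessage

# Explicit credential: works on any cluster, no ambient identity required.
credential = ClientSecretCredential(
    tenant_id=dbutils.secrets.get("azure", "tenant-id"),
    client_id=dbutils.secrets.get("azure", "sp-client-id"),
    client_secret=dbutils.secrets.get("azure", "sp-client-secret"),
)

with ServiceBusClient("mybus.servicebus.windows.net", credential) as client:
    with client.get_queue_sender(queue_name="myqueue") as sender:
        sender.send_messages(ServiceBusMessage("hello from Databricks"))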

1 More Replies
mplang
by New Contributor
  • 4051 Views
  • 2 replies
  • 1 kudos

DLT x UC x Auto Loader

Now that the Directory Listing Mode of Auto Loader is officially deprecated, is there a solution for using File Notification Mode in a DLT pipeline writing to a UC-managed table? My understanding is that File Notification Mode is only available on si...

Labels: Data Engineering, autoloader, dlt, UC
Latest Reply
stbjelcevic
Databricks Employee
  • 1 kudos

Yes: use Auto Loader’s UC-managed file events in DLT with DBR 14.3 LTS or later; enable file events on the UC external location and set managed file events in the stream. This avoids the legacy file notifications’ single-user compute limitation and w...
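
A minimal sketch of the source table (assumptions: the option name for UC-managed file events follows the current Auto Loader docs; the landing path is a placeholder):

import dlt

@dlt.table
def bronze_events():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("cloudFiles.useManagedFileEvents", "true")  # managed file events toggle
        .load("abfss://landing@mystorage.dfs.core.windows.net/events/")  # UC external location path
    )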

1 More Replies
Miguel_Salas
by New Contributor II
  • 4936 Views
  • 2 replies
  • 0 kudos

How to Install PyRFC into AWS Databricks using Volumes

I'm trying to install PyRFC on a Databricks cluster (already tried r5.xlarge, m5.xlarge, and c6gd.xlarge). I'm following this link: https://community.databricks.com/t5/data-engineering/how-can-i-cluster-install-a-c-python-library-pyrfc/td-p/8118 Bu...

Latest Reply
stbjelcevic
Databricks Employee
  • 0 kudos

Thanks for the details. The PyRFC package is a Python binding around the SAP NetWeaver RFC SDK and requires the SAP NW RFC SDK to be present at build/run time; it does not work as a pure Python wheel on Linux without the SDK. The project is archived ...
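
A sketch of the volume-based install (SDK path is hypothetical; note PyRFC needs SAPNWRFC_HOME at build time and the shared libraries resolvable at import time, which usually means a cluster init script):

import os, subprocess, sys

# Assumes the SAP NW RFC SDK has been unzipped into a UC volume.
os.environ["SAPNWRFC_HOME"] = "/Volumes/main/sap/sdk/nwrfcsdk"
os.environ["LD_LIBRARY_PATH"] = os.environ["SAPNWRFC_HOME"] + "/lib"

# The pip build subprocess inherits the env vars above.
subprocess.check_call([sys.executable, "-m", "pip", "install", "pyrfc"])

# Caveat: LD_LIBRARY_PATH set here does not affect the already-running Python
# process, so importing pyrfc may still require an init script that exports it
# (or copies the SDK libs to a standard location) before the cluster starts.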

1 More Replies
HoussemBL
by New Contributor III
  • 2772 Views
  • 2 replies
  • 1 kudos

How to add a Microsoft Entra ID managed service principal to AWS Databricks

Hi, I would like to add a Microsoft Entra ID managed service principal to AWS Databricks, but I have noticed that this option does not appear to be available; I am only able to create managed service principals directly within Databricks. For comparison...

Latest Reply
stbjelcevic
Databricks Employee
  • 1 kudos

You cannot add a Microsoft Entra ID–managed service principal to Databricks on AWS today; AWS workspaces only support Databricks‑managed service principals that you create in the Databricks account/workspace, not service principals federated from Ent...
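
For reference, creating the Databricks-managed equivalent with the Python SDK might look like this (display name is a placeholder; assumes account-level admin auth is configured):

from databricks.sdk import AccountClient

a = AccountClient()  # reads account host/ID and credentials from env or profile
sp = a.service_principals.create(display_name="entra-replacement-sp")
print(sp.id, sp.application_id)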

1 More Replies
nchittampelly
by New Contributor II
  • 3085 Views
  • 3 replies
  • 0 kudos

What is the best way to connect Oracle CRM Cloud from Databricks?

Latest Reply
nchittampelly
New Contributor II
  • 0 kudos

Oracle CRM On Demand is a cloud platform, not a relational database. Is there any proven solution for this requirement?

2 More Replies
ManojkMohan
by Honored Contributor II
  • 43 Views
  • 5 replies
  • 3 kudos

Resolved! Accessing Databricks data in Salesforce via zero copy

I have uploaded clickstream data as shown below. Do I have to mandatorily share via Delta Sharing for values to be exposed in Salesforce? At the Salesforce end I have confirmed that I have a working connector where I am able to see sample data, but u...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 3 kudos

So, for instance, I have a catalog called project in Databricks Free Edition. If I would like to assign proper permissions to my Service Principal (so that it can see the tables within the catalog and can query them), first I need to set two prerequisite permis...
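
In SQL terms, the grants described above would look something like this (catalog and schema names follow the example; address the service principal by its application ID):

spark.sql("GRANT USE CATALOG ON CATALOG project TO `<sp-application-id>`")
spark.sql("GRANT USE SCHEMA ON SCHEMA project.default TO `<sp-application-id>`")
spark.sql("GRANT SELECT ON SCHEMA project.default TO `<sp-application-id>`")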

4 More Replies
prasadvaze
by Valued Contributor II
  • 9542 Views
  • 5 replies
  • 6 kudos

Resolved! Limit on number of result rows displayed on databricks SQL UI

Databricks SQL UI currently limits the query results display to 64,000 rows. When will this limit go away? Using SSMS I get 40MM-row results in the UI, and my users won't switch to Databricks SQL for this reason.

Latest Reply
vsrmerc
Visitor
  • 6 kudos

I want to understand the reason behind it. Retrieving 500k records is not a problem; is it the rendering over HTTP that's problematic?

4 More Replies
Nidhig
by Contributor
  • 23 Views
  • 4 replies
  • 2 kudos

Lakeflow jobs

Hi, I am currently working on migrating all ADF jobs to Lakeflow Jobs. I have a few questions: Pipeline cost: What is the cost model for running Lakeflow pipelines? Any documentation available? ADF vs Lakeflow Jobs? Job reuse: Do Lakeflow Jobs reuse the...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 2 kudos

Hi @Nidhig, 1. Regarding pipeline cost: you're mostly paying for compute usage, so the exact price depends on which plan you're on and which cloud provider you're using. For instance, for the Azure premium plan in the US East region you have the followi...

3 More Replies
RIDBX
by New Contributor III
  • 34 Views
  • 2 replies
  • 1 kudos

How to make streaming files?

Thanks for reviewing my threads. I am trying to test streaming tables/files in Databricks Free Edition.
-- Create test streaming table
CREATE OR REFRESH STREAMING TABLE user.demo.test_bronze_st AS
SELECT * FROM STREAM read_files('/Volumes/xxx_ws/demo/raw...

Latest Reply
RIDBX
New Contributor III
  • 1 kudos

Thanks for weighing in. Are you saying CREATE OR REFRESH STREAMING TABLE user.demo.test_bronze_st cannot be used in Free Edition? If we can use it, how do we create STREAM read_files('/Volumes/xxx_ws/demo/raw_files/test.csv'), where the .csv is sitting on lo...
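
A plain structured-streaming sketch of the same idea, pointed at the volume directory rather than a single file (paths follow the example above; whether Free Edition compute permits it is the open question):

df = (spark.readStream.format("cloudFiles")
      .option("cloudFiles.format", "csv")
      .option("cloudFiles.schemaLocation", "/Volumes/xxx_ws/demo/_schema")
      .load("/Volumes/xxx_ws/demo/raw_files/"))

(df.writeStream
   .option("checkpointLocation", "/Volumes/xxx_ws/demo/_checkpoint")
   .trigger(availableNow=True)
   .toTable("user.demo.test_bronze_st"))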

1 More Replies
Techtic_kush
by New Contributor
  • 25 Views
  • 1 reply
  • 0 kudos

Can’t save results to target table – out-of-memory error

Hi team, I’m processing ~5,000 EMR notes with a Databricks notebook. The job reads from `crc_lakehouse.bronze.emr_notes`, runs SciSpaCy UMLS entity extraction plus a fine-tuned BERT sentiment model per partition, and builds a DataFrame (`df_entities`...

Latest Reply
bianca_unifeye
New Contributor III
  • 0 kudos

You’re right that the behaviour is weird at first glance (“5k rows on a 64 GB cluster and I blow up on write”), but your stack trace is actually very revealing: this isn’t a classic Delta write / shuffle OOM – it’s SciSpaCy/UMLS falling over when loa...
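
The usual mitigation, sketched below with hypothetical column and model names: load the heavy model lazily once per Python worker rather than per task or per row, and stream partitions through it:

_NLP = None  # cached per executor Python worker

def extract_entities(rows):
    # Lazy-load SciSpaCy once per worker, then stream rows through it.
    global _NLP
    if _NLP is None:
        import spacy
        _NLP = spacy.load("en_core_sci_sm")
    for row in rows:
        for ent in _NLP(row["note_text"]).ents:
            yield (row["note_id"], ent.text, ent.label_)

entities = (spark.table("crc_lakehouse.bronze.emr_notes")
            .repartition(64)  # keep per-partition memory bounded
            .rdd.mapPartitions(extract_entities)
            .toDF(["note_id", "entity", "label"]))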

William_Scardua
by Valued Contributor
  • 28 Views
  • 1 reply
  • 1 kudos

What is the best framework/package for data quality?

Hi everyone, I'm currently looking for a data-quality solution for my environment. I don't have DLT tables or Unity Catalog in place. In your opinion, what is the best framework or package to implement reliable data-quality checks under these conditi...

Latest Reply
nayan_wylde
Esteemed Contributor
  • 1 kudos

Here are a few DQ packages for DLT or LDP that you can try.
1. Databricks Labs DQX: purpose-built for Spark and Databricks. Rule-based checks on DataFrames (batch & streaming). Supports quarantine and profiling. Lightweight and easy to integrate.
2. Great Exp...
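
To make the rule-based pattern concrete without committing to any one package's API, here is a plain PySpark sketch of check-and-quarantine (hypothetical table and column names; array_compact needs Spark 3.4+):

from pyspark.sql import functions as F

df = spark.table("bronze.orders")
checked = df.withColumn(
    "dq_violations",
    F.array_compact(F.array(
        F.when(F.col("order_id").isNull(), F.lit("order_id_null")),
        F.when(F.col("amount") < 0, F.lit("amount_negative")),
    )),
)
good = checked.filter(F.size("dq_violations") == 0).drop("dq_violations")
quarantine = checked.filter(F.size("dq_violations") > 0)  # route aside for review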

ShivMukesh
by New Contributor
  • 3680 Views
  • 3 replies
  • 0 kudos

Upgrade HMS to UC using UCX tool - workspace to workspace migration

Hello team, I understand that an automated upgrade to UC using the UCX tool (a Databricks Labs project) is now available to complete the migration from HMS to UC. But does this tool allow workspace-to-workspace catalog/artifact migra...

Latest Reply
nayan_wylde
Esteemed Contributor
  • 0 kudos

@ShivMukesh I have used UCX to migrate to Unity Catalog. It is a great tool, but it also involves a lot of workarounds, especially in group migration and table migration. In group migration it renames the old workspace group and assigns the same permissi...

2 More Replies
Pratikmsbsvm
by Contributor
  • 47 Views
  • 3 replies
  • 1 kudos

Data Pipeline for Bringing Data from Oracle Fusion to Azure Databricks

I am trying to bring Oracle Fusion (SCM, HCM, Finance) data into ADLS Gen2, with Databricks used for data transformation and Power BI used for report visualization. I have three options. Could someone please help me decide which is bes...

Latest Reply
Raman_Unifeye
Contributor
  • 1 kudos

Option 1, using Oracle's bulk extraction utility BICC. It can directly export the extracted data files (typically CSV) to an Oracle cloud storage destination, and then you could use ADF to copy them over to ADLS.

2 More Replies
Naveenkumar1811
by New Contributor II
  • 24 Views
  • 2 replies
  • 0 kudos

Data loss possibility when skipChangeCommits is set to true

Hi team, I have the below scenario: I have a Spark Streaming job with a processing-time trigger of 3 seconds, running continuously 365 days a year. We perform a weekly delete job on the source of this streaming job based on a custom retention policy. It is a D...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

It shouldn't. You have an append-only stream, and skipChangeCommits will ignore any modifications that were applied to already existing files.
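
For reference, a minimal reader-side sketch (table name is hypothetical):

# Ignore commits that only rewrite existing files (e.g. the weekly retention
# deletes); only newly appended files are processed downstream.
df = (spark.readStream
      .option("skipChangeCommits", "true")
      .table("source_catalog.schema.events"))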

1 More Replies
