Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Pratikmsbsvm
by Contributor
  • 87 Views
  • 2 replies
  • 1 kudos

How to Design a Data Quality Framework for Medallion Architecture Data Pipeline

Hello, I am building a data pipeline which extracts data from Oracle Fusion and pushes it to the Databricks Delta Lake. I am using the Bronze, Silver and Gold approach. May someone please help me with how to control all three segments, that is Bronze, Silver and Gold, wit...

Latest Reply
nayan_wylde
Esteemed Contributor
  • 1 kudos

Here’s how you can implement DQ at each stage:
Bronze Layer
Checks: file format validation (CSV, JSON, etc.); schema validation (column names, types); row count vs. source system.
Tools: use Databricks Auto Loader with schema evolution and badRecordsPath. Impl...
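As a rough sketch of those Bronze-layer checks (the paths, schema hints, and table names below are placeholders, not from this thread), a Bronze ingest with Auto Loader might combine schema evolution, schema hints, and badRecordsPath like this:
```
# Minimal Bronze ingest sketch: Auto Loader with schema evolution,
# type hints for key columns, and badRecordsPath to quarantine bad rows.
bronze_stream = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")                                         # file format validation
    .option("cloudFiles.schemaLocation", "/Volumes/main/bronze/_schemas/orders")  # where the inferred schema is tracked
    .option("cloudFiles.schemaEvolutionMode", "addNewColumns")                    # allow new source columns
    .option("cloudFiles.schemaHints", "order_id string, updated_at timestamp")    # enforce key column types
    .option("badRecordsPath", "/Volumes/main/bronze/bad_records")                 # malformed records land here
    .load("/Volumes/main/landing/oracle_fusion/orders")
)

(bronze_stream.writeStream
    .option("checkpointLocation", "/Volumes/main/bronze/_checkpoints/orders")
    .toTable("main.bronze.orders"))
```
Row counts against the source system would typically be reconciled afterwards, for example by comparing a COUNT(*) on the Bronze table with an extract count from Oracle Fusion.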

1 More Reply
Shalabh007
by Honored Contributor
  • 8912 Views
  • 6 replies
  • 19 kudos

Practice Exams for Databricks Certified Data Engineer Professional exam

Can anyone help with an official practice exam set for the Databricks Certified Data Engineer Professional exam, like the one we have below for the Databricks Certified Data Engineer Associate: Practice exam for the Databricks Certified Data Engineer Associate exam

Latest Reply
JOHNBOSCOW23
New Contributor
  • 19 kudos

I passed my exam today, thanks.

5 More Replies
Andolina1
by New Contributor III
  • 2923 Views
  • 6 replies
  • 1 kudos

How to trigger an Azure Data Factory pipeline through API using parameters

Hello All, I have a use case where I want to trigger an Azure Data Factory pipeline through an API. Right now I am calling the API in Databricks and using a Service Principal (token based) to connect to ADF from Databricks. The ADF pipeline has some paramete...

Latest Reply
rfranco
New Contributor
  • 1 kudos

Hello @Andolina1, try to send your payload like:
body = {'curr_working_user': f'{parameters}'}
response = requests.post(url, headers=headers, json=body)
The pipeline's parameter should be named curr_working_user. With these changes your setup should work...
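For reference, here is a fuller sketch of that call against the ADF "Create Run" REST endpoint using a service-principal bearer token; the subscription, resource group, factory, pipeline, and parameter value are placeholders:
```
import requests

# Bearer token for the service principal (e.g. obtained via the client-credentials flow).
token = "<service-principal-bearer-token>"

url = (
    "https://management.azure.com/subscriptions/<subscription-id>"
    "/resourceGroups/<resource-group>/providers/Microsoft.DataFactory"
    "/factories/<factory-name>/pipelines/<pipeline-name>/createRun"
    "?api-version=2018-06-01"
)
headers = {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}

# The request body is the pipeline parameter map; the key must match the
# parameter declared on the ADF pipeline (here: curr_working_user).
body = {"curr_working_user": "some.user@example.com"}

response = requests.post(url, headers=headers, json=body)
response.raise_for_status()
print(response.json())  # includes the runId of the triggered pipeline run
```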

5 More Replies
BipinDatabricks
by New Contributor
  • 60 Views
  • 3 replies
  • 0 kudos

Using the Databricks SQL Statement Execution API

Team, we have an internal chatbot service that will send queries to the Databricks SQL Statement Execution API. The number of queries varies from 50 per minute to 100 per minute, and we are trying to limit the response size by applying LIMIT 10. Basically trying hard to use all o...

Latest Reply
Coffee77
Contributor III
  • 0 kudos

As a suggestion, you can also think of creating your own API to query your tables directly via JDBC/ODBC connections over a SQL Warehouse. In this case, the only limitations would be those associated with SQL Warehouses and your API, not the Databricks API...
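If you go that route, a minimal sketch with the databricks-sql-connector package could look like the following; the hostname, HTTP path, token, and table name are placeholders:
```
from databricks import sql

with sql.connect(
    server_hostname="<workspace-host>.cloud.databricks.com",
    http_path="/sql/1.0/warehouses/<warehouse-id>",
    access_token="<personal-access-token-or-sp-token>",
) as connection:
    with connection.cursor() as cursor:
        # Keep responses small for the chatbot by limiting rows in the query itself.
        cursor.execute("SELECT * FROM main.default.my_table LIMIT 10")
        for row in cursor.fetchall():
            print(row)
```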

2 More Replies
ShanQiwei
by New Contributor
  • 68 Views
  • 2 replies
  • 0 kudos

I/F security when using a medallion architecture

I’m new to writing requirement definitions, and I’d like to ask a question about interface (I/F) security. My question is: do I need to define the authentication and security mechanisms (such as OAuth2, Managed Identity, Service Principals, etc.) betwe...

Latest Reply
Coffee77
Contributor III
  • 0 kudos

I'll try to summarize and go directly to the key points as I see this:
- Client to S3: SAS token or OAuth 2.0 with service-to-service authentication (preferred)
- Databricks to S3: use Service Principal or Managed Identities (preferred)
- Bronze/Silver/...

1 More Reply
Techtic_kush
by New Contributor
  • 93 Views
  • 2 replies
  • 2 kudos

Can’t save results to target table – out-of-memory error

Hi team, I’m processing ~5,000 EMR notes with a Databricks notebook. The job reads from `crc_lakehouse.bronze.emr_notes`, runs SciSpaCy UMLS entity extraction plus a fine-tuned BERT sentiment model per partition, and builds a DataFrame (`df_entities`...

Latest Reply
bianca_unifeye
New Contributor III
  • 2 kudos

You’re right that the behaviour is weird at first glance (“5k rows on a 64 GB cluster and I blow up on write”), but your stack trace is actually very revealing: this isn’t a classic Delta write / shuffle OOM – it’s SciSpaCy/UMLS falling over when loa...
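One common mitigation for that failure mode, sketched here under assumed column names (note_id, note_text) and a placeholder SciSpaCy model, is to load the model lazily once per worker process inside mapInPandas instead of re-loading it per partition or row:
```
import pandas as pd
import spacy

_nlp = None  # cached once per executor Python process

def extract_entities(batches):
    global _nlp
    if _nlp is None:
        _nlp = spacy.load("en_core_sci_sm")  # load the model once per worker, not per batch
    for pdf in batches:
        ents = [
            "; ".join(ent.text for ent in _nlp(text).ents)
            for text in pdf["note_text"].astype(str)
        ]
        yield pd.DataFrame({"note_id": pdf["note_id"].to_numpy(), "entities": ents})

notes = spark.table("crc_lakehouse.bronze.emr_notes").select("note_id", "note_text")
df_entities = notes.mapInPandas(extract_entities, schema="note_id string, entities string")
```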

1 More Reply
mplang
by New Contributor
  • 4090 Views
  • 3 replies
  • 2 kudos

DLT x UC x Auto Loader

Now that the Directory Listing Mode of Auto Loader is officially deprecated, is there a solution for using File Notification Mode in a DLT pipeline writing to a UC-managed table? My understanding is that File Notification Mode is only available on si...

Data Engineering
autoloader
dlt
UC
Latest Reply
Raman_Unifeye
Contributor III
  • 2 kudos

Databricks introduced Managed File Events, which completely bypasses the need for the cluster's identity to provision cloud resources, resolving the conflict with the Shared cluster mode. Steps to implement in DLT: enable File Events on the External Loca...
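A hedged DLT sketch of those steps, assuming file events are already enabled on the external location (the storage path and table name are placeholders):
```
import dlt

@dlt.table(name="bronze_events")
def bronze_events():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("cloudFiles.useManagedFileEvents", "true")  # file events instead of directory listing
        .load("abfss://raw@<storage-account>.dfs.core.windows.net/events/")
    )
```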

2 More Replies
Sainath368
by Contributor
  • 81 Views
  • 3 replies
  • 2 kudos

Migrating from directory-listing to Autoloader Managed File events

Hi everyone, we are currently migrating from a directory listing-based streaming approach to managed file events in Databricks Auto Loader for processing our data in Structured Streaming. We have a function that handles structured streaming where we ar...

Latest Reply
Raman_Unifeye
Contributor III
  • 2 kudos

Yes, for your setup, Databricks Auto Loader will create a separate event queue for each independent stream running with the cloudFiles.useManagedFileEvents = true option. As you are running 1 stream per table, 1 unique directory per stream and 1 uni...
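Under those assumptions (one stream, one source directory, and one checkpoint per table), a rough per-table pattern might look like this; the paths and table names are placeholders:
```
def start_table_stream(source_dir, target_table, checkpoint):
    # One Auto Loader stream per table; each stream gets its own managed event queue.
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "parquet")
        .option("cloudFiles.useManagedFileEvents", "true")
        .load(source_dir)
        .writeStream
        .option("checkpointLocation", checkpoint)
        .trigger(availableNow=True)
        .toTable(target_table)
    )

start_table_stream(
    source_dir="/Volumes/main/landing/orders/",
    target_table="main.bronze.orders",
    checkpoint="/Volumes/main/bronze/_checkpoints/orders",
)
```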

2 More Replies
StephenDsouza
by New Contributor II
  • 3101 Views
  • 3 replies
  • 0 kudos

Error during build process for serving model caused by detectron2

Hi All, Introduction: I am trying to register my model on Databricks so that I can serve it as an endpoint. The packages that I need are "torch", "mlflow", "torchvision", "numpy" and "git+https://github.com/facebookresearch/detectron2.git". For this, ...

Latest Reply
StephenDsouza
New Contributor II
  • 0 kudos

Found an answer! Basically pip somehow installed the dependencies from the git repo first and was not following the given order, so in order to solve this, I added the libraries for conda to install. ``` conda_env = { "channels": [ "defa...
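A hedged reconstruction of that idea (the Python/package versions and the wrapper class below are placeholders, not the poster's exact config): let conda install the heavy dependencies so that pip only handles the detectron2 git install.
```
import mlflow
import mlflow.pyfunc

class DetectronWrapper(mlflow.pyfunc.PythonModel):
    # Placeholder wrapper; the real model loading/prediction logic would go here.
    def predict(self, context, model_input):
        return model_input

conda_env = {
    "channels": ["defaults", "pytorch", "conda-forge"],
    "dependencies": [
        "python=3.10",
        "pip",
        "pytorch",       # resolved by conda before pip runs
        "torchvision",
        "numpy",
        {
            "pip": [
                "mlflow",
                "git+https://github.com/facebookresearch/detectron2.git",
            ]
        },
    ],
    "name": "detectron2_env",
}

with mlflow.start_run():
    mlflow.pyfunc.log_model(
        artifact_path="model",
        python_model=DetectronWrapper(),
        conda_env=conda_env,
    )
```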

2 More Replies
shashankB
by New Contributor III
  • 145 Views
  • 5 replies
  • 0 kudos

Lakebridge analyzer not able to determine DDL.

The Databricks analyzer does not show any DDL statement count; I've also tested with just a simple SELECT * query (SELECT * FROM SCHEMA_NAME.TABLE_NAME;). Is there any solution for this? My target was to get a detailed analysis of SnowSQL code. Any h...

Latest Reply
Thompson2345
New Contributor II
  • 0 kudos

The Lakebridge analyzer counts DDL statements, not regular queries. A simple SELECT * is DML, not DDL, so it won’t show up in the DDL count. To get meaningful results for SnowSQL code analysis, include actual DDL statements like CREATE TABLE, ALTER TAB...

4 More Replies
EDDatabricks
by Contributor
  • 4122 Views
  • 1 reply
  • 1 kudos

Schema Registry certificate auth with Unity Catalog volumes.

Greetings. We currently have a Spark Structured Streaming job (Scala) retrieving Avro data from an Azure Event Hub with a Confluent Schema Registry endpoint (using an Azure API Management gateway with certificate authentication). Until now the .jks file...

Latest Reply
stbjelcevic
Databricks Employee
  • 1 kudos

Thanks for the detailed context. Here's a concise, actionable troubleshooting plan tailored to Databricks with Unity Catalog volumes and Avro + Confluent Schema Registry over APIM with mTLS. What's likely going wrong: based on your description, the ini...

Sega2
by New Contributor III
  • 4680 Views
  • 2 replies
  • 0 kudos

Adding a message to Azure Service Bus

I am trying to send a message to a Service Bus in Azure, but I get the following error: ServiceBusError: Handler failed: DefaultAzureCredential failed to retrieve a token from the included credentials. This is the line that fails: credential = DefaultAzure...

Latest Reply
stbjelcevic
Databricks Employee
  • 0 kudos

It looks like the issue is with the Azure credential chain rather than Service Bus itself; in Databricks notebooks, DefaultAzureCredential won’t succeed unless there’s a valid identity available (env vars, CLI login, managed identity, or a Databricks...
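One possible sketch of that fix, replacing DefaultAzureCredential with an explicit service-principal credential whose secrets come from a Databricks secret scope (the scope, key names, namespace, and queue are placeholders; assumes a Databricks notebook where dbutils is available):
```
from azure.identity import ClientSecretCredential
from azure.servicebus import ServiceBusClient, ServiceBusMessage

# Explicit service principal credential instead of the DefaultAzureCredential chain.
credential = ClientSecretCredential(
    tenant_id=dbutils.secrets.get("my-scope", "tenant-id"),
    client_id=dbutils.secrets.get("my-scope", "client-id"),
    client_secret=dbutils.secrets.get("my-scope", "client-secret"),
)

with ServiceBusClient(
    fully_qualified_namespace="<namespace>.servicebus.windows.net",
    credential=credential,
) as client:
    with client.get_queue_sender(queue_name="<queue-name>") as sender:
        sender.send_messages(ServiceBusMessage("hello from Databricks"))
```
The service principal also needs a Service Bus data-plane role (such as Azure Service Bus Data Sender) on the namespace for the token to be accepted.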

1 More Reply
Miguel_Salas
by New Contributor II
  • 4956 Views
  • 2 replies
  • 0 kudos

How to install PyRFC on AWS Databricks using Volumes

I'm trying to install PyRFC on a Databricks cluster (already tried r5.xlarge, m5.xlarge, and c6gd.xlarge). I'm following this link: https://community.databricks.com/t5/data-engineering/how-can-i-cluster-install-a-c-python-library-pyrfc/td-p/8118 Bu...

Latest Reply
stbjelcevic
Databricks Employee
  • 0 kudos

Thanks for the details. The PyRFC package is a Python binding around the SAP NetWeaver RFC SDK and requires the SAP NW RFC SDK to be present at build/run time; it does not work as a pure Python wheel on Linux without the SDK. The project is archived ...

1 More Reply
HoussemBL
by New Contributor III
  • 2790 Views
  • 2 replies
  • 1 kudos

How to add a Microsoft Entra ID managed service principal to AWS Databricks

Hi, I would like to add a Microsoft Entra ID managed service principal to AWS Databricks, but I have noticed that this option does not appear to be available; I am only able to create managed service principals directly within Databricks. For comparison...

Latest Reply
stbjelcevic
Databricks Employee
  • 1 kudos

You cannot add a Microsoft Entra ID–managed service principal to Databricks on AWS today; AWS workspaces only support Databricks‑managed service principals that you create in the Databricks account/workspace, not service principals federated from Ent...

1 More Reply
nchittampelly
by New Contributor II
  • 3097 Views
  • 3 replies
  • 0 kudos

What is the best way to connect to Oracle CRM Cloud from Databricks?

What is the best way to connect to Oracle CRM Cloud from Databricks?

Latest Reply
nchittampelly
New Contributor II
  • 0 kudos

Oracle CRM On Demand is a cloud platform, not a relational database. Is there any proven solution for this requirement?

2 More Replies
