Data Engineering
Forum Posts

smedegaard
by New Contributor III
  • 702 Views
  • 3 replies
  • 0 kudos

DLT run fails with "com.databricks.cdc.spark.DebeziumJDBCMicroBatchProvider not found"

I've created a streaming live table from a foreign catalog. When I run the DLT pipeline it fails with "com.databricks.cdc.spark.DebeziumJDBCMicroBatchProvider not found". I haven't seen any documentation that suggests I need to install Debezium manuall...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @smedegaard, The error message you’re encountering, “com.databricks.cdc.spark.DebeziumJDBCMicroBatchProvider not found,” indicates that the specified class is not available in your classpath. To address this issue, follow these steps: Verif...

  • 0 kudos
2 More Replies
Chengzhu
by New Contributor
  • 148 Views
  • 1 replies
  • 0 kudos

Databricks Model Registry Notification

Hi community, Currently I am training models on a Databricks cluster and use MLflow to log and register models. My goal is to send a notification to me when a new version of a registered model appears (if the new run achieves some model performance baselin...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @Chengzhu, It seems like you’re using MLflow’s Model Registry to manage the lifecycle of your machine learning models. Let’s explore this further. The MLflow Model Registry provides a centralized model store, APIs, and a UI to collaboratively m...

  • 0 kudos
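The truncated reply points at the MLflow Model Registry; one common pattern is to poll the registry for new versions and compare each version's run metrics against a baseline before alerting. A minimal sketch, assuming an `f1` metric and a hypothetical webhook URL (neither is from the original post):

```python
def should_notify(new_metrics, baseline, metric="f1"):
    """Return True when the new version's metric meets or beats the baseline."""
    return new_metrics.get(metric, float("-inf")) >= baseline

def notify_on_new_version(model_name, baseline, webhook_url, metric="f1"):
    # Assumes mlflow and requests are installed and a tracking URI is configured.
    from mlflow.tracking import MlflowClient
    import requests
    client = MlflowClient()
    for mv in client.get_latest_versions(model_name):
        run = client.get_run(mv.run_id)
        if should_notify(run.data.metrics, baseline, metric):
            # Post a small JSON payload to the (hypothetical) notification endpoint
            requests.post(webhook_url, json={
                "model": model_name,
                "version": mv.version,
                metric: run.data.metrics.get(metric),
            })
```

This polling approach can run as a small scheduled job; Databricks also offers registry webhooks on some platform tiers, which avoid polling entirely.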
EWhitley
by New Contributor III
  • 332 Views
  • 1 replies
  • 0 kudos

Custom ENUM input as parameter for SQL UDF?

Hello - We're migrating from T-SQL to Spark SQL. We're migrating a significant number of queries. "datediff(unit, start, end)" is different between these two implementations (in a good way). For the purpose of migration, we'd like to stay as consiste...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @EWhitley, You’re on the right track with creating a custom UDF in Python for your migration. To achieve similar behaviour to the T-SQL DATEDIFF function with an enum-like unit parameter, you can follow these steps: Create a Custom UDF: Define...

  • 0 kudos
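Spark SQL UDFs can't take a true ENUM parameter, but validating a STRING unit against an allowed set gets close to T-SQL's behavior. A pure-Python sketch that emits a Databricks `datediff(unit, start, end)` expression (the unit names and helper name are illustrative, not from the original thread):

```python
# Units accepted by the emulated T-SQL DATEDIFF; adjust to match your migration.
ALLOWED_UNITS = {"year", "quarter", "month", "week", "day", "hour", "minute", "second"}

def tsql_datediff_expr(unit: str, start_col: str, end_col: str) -> str:
    """Build a Spark SQL datediff(unit, start, end) expression, validating
    the unit like an enum. Column names are assumed to be trusted/safe."""
    u = unit.lower()
    if u not in ALLOWED_UNITS:
        raise ValueError(f"unsupported unit: {unit!r}")
    return f"datediff({u}, {start_col}, {end_col})"
```

The same idea also works fully in SQL: a `CREATE FUNCTION` whose body is a `CASE` over the unit string, raising on unknown values.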
YannLevavasseur
by New Contributor
  • 407 Views
  • 1 replies
  • 0 kudos

SQL function refactoring into Databricks environment

Hello all, I'm currently working on importing some SQL functions from an Informix database into Databricks, using Asset Bundles to deploy Delta Live Tables to Unity Catalog. I'm struggling to import a recursive one; here is the code: CREATE FUNCTION "info...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @YannLevavasseur, It looks like you’re dealing with a recursive SQL function for calculating the weight of articles in a Databricks environment. Handling recursion in SQL can be tricky, especially when translating existing Informix code to Data...

  • 0 kudos
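Spark SQL doesn't support recursive user-defined functions, so one workaround is to flatten the recursive walk (here, a bill-of-materials weight roll-up, matching the thread's topic) into an iterative loop in Python. A sketch with hypothetical data shapes, not the poster's actual schema:

```python
def article_weight(article, children, unit_weight):
    """Iterative replacement for a recursive weight function.
    children: {article: [(child, qty), ...]} for composite articles;
    unit_weight: {leaf_article: weight} for leaf articles."""
    total, stack = 0.0, [(article, 1)]
    while stack:
        art, qty = stack.pop()
        kids = children.get(art)
        if not kids:
            # Leaf article: accumulate its weight times the accumulated quantity
            total += qty * unit_weight.get(art, 0.0)
        else:
            # Composite article: expand children, multiplying quantities down
            stack.extend((child, qty * n) for child, n in kids)
    return total
```

Such a function can then be applied per-row from PySpark, or the hierarchy can be expanded level-by-level with iterative DataFrame joins if the data volume is large.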
Sambit_S
by New Contributor II
  • 315 Views
  • 1 replies
  • 0 kudos

Error during deserializing protobuf data

I am receiving protobuf data in a JSON attribute, and along with it I receive a descriptor file. I am using from_protobuf to deserialize the data as below. It works most of the time but gives an error when there are some recursive fields within the protob...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @Sambit_S, Handling recursive fields in Protobuf can indeed be tricky, especially when deserializing data. Let’s explore some potential solutions to address this issue: Casting Issue with Recursive Fields: The error you’re encountering might b...

  • 0 kudos
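spark-protobuf rejects recursive schemas by default; its `recursive.fields.max.depth` option caps how many levels get unrolled instead of erroring. A sketch (function and column names are illustrative; requires Spark 3.4+ / a recent Databricks runtime):

```python
def parse_protobuf_column(df, payload_col, message_name, desc_file_path, max_depth=2):
    """Deserialize a binary protobuf column, allowing recursive fields
    up to max_depth levels (the default of -1 makes recursion an error)."""
    from pyspark.sql.protobuf.functions import from_protobuf
    options = {"recursive.fields.max.depth": str(max_depth)}
    return df.withColumn(
        "parsed",
        from_protobuf(df[payload_col], message_name,
                      descFilePath=desc_file_path, options=options),
    )
```

Fields beyond the configured depth are dropped from the inferred schema, so pick a depth that covers the deepest nesting your data actually uses.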
Skr7
by New Contributor II
  • 327 Views
  • 1 replies
  • 0 kudos

Databricks Asset Bundles

Hi, I'm implementing Databricks Asset Bundles. My scripts are in GitHub, and my /resource folder has all the .yml files of my Databricks workflows, which point to the main branch: git_source: git_url: https://github.com/xxxx git_provider: ...

Labels: Data Engineering, Databricks
Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @Skr7 , Let’s break down your requirements: Dynamically Changing Git Branch for Databricks Asset Bundles (DABs): When deploying and running your DAB, you want the Databricks workflows to point to your feature branch instead of the main branch....

  • 0 kudos
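One way to avoid hard-coding the branch is a bundle variable that the deploy command overrides per feature branch. A hedged databricks.yml fragment (job and variable names are illustrative, not from the original bundle):

```yaml
variables:
  git_branch:
    description: Branch the workflow tasks should check out
    default: main

resources:
  jobs:
    my_workflow:
      git_source:
        git_url: https://github.com/xxxx
        git_provider: gitHub
        git_branch: ${var.git_branch}
```

Deploying with `databricks bundle deploy -t dev --var="git_branch=feature/my-change"` then points the workflow at the feature branch without editing the YAML.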
dbdude
by New Contributor II
  • 5037 Views
  • 7 replies
  • 0 kudos

AWS Secrets Works In One Cluster But Not Another

Why can I use boto3 to go to Secrets Manager to retrieve a secret with a personal cluster, but I get an error with a shared cluster? NoCredentialsError: Unable to locate credentials

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @dbdude and @drii_cavalcanti , The NoCredentialsError you’re encountering when using Boto3 to retrieve a secret from AWS Secrets Manager typically indicates that the AWS SDK is unable to find valid credentials for your API request. Let’s explor...

  • 0 kudos
6 More Replies
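On shared-access-mode clusters the instance profile is often not exposed to user code, so boto3's default credential chain comes up empty. One workaround, assuming you can store an access key pair in a Databricks secret scope (names below are placeholders), is to pass credentials explicitly:

```python
def get_aws_secret(secret_name, region, access_key, secret_key):
    """Fetch a Secrets Manager value with explicit credentials instead of
    relying on the instance-profile chain (unavailable on some shared clusters)."""
    import boto3  # assumed available on the cluster
    session = boto3.session.Session(
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
        region_name=region,
    )
    return session.client("secretsmanager").get_secret_value(
        SecretId=secret_name)["SecretString"]
```

On Databricks the key pair would come from `dbutils.secrets.get(...)` rather than being hard-coded; granting the shared cluster an instance profile, where policy allows it, removes the need for explicit keys entirely.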
Skr7
by New Contributor II
  • 1311 Views
  • 2 replies
  • 1 kudos

Resolved! Scheduled job output export

Hi, I have a Databricks job that produces a dashboard after each run. I'm able to download the dashboard as HTML from the view job runs page, but I want to automate the process, so I tried using the Databricks API, but it says {"error_code":"INVALID_...

Labels: Data Engineering
Latest Reply
Kaniz
Community Manager
  • 1 kudos

Hi @Skr7, You cannot automate exporting the dashboard as HTML using the Databricks API. The Databricks API only supports exporting results for notebook task runs, not for job run dashboards.  Here's the relevant excerpt from the provided sources: Exp...

  • 1 kudos
1 More Replies
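As the reply notes, the export endpoint only covers notebook task runs; for those, the dashboard view of a run can be pulled programmatically. A sketch that builds the GET request (host and run ID are placeholders):

```python
import urllib.parse

def build_runs_export_url(host, run_id, views="DASHBOARDS"):
    """URL for GET /api/2.1/jobs/runs/export; views_to_export accepts
    CODE, DASHBOARDS, or ALL. Works only for notebook task runs."""
    query = urllib.parse.urlencode({"run_id": run_id, "views_to_export": views})
    return f"{host.rstrip('/')}/api/2.1/jobs/runs/export?{query}"
```

The request is then sent with an `Authorization: Bearer <token>` header; the JSON response contains the exported HTML views, which a small scheduled job could write to storage.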
niruban
by New Contributor II
  • 145 Views
  • 1 replies
  • 0 kudos

Migrate a notebook that reside in workspace using Databricks Asset Bundle

Hello Community Folks - Has anyone implemented migration of notebooks that reside in a workspace to a production Databricks workspace using Databricks Asset Bundles? If so, can you please help me with any documentation I can refer to? Thanks!! Regards, Niruban ...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @niruban, Migrating notebooks from one Databricks workspace to another using Databricks Asset Bundles is a useful approach. Let me guide you through the process and provide relevant documentation. Databricks Asset Bundles Overview: Databricks ...

  • 0 kudos
Oliver_Angelil
by Valued Contributor II
  • 191 Views
  • 1 replies
  • 0 kudos

Append-only table from non-streaming source in Delta Live Tables

I have a DLT pipeline, where all tables are non-streaming (materialized views), except for the last one, which needs to be append-only, and is therefore defined as a streaming table. The pipeline runs successfully on the first run. However on the seco...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @Oliver_Angelil, It appears that you’re encountering an issue with your DLT (Databricks Delta Live Tables) pipeline, specifically related to having an append-only table at the end of the pipeline. Let’s explore some potential solutions: Stream...

  • 0 kudos
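A likely cause of the second-run failure is that streaming sources must be append-only, while a materialized view rewrites its files on each refresh. If the source is readable as a Delta table, the `skipChangeCommits` reader option tells the stream to ignore those rewrites. A sketch (assumes a recent Databricks runtime; whether this fits depends on whether dropped/updated rows may be ignored):

```python
def append_only_stream(spark, source_table):
    """Stream from a table whose files get rewritten (e.g. a refreshed
    materialized view), ignoring change commits so only newly appended
    rows flow through to the streaming table."""
    return (spark.readStream
                 .option("skipChangeCommits", "true")
                 .table(source_table))
```

If updates and deletes in the source must be reflected downstream, the alternative is to make the final table a materialized view as well, or restructure the pipeline so the streaming table reads from a genuinely append-only source.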
BerkerKozan
by New Contributor III
  • 114 Views
  • 1 replies
  • 0 kudos

Using AAD Spn on AWS Databricks

I use AWS Databricks, which has an SSO & SCIM integration with AAD. I generated an SPN in AAD, synced it to Databricks, and want to use this SPN with AAD client secrets and the Databricks SDK. But it doesn't work. I don't want to generate another tok...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @BerkerKozan, It sounds like you’re trying to set up provisioning to Databricks using Microsoft Entra ID (formerly known as Azure Active Directory) and encountering some issues. Let’s break down the steps and address your concerns: Provisionin...

  • 0 kudos
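A key detail here: AWS-hosted Databricks workspaces do not accept Entra ID (AAD) tokens, so the AAD client secret cannot be used directly. The SDK instead authenticates the synced service principal with a Databricks-generated OAuth secret (machine-to-machine auth). A sketch with placeholder names:

```python
def make_workspace_client(host, client_id, client_secret):
    """OAuth M2M auth for the Databricks SDK on AWS. client_id is the
    service principal's application ID; client_secret is a Databricks
    OAuth secret created for that principal - not the AAD client secret."""
    from databricks.sdk import WorkspaceClient  # pip install databricks-sdk
    return WorkspaceClient(host=host, client_id=client_id,
                           client_secret=client_secret)
```

The OAuth secret is created for the service principal in the account console (or via the API), after which no personal access token generation is needed.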
sasi2
by New Contributor II
  • 337 Views
  • 1 replies
  • 0 kudos

Connecting to MuleSoft from Databricks

Hi, is there any connectivity pipeline already established to access MuleSoft or Anypoint Exchange data using Databricks? I have seen many options to access Databricks data in MuleSoft, but can we read data from MuleSoft into Databricks? Please gi...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

  Hi @sasi2, Connecting MuleSoft or AnyPoint to exchange data with Databricks is possible, and there are several options you can explore. Let’s dive into some solutions: Using JDBC Driver for Databricks in Mule Applications: The CData JDBC Driver...

  • 0 kudos
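There is no native MuleSoft source connector, so a common pattern is the reverse of the truncated reply's JDBC direction: expose the data behind a Mule REST API and pull it from a Databricks notebook. A minimal sketch using only the standard library (the endpoint and auth scheme are placeholders):

```python
import json
import urllib.request

def fetch_json_records(url, token=None):
    """GET a JSON array from a REST endpoint (e.g. one published by a
    Mule application) and return it as a list of dicts."""
    req = urllib.request.Request(url)
    if token:
        req.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

On Databricks, the returned records could then go through `spark.createDataFrame(records)` into a Delta table; for larger volumes, having Mule land files in cloud storage and ingesting them with Auto Loader scales better than per-request pulls.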
MartinH
by New Contributor II
  • 2789 Views
  • 7 replies
  • 5 kudos

Resolved! Azure Data Factory and Photon

Hello, we have Databricks Python notebooks accessing Delta tables. These notebooks are scheduled/invoked by Azure Data Factory. How can I enable Photon on the linked services that are used to call Databricks? If I specify a new job cluster, there does n...

Latest Reply
CharlesReily
New Contributor III
  • 5 kudos

When you create a cluster on Databricks, you can enable Photon by selecting the "Photon" option in the cluster configuration settings. This is typically done when creating a new cluster, and you would find the option in the advanced cluster configura...

  • 5 kudos
6 More Replies
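Since the ADF linked-service UI has no Photon checkbox, the cluster spec itself has to request it. In Clusters API terms, Photon is enabled via the `runtime_engine` field; an illustrative spec is sketched below (values are placeholders, and older ADF versions may instead require a `-photon-` spark_version variant):

```python
# Illustrative new-job-cluster spec using Databricks Clusters API field names.
# "runtime_engine": "PHOTON" is the field that requests Photon on
# supported runtimes and node types.
photon_cluster_spec = {
    "spark_version": "14.3.x-scala2.12",
    "node_type_id": "Standard_E8ds_v4",
    "num_workers": 2,
    "runtime_engine": "PHOTON",
}
```

As the reply above suggests, creating the cluster directly in Databricks with Photon enabled and pointing ADF at that existing cluster is the simpler route when the linked service won't accept the extra field.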
subha2
by New Contributor II
  • 484 Views
  • 1 replies
  • 0 kudos

Not able to read tables in Unity Catalog parallel

There are some tables under a schema/database in Unity Catalog. The notebook needs to read the tables in parallel, using a loop and threads, and execute the configured query. But the SQL statement is not getting executed via spark.sql() or spark.read.table(). It ...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @subha2, It seems you’re encountering an issue related to executing SQL statements in Spark. Let’s troubleshoot this step by step: Check the Unity Catalog Configuration: Verify that the Unity Catalog configuration is correctly set up. Ensure t...

  • 0 kudos
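`spark.sql()` is safe to call from multiple Python threads on a single SparkSession, so the usual pattern is a thread pool that fans the queries out. A sketch where the query runner is injected as a callable (so the threading pattern itself is testable off-cluster; on Databricks it would wrap `spark.sql`):

```python
from concurrent.futures import ThreadPoolExecutor

def run_in_parallel(tables, run_query, max_workers=4):
    """Run run_query(table) for each table concurrently and collect results.
    On Databricks, run_query would be something like:
        lambda t: spark.sql(f"SELECT COUNT(*) FROM {t}").collect()
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {t: pool.submit(run_query, t) for t in tables}
        # .result() re-raises any exception from the worker thread,
        # which surfaces per-table failures instead of silently dropping them
        return {t: f.result() for t, f in futures.items()}
```

If statements appear to "not execute" inside plain threads, a common culprit is swallowed exceptions in the thread body; collecting futures and calling `.result()` as above makes those errors visible.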
DBX-2024
by New Contributor
  • 308 Views
  • 1 replies
  • 0 kudos

Job Cluster's CPU utilization goes higher than 100% few times during the workload run

I have a data engineering pipeline workload that runs on Databricks. The job cluster has the following configuration: Worker: i3.4xlarge with 122 GB memory and 16 cores; Driver: i3.4xlarge with 122 GB memory and 16 cores; Min Workers: 4, Max Workers: 8. We noticed...

Labels: Data Engineering, Databricks
Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @DBX-2024, Let’s break down your questions: High CPU Utilization Spikes: Are They Problematic? High CPU utilization spikes can be problematic depending on the context. Here are some considerations: Normal Behavior: It’s common for CPU utilizat...

  • 0 kudos