Hello, we're working with a serverless SQL cluster to query Delta tables and display some analytics in dashboards. We have some basic GROUP BY queries that generate around 36k rows, and they are executed without the LIMIT keyword. So in the data ...
Hey @RabahO
This is likely a memory issue.
The current behavior is that Databricks will only attempt to display the first 64000 rows of data. If the first 64000 rows of data are larger than 2187 MB, then it will fail to display anything. In your cas...
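Given that truncation behavior, one pragmatic workaround is to cap how many rows reach the renderer before calling display. A minimal sketch, with a purely hypothetical helper that wraps the query text:

```python
# Sketch: cap what reaches the notebook/dashboard renderer instead of
# sending the full 36k-row result. The query text is hypothetical.

def capped(query, n=10000):
    """Pure helper: wrap a query so at most n rows reach the UI."""
    return f"SELECT * FROM ({query}) AS t LIMIT {n}"

# On Databricks: display(spark.sql(capped(my_groupby_query)))
```

For dashboards that genuinely need all 36k rows, aggregating further upstream is usually the better fix than raising display limits.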
I am receiving protobuf data in a JSON attribute, and along with it I receive a descriptor file. I am using from_protobuf to deserialize the data as below. It works most of the time, but gives an error when there are some recursive fields within the protob...
Hi @Sambit_S, Handling recursive fields in Protobuf can indeed be tricky, especially when deserializing data.
Let’s explore some potential solutions to address this issue:
Casting Issue with Recursive Fields: The error you’re encountering might b...
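By default, from_protobuf rejects recursive protobuf fields; Spark's `recursive.fields.max.depth` option (available in recent Spark/Databricks runtimes) bounds how many levels get unrolled. A minimal sketch, where the column name, descriptor path, and message name are hypothetical placeholders:

```python
# Sketch: deserializing protobuf with a bounded recursion depth.
# "recursive.fields.max.depth" tells Spark how many levels of a recursive
# message to unroll; the default (-1) disallows recursive fields entirely.
FROM_PROTOBUF_OPTIONS = {"recursive.fields.max.depth": "3"}

def deserialize(df, descriptor_path, message_name="MyMessage"):
    """Deserialize a binary protobuf column ("payload") via a descriptor file."""
    from pyspark.sql.protobuf.functions import from_protobuf  # Spark 3.4+
    return df.withColumn(
        "event",
        from_protobuf("payload", message_name,
                      descFilePath=descriptor_path,
                      options=FROM_PROTOBUF_OPTIONS),
    )
```

Note that fields beyond the configured depth are dropped, so pick a depth that covers the deepest nesting your data actually uses.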
Hi, I'm implementing Databricks Asset Bundles. My scripts are in GitHub, and my /resource folder has all the .yml files of my Databricks workflows, which point to the main branch: git_source:
git_url: https://github.com/xxxx
git_provider: ...
Hi @Skr7 , Let’s break down your requirements:
Dynamically Changing Git Branch for Databricks Asset Bundles (DABs): When deploying and running your DAB, you want the Databricks workflows to point to your feature branch instead of the main branch....
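One way to do this is with bundle variables, which the deploy command can override per environment. A hedged sketch of the resource YAML (the job name is hypothetical; the git_url placeholder is from the post):

```yaml
# databricks.yml (sketch): parameterize the branch with a bundle variable
variables:
  git_branch:
    description: Git branch the workflow tasks should check out
    default: main

resources:
  jobs:
    my_job:
      git_source:
        git_url: https://github.com/xxxx
        git_provider: gitHub
        git_branch: ${var.git_branch}
```

Deploying with `databricks bundle deploy --var="git_branch=feature/my-branch"` then points the workflow at the feature branch without editing the YAML.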
Hi, I have configured 20 different workflows in Databricks, each with a job cluster with a different name. All 20 workflows are scheduled to run at the same time. But even after configuring a different job cluster in each of them, they run sequentially w...
Hi @jainshasha, I tried to replicate your problem, but in my case I was able to run the jobs in parallel (the only difference is that I am running the notebook from the workspace, not from a repo). As you can see, the jobs did not start at exactly the same time, but they ran in par...
Hello, I am trying to connect the Power BI semantic model output (basically the data that has already been pre-processed) to Databricks. Does anybody know how to do this? I would like it to be an automated process, so I would like to know any way to p...
Hi @madhumitha, Connecting Power BI semantic model output to Databricks can be done in a few steps.
Here are a couple of options:
Databricks Power Query Connector:
The new Databricks connector is natively integrated into Power BI. You can configu...
Why can I use boto3 to retrieve a secret from Secrets Manager on a personal cluster, but get the following error on a shared cluster?
NoCredentialsError: Unable to locate credentials
Hi @dbdude and @drii_cavalcanti , The NoCredentialsError you’re encountering when using Boto3 to retrieve a secret from AWS Secrets Manager typically indicates that the AWS SDK is unable to find valid credentials for your API request.
Let’s explor...
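On shared-access-mode clusters the instance profile credentials that boto3 normally picks up are often unavailable, so one workaround is to pass keys explicitly, e.g. from a Databricks secret scope. A sketch, where the scope and key names are hypothetical:

```python
# Sketch: build a boto3 Secrets Manager client with explicit credentials
# pulled from a Databricks secret scope, for clusters with no usable
# instance profile. Scope/key names ("aws", "access_key", ...) are
# hypothetical placeholders.

def client_kwargs(access_key, secret_key, region="us-east-1"):
    """Pure helper: assemble the kwargs boto3.client() expects."""
    return {
        "service_name": "secretsmanager",
        "region_name": region,
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
    }

def get_secret(secret_name):
    import boto3  # available on Databricks runtimes
    # dbutils is only defined inside a Databricks notebook
    kwargs = client_kwargs(
        dbutils.secrets.get("aws", "access_key"),
        dbutils.secrets.get("aws", "secret_key"),
    )
    return boto3.client(**kwargs).get_secret_value(SecretId=secret_name)["SecretString"]
```

Storing long-lived AWS keys in a secret scope is a trade-off; if your workspace supports attaching an instance profile to the shared cluster, that is usually preferable.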
Hi, I have a Databricks job that produces a dashboard after each run. I'm able to download the dashboard as HTML from the view job runs page, but I want to automate the process, so I tried using the Databricks API, but it says {"error_code":"INVALID_...
Hi @Skr7, You cannot automate exporting the dashboard as HTML using the Databricks API. The Databricks API only supports exporting results for notebook task runs, not for job run dashboards.
Here's the relevant excerpt from the provided sources:
Exp...
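For notebook task runs specifically, the Jobs API `runs/export` endpoint can return the run's views (including notebook dashboards) as HTML. A sketch using only the standard library; host, token, and run id are placeholders, and the run id must be the notebook task run, not the parent multi-task run:

```python
# Sketch: export a notebook-task run's dashboard views as HTML via
# GET /api/2.1/jobs/runs/export. Host/token/run_id are placeholders.
import json
import urllib.parse
import urllib.request

def export_url(host, run_id, views="DASHBOARDS"):
    """Pure helper: build the runs/export URL (views: CODE, DASHBOARDS, ALL)."""
    query = urllib.parse.urlencode({"run_id": run_id, "views_to_export": views})
    return f"{host}/api/2.1/jobs/runs/export?{query}"

def download_dashboard_html(host, token, run_id, out_path="run.html"):
    req = urllib.request.Request(
        export_url(host, run_id),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        views = json.load(resp)["views"]  # list of {content, name, type}
    with open(out_path, "w") as f:
        f.write("".join(v["content"] for v in views))
```

If the dashboard is a SQL/Lakeview dashboard rather than a notebook dashboard, this endpoint does not apply, which matches the INVALID_... error in the question.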
Hi, I have a DLT pipeline that applies changes from a source table (cdctest_cdc_enriched) to a target table (cdctest) with the following code: dlt.apply_changes( target = "cdctest", source = "cdctest_cdc_enriched", keys = ["ID"], sequence_by...
Hi @Anske, It seems you’re encountering an issue with your Delta Live Tables (DLT) pipeline where updates from the source table are not being correctly applied to the target table.
Let’s troubleshoot this together!
Pipeline Update Process: Whe...
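For reference, a fuller apply_changes definition matching the snippet in the question might look like the sketch below. The table names and keys come from the post; the sequencing column and SCD type are hypothetical and must match your CDC feed:

```python
# Sketch: apply_changes with an explicit target streaming table.
# "commit_timestamp" is a hypothetical ordering column from the CDC feed.
APPLY_CHANGES_ARGS = dict(
    target="cdctest",
    source="cdctest_cdc_enriched",
    keys=["ID"],
    sequence_by="commit_timestamp",
    stored_as_scd_type=1,  # SCD type 1: update rows in place
)

def define_pipeline():
    import dlt  # only importable inside a DLT pipeline

    # The target must be declared as a streaming table before apply_changes.
    dlt.create_streaming_table("cdctest")
    dlt.apply_changes(**APPLY_CHANGES_ARGS)
```

A common reason updates appear to be dropped is an out-of-order or non-monotonic sequence_by column, so that is worth checking alongside the pipeline update mode.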
Hello Community Folks - Has anyone implemented migration of notebooks from a workspace to a production Databricks workspace using Databricks Asset Bundles? If so, can you please point me to any documentation I can refer to? Thanks!! Regards, Niruban ...
Hi @niruban, Migrating notebooks from one Databricks workspace to another using Databricks Asset Bundles is a useful approach.
Let me guide you through the process and provide relevant documentation.
Databricks Asset Bundles Overview:
Databricks ...
I have a DLT pipeline where all tables are non-streaming (materialized views), except for the last one, which needs to be append-only and is therefore defined as a streaming table. The pipeline runs successfully on the first run. However, on the seco...
Hi @Oliver_Angelil, It appears that you’re encountering an issue with your DLT (Databricks Delta Live Tables) pipeline, specifically related to having an append-only table at the end of the pipeline.
Let’s explore some potential solutions:
Stream...
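The usual failure mode here is that a streaming table requires an append-only source, while materialized views are rewritten on each update. If reprocessing updated rows is not required, one commonly suggested workaround is `skipChangeCommits`, which makes the stream ignore transactions that update or delete existing rows. A hedged sketch (table names are hypothetical, and whether this is appropriate depends on your data semantics):

```python
# Sketch: an append-only DLT streaming table reading a Delta source that
# receives updates. skipChangeCommits ignores update/delete transactions,
# which would otherwise fail the streaming read on the second pipeline run.
READ_OPTIONS = {"skipChangeCommits": "true"}

def define_append_only_table():
    import dlt  # only importable inside a DLT pipeline

    @dlt.table(name="events_append_only")
    def events_append_only():
        # spark is the pipeline's SparkSession; "live.upstream_mv" is the
        # hypothetical upstream materialized view.
        return (
            spark.readStream
                 .options(**READ_OPTIONS)
                 .table("live.upstream_mv")
        )
```

Be aware that skipped change commits mean the streaming table silently misses any rows that were updated rather than appended upstream.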
I use AWS Databricks, which has an SSO & SCIM integration with AAD. I generated an SPN in AAD, synced it to Databricks, and want to use this SPN with AAD client secrets to use the Databricks SDK. But it doesn't work. I don't want to generate another tok...
Hi @BerkerKozan, It sounds like you’re trying to set up provisioning to Databricks using Microsoft Entra ID (formerly known as Azure Active Directory) and encountering some issues.
Let’s break down the steps and address your concerns:
Provisionin...
Hi, is there an established connectivity pipeline to access MuleSoft or Anypoint Exchange data from Databricks? I have seen many options for accessing Databricks data in MuleSoft, but can we read data from MuleSoft into Databricks? Please gi...
Hi @sasi2, Connecting MuleSoft or AnyPoint to exchange data with Databricks is possible, and there are several options you can explore.
Let’s dive into some solutions:
Using JDBC Driver for Databricks in Mule Applications:
The CData JDBC Driver...
Hello, we have Databricks Python notebooks accessing Delta tables. These notebooks are scheduled/invoked by Azure Data Factory. How can I enable Photon on the linked services that are used to call Databricks? If I specify a new job cluster, there does n...
When you create a cluster on Databricks, you can enable Photon by selecting the "Photon" option in the cluster configuration settings. This is typically done when creating a new cluster, and you would find the option in the advanced cluster configura...
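For the ADF side specifically, one approach is to set the linked service's new-job-cluster runtime to a Photon-enabled Databricks runtime version string. A hedged sketch of the relevant linked service properties (domain, node type, and worker counts are placeholders; the runtime string must be a Photon version actually available in your workspace):

```json
{
  "typeProperties": {
    "domain": "https://adb-xxxx.azuredatabricks.net",
    "newClusterVersion": "13.3.x-photon-scala2.12",
    "newClusterNodeType": "Standard_E8ds_v4",
    "newClusterNumOfWorker": "2:8"
  }
}
```

The `-photon-` infix in the runtime version is what selects the Photon engine when ADF provisions the job cluster.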
There are some tables under a schema/database in Unity Catalog. The notebook needs to read the tables in parallel, using a loop and threads, and execute the configured query. But the SQL statement is not getting executed via spark.sql() or spark.read.table(). It ...
Hi @subha2, It seems you’re encountering an issue related to executing SQL statements in Spark.
Let’s troubleshoot this step by step:
Check the Unity Catalog Configuration:
Verify that the Unity Catalog configuration is correctly set up. Ensure t...
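Assuming the catalog configuration checks out, the threading pattern itself can be kept simple: the SparkSession is safe to share across threads, so a thread pool mapping over the query list usually works. A sketch where the query runner is injected (on Databricks you would pass a lambda around spark.sql; here a stub stands in so the pattern is self-contained):

```python
# Sketch: run several Unity Catalog queries in parallel from one notebook.
# On Databricks, pass `lambda q: spark.sql(q).collect()` as run_query;
# the stub used in the usage note below is only for illustration.
from concurrent.futures import ThreadPoolExecutor

def run_parallel(queries, run_query, max_workers=4):
    """Execute each query on a thread pool and return results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_query, queries))

# On Databricks:
# results = run_parallel(queries, lambda q: spark.sql(q).collect())
```

When spark.sql fails only inside threads, the cause is often a per-thread error being swallowed; pool.map re-raises the first exception, which makes the real error visible.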
I have Data Engineering Pipeline workload that run on Databricks.Job cluster has following configuration :- Worker i3.4xlarge with 122 GB memory and 16 coresDriver i3.4xlarge with 122 GB memory and 16 cores ,Min Worker -4 and Max Worker 8 We noticed...
Hi @DBX-2024,
Let’s break down your questions:
High CPU Utilization Spikes: Are They Problematic?
High CPU utilization spikes can be problematic depending on the context. Here are some considerations:
Normal Behavior: It’s common for CPU utilizat...