Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

boitumelodikoko
by Contributor II
  • 4412 Views
  • 6 replies
  • 1 kudos

[RETRIES_EXCEEDED] Error When Displaying DataFrame in Databricks Using Serverless Compute

Hi Databricks Community, I am encountering an issue when trying to display a DataFrame in a Python notebook using serverless compute. The operation seems to fail after several retries, and I get the following error message: [RETRIES_EXCEEDED] The maxim...

Latest Reply
sridharplv
Valued Contributor
  • 1 kudos

Hi @arjunraja_azure, below is a better version of your code, which will avoid the failure:
from pyspark.sql.functions import max
df = spark.read.table('workspace.default.emp')
df1 = df.agg(max('sal')) # Aggregate in a separate step and also avoid caching bef...
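For reference, a minimal runnable sketch of that pattern, assuming a Databricks notebook where spark and display() are provided by the runtime (the table and column names come from the reply above):

from pyspark.sql.functions import max as max_  # alias so Python's builtin max is not shadowed

df = spark.read.table("workspace.default.emp")   # read the source table
df_max = df.agg(max_("sal").alias("max_sal"))    # aggregate in a separate step; no caching beforehand
display(df_max)                                  # render the single-row result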

5 More Replies
JameDavi_51481
by Contributor
  • 44 Views
  • 1 reply
  • 0 kudos

Making REORG TABLE (to enable Iceberg Uniform) faster and more efficient

I am upgrading a large number of tables for Iceberg / Uniform compatibility by running REORG TABLE <tablename> APPLY (UPGRADE UNIFORM(ICEBERG_COMPAT_VERSION=2)); and finding that some tables take several hours to upgrade, presumably because they are ...

Latest Reply
sridharplv
Valued Contributor
  • 0 kudos

Hi @JameDavi_51481, hope you tried this approach for enabling Iceberg metadata along with the Delta format:
ALTER TABLE internal_poc_iceberg.iceberg_poc.clickstream_gold_sink_dlt
SET TBLPROPERTIES ('delta.columnMapping.mode' = 'name', 'delta.enableIceberg...
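For completeness, a hedged sketch of that statement run from a notebook; the table name comes from the reply above, and the truncated property list is completed with the documented UniForm settings (verify the exact values against your DBR version):

spark.sql("""
    ALTER TABLE internal_poc_iceberg.iceberg_poc.clickstream_gold_sink_dlt
    SET TBLPROPERTIES (
        'delta.columnMapping.mode' = 'name',
        'delta.enableIcebergCompatV2' = 'true',             -- assumed completion of the truncated reply
        'delta.universalFormat.enabledFormats' = 'iceberg'  -- generate Iceberg metadata alongside Delta
    )
""")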

vsam
by New Contributor
  • 244 Views
  • 5 replies
  • 2 kudos

OPTIMIZE FULL taking a long time on clustered table

Hi Everyone, currently we are facing an issue with the OPTIMIZE table_name FULL operation. The dataset consists of 150 billion rows of data and it takes 8 hours to optimize the reloaded clustered table. The table is refreshed every month and it needs cluste...

Latest Reply
sridharplv
Valued Contributor
  • 2 kudos

Hi @vsam, have you tried automatic liquid clustering with predictive optimization enabled? With it you don't need to specify the CLUSTER BY columns, and optimization is handled in the background by predictive optimization. http...
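A minimal sketch of that suggestion, assuming a Unity Catalog managed table; CLUSTER BY AUTO and the predictive-optimization DDL are the documented forms, while the catalog, schema, and table names are placeholders:

# Let Databricks choose and evolve the clustering keys automatically
spark.sql("ALTER TABLE my_catalog.my_schema.big_table CLUSTER BY AUTO")

# Have OPTIMIZE / maintenance runs scheduled in the background
spark.sql("ALTER SCHEMA my_catalog.my_schema ENABLE PREDICTIVE OPTIMIZATION")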

4 More Replies
ShivangiB1
by New Contributor II
  • 270 Views
  • 7 replies
  • 3 kudos

AI/BI dashboard integration in HTML page

Hey Team, I have the queries below: I want to test AI/BI dashboard embedding in an HTML page I have created, but the dashboard is not getting loaded; can you please help me understand the process? And I also want users to have access even if they ...

Latest Reply
v-kzaffer
New Contributor II
  • 3 kudos

Yes, you can and should use a service principal to validate and embed your Databricks AI/BI dashboard in an external website like SharePoint. This is a more secure and robust method than using your personal credentials, especially in a production env...
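As an illustration of that flow, a hedged sketch that obtains a bearer token for a service principal via the documented OAuth machine-to-machine (client credentials) endpoint; the workspace host, client ID, and secret are placeholders:

import requests

WORKSPACE = "https://<your-workspace>.azuredatabricks.net"  # placeholder host

resp = requests.post(
    f"{WORKSPACE}/oidc/v1/token",
    auth=("<client-id>", "<client-secret>"),                 # service principal credentials
    data={"grant_type": "client_credentials", "scope": "all-apis"},
)
resp.raise_for_status()
token = resp.json()["access_token"]                          # use as a Bearer token in REST calls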

6 More Replies
james698henry
by Visitor
  • 94 Views
  • 1 reply
  • 0 kudos

The Rise of Citizen Science: Empowering Public Participation in Research

Hello, Citizen science is transforming how research is conducted by inviting everyday people to contribute to scientific discovery. From tracking wildlife migrations to analyzing space data, volunteers are helping scientists gather and interpret massi...

Latest Reply
BS_THE_ANALYST
Contributor III
  • 0 kudos

@james698henry, this feels more like self-promotion using AI-generated content rather than anything actionable or useful to the Data Engineering discussions forum. Please elaborate on how the information you've provided lends itself to Data Engineer...

Sneeze7432
by New Contributor
  • 276 Views
  • 13 replies
  • 2 kudos

File Trigger Not Triggering Multiple Runs

I have a job with one task, which is to run a notebook. The job is set up with a file arrival trigger with my blob storage as the location. The trigger works and the job starts when a new file arrives in the location, but it does not run for ...

Latest Reply
nayan_wylde
Valued Contributor III
  • 2 kudos

@Sneeze7432 you can also try editing the max concurrent runs in the workflow. 
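For reference, a hedged sketch of changing that limit through the Jobs API (the same setting appears as "Maximum concurrent runs" in the job UI; workspace host, token, and job ID are placeholders):

import requests

resp = requests.post(
    "https://<your-workspace>/api/2.2/jobs/update",
    headers={"Authorization": "Bearer <token>"},
    json={"job_id": 123, "new_settings": {"max_concurrent_runs": 5}},  # allow overlapping file-arrival runs
)
resp.raise_for_status()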

12 More Replies
glevin1
by New Contributor
  • 567 Views
  • 1 reply
  • 0 kudos

API response code when running a new job

We are attempting to use the POST /api/2.2/jobs/run-now endpoint using OAuth 2.0 client credentials authentication. We are finding that when sending a request with an expired token, we receive an HTTP code of 400. This contradicts the documentation on ...

Latest Reply
v-kzaffer
New Contributor II
  • 0 kudos

Hello @glevin1, please raise a ticket using this link: https://help.databricks.com/s/contact-us?ReqType=training. Please explain the issue clearly so that it will be easy for the support team to help.

manish1987c
by New Contributor III
  • 1654 Views
  • 4 replies
  • 1 kudos

Delta Live Table - Flow detected an update or delete to one or more rows in the source table

I have created a pipeline where I am ingesting the data from bronze to silver using SCD 1; however, when I try to create the gold table as a DLT table, it gives me the error "Flow 'user_silver' has FAILED fatally. An error occurred because we detected ...

[screenshots attached]
Latest Reply
Pat
Esteemed Contributor
  • 1 kudos

Streaming tables in Delta Live Tables (DLT) only support append-only operations in the SOURCE. The error occurs because:
1. Your silver table uses SCD Type 1, which performs UPDATE and DELETE operations on existing records.
2. Your gold table is defined ...
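One documented way around this, if the gold logic can tolerate skipping those change commits, is to read the SCD 1 source with the skipChangeCommits option; a hedged sketch, with table names following the thread:

import dlt

@dlt.table(name="user_gold")
def user_gold():
    # skipChangeCommits ignores UPDATE/DELETE commits in the streaming source,
    # so the append-only requirement is no longer violated
    return (
        spark.readStream
             .option("skipChangeCommits", "true")
             .table("LIVE.user_silver")
    )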

3 More Replies
allyallen
by New Contributor
  • 167 Views
  • 1 reply
  • 0 kudos

Variable Compute clusters within a Job

We have 3 possible compute clusters that we can run a notebook against. They are of varying sizes, and the one that the notebook uses will depend on the size of the data being processed. We "t-shirt size" each tenant based on their data size (S, M, L) and c...

Latest Reply
eniwoke
New Contributor III
  • 0 kudos

Hi @allyallen, just to clarify your use case to see if I can provide a solution:Are you saying you have a single job with multiple tasks, and each of those tasks runs the same notebook (e.g., notebook_1), but you'd like the compute cluster to vary de...
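If the goal is a single notebook whose compute depends on tenant size, one hedged option is a one-time run via the Jobs runs/submit API, picking the cluster spec from a size map; everything here (node types, notebook path, host, token) is a placeholder:

import requests

CLUSTERS = {  # hypothetical t-shirt sizing
    "S": {"spark_version": "15.4.x-scala2.12", "node_type_id": "Standard_DS3_v2", "num_workers": 2},
    "M": {"spark_version": "15.4.x-scala2.12", "node_type_id": "Standard_DS4_v2", "num_workers": 4},
    "L": {"spark_version": "15.4.x-scala2.12", "node_type_id": "Standard_DS5_v2", "num_workers": 8},
}

def run_for_tenant(size: str) -> int:
    resp = requests.post(
        "https://<your-workspace>/api/2.2/jobs/runs/submit",
        headers={"Authorization": "Bearer <token>"},
        json={
            "run_name": f"tenant-run-{size}",
            "tasks": [{
                "task_key": "notebook_1",
                "notebook_task": {"notebook_path": "/Workspace/notebooks/notebook_1"},
                "new_cluster": CLUSTERS[size],   # compute sized to the tenant
            }],
        },
    )
    resp.raise_for_status()
    return resp.json()["run_id"]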

Dimitry
by New Contributor III
  • 78 Views
  • 3 replies
  • 0 kudos

Problem activating File Events for External Location / ADLS V2

Hi all, I've followed the book for creating an external location for Azure Data Lake Storage (ADLS Gen2) using the access connector. I've granted all required permissions to the connector. I've created a "stock" container on that above-mentioned "devtyremeshare" ...

[screenshots attached]
Latest Reply
v-kzaffer
New Contributor II
  • 0 kudos

I am 1000% sure that your Event Grid is not registered.
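File events on Azure require the Microsoft.EventGrid resource provider to be registered in the storage account's subscription. A hedged sketch for checking (and registering) it with the azure-mgmt-resource SDK; the subscription ID is a placeholder:

from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient

client = ResourceManagementClient(DefaultAzureCredential(), "<subscription-id>")
state = client.providers.get("Microsoft.EventGrid").registration_state
print(state)  # expect "Registered"
if state != "Registered":
    client.providers.register("Microsoft.EventGrid")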

2 More Replies
carolregatt
by New Contributor
  • 253 Views
  • 2 replies
  • 1 kudos

Resolved! Databricks Asset Bundle wrongfully deleting job

Hey, so I've just started to use DAB to automatically manage job configs via CI/CD. I had a previously existing job (let's say ID 123) which was created manually and had this config:
resources:
  jobs:
    My_Job_A:
      name: My Job A
And I wanted to automat...

Latest Reply
carolregatt
New Contributor
  • 1 kudos

Thanks so much for the response @Advika! That makes sense! Can you explain why the remote config had a different key when compared to the local one? I guess that was what threw me off and made me want to change the local key to match the remote.

1 More Replies
Hoviedo
by New Contributor III
  • 865 Views
  • 4 replies
  • 0 kudos

Apply expectations only if column exists

Hi, is there any way to apply an expectation only if that column exists? I am creating multiple DLT tables with the same Python function, so I would like to create different expectations based on the table name; currently I can only create expectations...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

To apply expectations only if a column exists in Delta Live Tables (DLT), you can apply the @dlt.expect decorator conditionally within your Python function. Here is a step-by-step approach to achieve this: Check if the column exists: before applying th...
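A minimal sketch of that approach, assuming the source table can be read up front so its schema is known before the decorator is applied (all table and column names are placeholders):

import dlt

def make_table(table_name: str, source_table: str):
    expectations = {}
    if "email" in spark.read.table(source_table).columns:  # add the rule only when the column exists
        expectations["valid_email"] = "email IS NOT NULL"

    @dlt.table(name=table_name)
    @dlt.expect_all(expectations)                          # empty dict means no expectations
    def _build():
        return spark.read.table(source_table)

make_table("users_clean", "bronze.users")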

3 More Replies
