Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

ManojkMohan
by Honored Contributor II
  • 446 Views
  • 1 reply
  • 2 kudos

Resolved! Sample Data Reflecting but Uploaded File Not Reflecting

Step 1: I uploaded a CSV file manually in Databricks. Step 2: Connector created and active between Salesforce and Databricks. Step 3: Creating data streams in Salesforce Data Cloud. Sample topics are reflecting, matching between what I see in Databricks ...

Latest Reply
ManojkMohan
Honored Contributor II
  • 2 kudos

I resolved it myself. Step 1: Workspace --> Manage permissions. Step 2: Chose all permissions. Step 3: Went to the raw uploaded file and shared it via Delta Sharing. Step 4: In the Salesforce data stream I got the raw file.

Shruti12
by Databricks Partner
  • 2616 Views
  • 2 replies
  • 1 kudos

Does Databricks support updating multiple target rows with a single matching source row in a merge query?

Hi, I am getting this error in a merge statement: DeltaUnsupportedOperationException: Cannot perform Merge as multiple source rows matched and attempted to modify the same target row in the Delta table in possibly conflicting ways. Does Databricks suppor...

Latest Reply
Shruti12
Databricks Partner
  • 1 kudos

Hi @szymon_dybczak, thanks for your reply. The above code is working fine, which means multiple updates can be done from a single source row. So it may be that when there are complex matching conditions/values, the merge query gives an error. I cannot send you...
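The usual fix for this DeltaUnsupportedOperationException is to make the source unambiguous before merging, so at most one source row matches each target row. A minimal sketch (table, key, and ordering columns are illustrative, not from the thread):

```sql
-- Deduplicate the source so each merge key appears exactly once,
-- keeping the most recent row per key:
MERGE INTO target AS t
USING (
  SELECT * FROM source
  QUALIFY ROW_NUMBER() OVER (
    PARTITION BY id ORDER BY updated_at DESC
  ) = 1
) AS s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;
```

Note the error fires only when two source rows try to modify the same target row; one source row updating multiple target rows is allowed.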

1 More Replies
arsamkull
by New Contributor III
  • 8081 Views
  • 6 replies
  • 6 kudos

Usage of Azure DevOps System.AccessToken as PAT in Databricks

Hi there! I'm trying to use an Azure DevOps pipeline to automate the Azure Databricks Repos API. I'm using the following workflow: get an access token for a Databricks service principal using a certificate (which works great), then use the REST API to generate Git cre...

Latest Reply
Srihasa_Akepati
Databricks Employee
  • 6 kudos

@Adrian Ehrsam The PAT limit has been increased to 2048 now. Please check.

5 More Replies
filipniziol
by Esteemed Contributor
  • 1382 Views
  • 1 reply
  • 2 kudos

Merge slows down when the table grows with liquid clustering enabled.

Hi everyone, I have a source table, a target table, and a MERGE statement that is inserting/updating records every couple of minutes. The clustering keys are set up to match the two merge join columns. I noticed that with time the processing time increase...

Latest Reply
kerem
Contributor
  • 2 kudos

Hi @filipniziol, I dealt with a large table of about a TB in size with liquid clustering enabled. Even with liquid clustering, selects and joins on the clustered columns took longer as the table grew. So I don't think it performs as fast as the table...
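If merge latency creeps up over time, one hedged mitigation is to recluster and compact on a schedule so the liquid-clustered layout stays tight (table name is illustrative; the FULL variant requires a recent Databricks Runtime):

```sql
-- Incremental reclustering of newly written / changed files:
OPTIMIZE my_catalog.my_schema.target_table;

-- Occasionally force a rewrite of all files if clustering has drifted badly:
OPTIMIZE my_catalog.my_schema.target_table FULL;
```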

vamsi_simbus
by Databricks Partner
  • 1692 Views
  • 5 replies
  • 0 kudos

Databricks System Table system.billing.usage Not Capturing Job Data in Real-Time

We’ve observed that the system.billing.usage table in Databricks is not capturing job usage data in real time. There appears to be a noticeable delay between when jobs are executed and when their corresponding usage records appear in the system table...

Latest Reply
vamsi_simbus
Databricks Partner
  • 0 kudos

Hi @szymon_dybczak, is there any alternative approach to find the DBU usage of currently running jobs?
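For context, system.billing.usage is populated with a lag (typically up to a few hours), so it can only answer this question retrospectively. A sketch of a per-job DBU query over the last day, once records do land:

```sql
-- DBUs per job over the last day; usage records may lag
-- actual job execution by several hours.
SELECT usage_metadata.job_id,
       SUM(usage_quantity) AS dbus
FROM system.billing.usage
WHERE usage_date >= current_date() - INTERVAL 1 DAY
  AND usage_metadata.job_id IS NOT NULL
GROUP BY usage_metadata.job_id;
```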

4 More Replies
malla_aayush
by Databricks Partner
  • 809 Views
  • 2 replies
  • 1 kudos

Resolved! Not able to find lab for Data Engineering Learning Path

I am not able to find the data engineering learning path. I did open the partner Databricks Academy lab, which redirected to Uplimit, where I also enrolled myself in an instructor-led course, but I am not able to see any labs.

Latest Reply
junaid-databrix
New Contributor III
  • 1 kudos

You are right, the self-paced e-learning courses do not include any labs. However, labs are available in the instructor-led courses on Uplimit. I recently enrolled in one and here is how it worked for me: 1. On the Uplimit portal, enroll for an upc...

1 More Replies
susanne
by Databricks Partner
  • 1632 Views
  • 3 replies
  • 0 kudos

Resolved! Authentication failure Lakeflow SQL Server Ingestion

Hi all, I am trying to create a Lakeflow ingestion pipeline for SQL Server, but I am running into the following authentication error when using my Databricks database user for the connection: Gateway is stopping. Authentication failure while obtaining ...

Latest Reply
susanne
Databricks Partner
  • 0 kudos

Hi @szymon_dybczak, thanks a lot, that did the trick!

2 More Replies
Alena
by New Contributor II
  • 700 Views
  • 1 reply
  • 0 kudos

Programmatically set minimum workers for a job cluster based on file size?

I’m running an ingestion pipeline with a Databricks job: a file lands in S3, a Lambda is triggered, and the Lambda runs a Databricks job. The incoming files vary a lot in size, which makes processing times vary as well. My job cluster has autoscaling enabled, b...

Latest Reply
kerem
Contributor
  • 0 kudos

Hi Alena, the Jobs API has update functionality to be able to do that: https://docs.databricks.com/api/workspace/jobs_21/update. If for some reason you can’t update your pipeline before you trigger it, you can also consider creating a new job with the desired c...
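One way to wire this up, sketched below: the Lambda (or a small driver script) picks an autoscale range from the incoming file size, then calls the Jobs API update endpoint before triggering the run. The size thresholds and the job settings are hypothetical, and the SDK calls assume the databricks-sdk package with a configured workspace:

```python
def autoscale_range(file_size_bytes: int) -> tuple[int, int]:
    """Pick (min_workers, max_workers) from file size.

    Thresholds are illustrative; tune them against your own workloads.
    """
    gib = file_size_bytes / (1024 ** 3)
    if gib < 1:
        return (1, 2)    # small files: keep the cluster tiny
    if gib < 10:
        return (2, 8)    # medium files
    return (4, 16)       # large files: start with more workers

# Hypothetical sketch of applying the range via the Jobs API 2.1 before run:
# from databricks.sdk import WorkspaceClient
# w = WorkspaceClient()
# lo, hi = autoscale_range(incoming_file_size)
# w.jobs.update(job_id=JOB_ID, new_settings=...)  # set autoscale min=lo, max=hi
# w.jobs.run_now(job_id=JOB_ID)
```

The pure sizing function keeps the policy testable outside Databricks; only the commented SDK calls need a workspace.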

Nick_Pacey
by New Contributor III
  • 906 Views
  • 2 replies
  • 0 kudos

Question on best method to deliver Azure SQL Server data into Databricks Bronze and Silver.

Hi, we have an Azure SQL Server (replicating from an on-prem SQL Server) that is required to be in Databricks bronze and beyond. This database has 100s of tables that are all required. Size of tables will vary from very small up to the biggest tables 1...

Latest Reply
kerem
Contributor
  • 0 kudos

Hey Nick, have you tried the SQL Server connector with Lakeflow Connect? This should provide a native connection to your SQL Server, potentially allowing for incremental updates and CDC setup. https://learn.microsoft.com/en-us/azure/databricks/ingestion...

1 More Replies
yit
by Databricks Partner
  • 558 Views
  • 1 reply
  • 0 kudos

Unable to Upcast DECIMAL Field in Autoloader

I’m using Autoloader to read Parquet files and write them to a Delta table. I want to enforce a schema in which Column1 is defined as DECIMAL(10,2). However, in the Parquet files being ingested, Column1 is defined as DECIMAL(8,2). When Autoloader read...

Latest Reply
kerem
Contributor
  • 0 kudos

Hi Yit, to potentially simplify your issue, why not read this column as a string in your stream and then cast it to DECIMAL(10, 2) afterwards? That should eliminate the rescue behaviour. Kerem Durak
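The reason the string-then-cast route is safe here: every DECIMAL(8,2) value is also representable as DECIMAL(10,2) (same scale, two more digits of precision), so the cast cannot lose data. A plain-Python check of that claim, with a small hypothetical helper:

```python
from decimal import Decimal

def fits(value: Decimal, precision: int, scale: int) -> bool:
    """True if value is exactly representable as DECIMAL(precision, scale)."""
    quantized = value.quantize(Decimal(1).scaleb(-scale))
    if quantized != value:           # would need rounding -> doesn't fit
        return False
    digits = len(quantized.as_tuple().digits)
    return digits <= precision

# Boundary values of DECIMAL(8,2) all fit in DECIMAL(10,2):
for s in ("999999.99", "-999999.99", "0.01"):
    assert fits(Decimal(s), 8, 2) and fits(Decimal(s), 10, 2)
```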

ManojkMohan
by Honored Contributor II
  • 629 Views
  • 2 replies
  • 0 kudos

Resolved! Compute kind SERVERLESS_REPL_VM is not allowed to use cluster scoped libraries.

I have an S3 URI 's3://salesforcedatabricksorders/orders_data.xlsx'. I have created a connector between Databricks and Salesforce. I am first getting the orders_data.xlsx to the Databricks layer, performing basic transformations on it, and then sending it to Sales...

Latest Reply
kerem
Contributor
  • 0 kudos

Hello, I’ve come across the same issue reading an Excel file into a PySpark DataFrame on serverless compute. As the error states, with serverless you cannot install a cluster-scoped library, so you have to use notebook-scoped libraries (%pip install…)...
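A minimal sketch of the notebook-scoped approach on serverless; the package name and volume path are assumptions (any Excel reader that pandas supports works the same way):

```python
# First notebook cell: notebook-scoped install, since serverless compute
# cannot use cluster-scoped libraries:
# %pip install openpyxl

# Then read the Excel file with pandas and convert to a Spark DataFrame:
# import pandas as pd
# pdf = pd.read_excel("/Volumes/my_catalog/my_schema/my_volume/orders_data.xlsx")
# df = spark.createDataFrame(pdf)
```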

1 More Replies
Pratikmsbsvm
by Contributor
  • 1296 Views
  • 1 reply
  • 1 kudos

Resolved! How to Create Metadata driven Data Pipeline in Databricks

I am creating a data pipeline as shown below. 1. Files from multiple input sources arrive in their respective folders in the bronze layer. 2. Using Databricks to perform transformation and load the transformed data to Azure SQL, and also to ADLS Gen2 silver (not shown ...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @Pratikmsbsvm, it's a totally realistic requirement. In fact, you can find many articles that suggest approaches for designing such a control table. Take for example the following article: https://medium.com/dbsql-sme-engineering/a-primer-for-metadat...
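As a sketch of what such a control table might look like (all names and columns here are hypothetical, not from the linked article), a driver notebook can loop over its active rows and run one parameterized load per source:

```sql
CREATE TABLE IF NOT EXISTS etl_control (
  source_name    STRING,     -- logical name of the input source
  source_path    STRING,     -- bronze folder for this source
  target_table   STRING,     -- silver table / Azure SQL target
  load_type      STRING,     -- 'full' or 'incremental'
  watermark_col  STRING,     -- column used for incremental loads
  last_loaded_at TIMESTAMP,  -- updated by the driver after each run
  is_active      BOOLEAN     -- toggle sources without code changes
);
```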

Sainath368
by Contributor
  • 954 Views
  • 1 reply
  • 1 kudos

Resolved! How to Retrieve spark.statistics.createdAt and Check When Statistics Were Last Updated in Databricks?

Hi everyone, I regularly (once a week) run the ANALYZE TABLE ... COMPUTE STATISTICS command on all my tables in Databricks to keep statistics up to date for query optimization. In the Spark table UI catalog, I can see some statistics metadata like spark.st...

Latest Reply
Advika
Community Manager
  • 1 kudos

Hello @Sainath368! sql.statistics.createdAt reflects the epoch time when statistics were created. Unfortunately, there's no direct command available to check when the statistics were last updated. As a workaround, you can manually set the current tim...
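One hedged way to implement that workaround is to stamp the table yourself each time you refresh statistics, then read the stamp back later. The property name and table names below are made up for illustration:

```sql
ANALYZE TABLE my_catalog.my_schema.my_table COMPUTE STATISTICS;

-- Record when the stats were refreshed (custom property; name is illustrative):
ALTER TABLE my_catalog.my_schema.my_table
  SET TBLPROPERTIES ('stats.last_analyzed' = '2025-08-04T10:00:00Z');

-- Later, read it back:
SHOW TBLPROPERTIES my_catalog.my_schema.my_table ('stats.last_analyzed');
```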

Itai_Sharon
by New Contributor II
  • 1313 Views
  • 3 replies
  • 1 kudos

dbutils.notebook.run() returns a general error instead of the specific one

Hi, in a Python file using dbutils.notebook.run() I'm running a specific notebook. The notebook is failing, but I'm getting a general error log instead of the real specific log. When I run the notebook directly, I get the specific error log. gen...

Latest Reply
Itai_Sharon
New Contributor II
  • 1 kudos

@Vinay_M_R BTW, when trying to run a job using the Databricks API, I encounter the same issue (a general "FAILED: Workload failed"): from databricks.sdk import WorkspaceClient; client = WorkspaceClient(); run = client.jobs.run_now(job_id). Error message: state_...
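When run_now only surfaces a generic state message, the task-level run output usually carries the real exception. A sketch using the databricks-sdk (the job ID is hypothetical, and this assumes a configured workspace client):

```python
# Sketch: drill from a failed run down to the task-level output, which is
# where the notebook's actual exception usually lands.
from databricks.sdk import WorkspaceClient

client = WorkspaceClient()
run = client.jobs.run_now(job_id=123).result()  # waits for the run to finish

for task in run.tasks or []:
    out = client.jobs.get_run_output(task.run_id)
    if out.error:
        print(task.task_key, out.error)   # specific error message
        print(out.error_trace)            # full traceback, if available
```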

2 More Replies