Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Sadam97
by New Contributor III
  • 782 Views
  • 2 replies
  • 1 kudos

databricks job cancel does not wait for termination of streaming tasks

We have created Databricks jobs, and each has multiple tasks. Each task is a 24/7 streaming task with checkpointing enabled. We want state to survive cancelling and re-running the job, but when we cancel the job run it kills the parent process a...

Latest Reply
Vidhi_Khaitan
Databricks Employee
  • 1 kudos

If the “reporting” layer is essentially micro-batching over bounded backlogs, run it with availableNow (or as a scheduled batch job) so each run is naturally bounded and exits cleanly on its own, with no manual cancel needed. This greatly reduces chances of partial...
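A minimal sketch of that suggestion (the source and target tables and the checkpoint path are placeholders, not from the thread):

```python
# Run the "reporting" stream as a bounded availableNow job: each run processes
# whatever backlog exists, then stops on its own, so no manual cancel is needed.
# Table names and the checkpoint path below are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

(spark.readStream
    .table("bronze.events")                           # hypothetical source table
    .writeStream
    .option("checkpointLocation", "/chk/reporting")   # placeholder checkpoint path
    .trigger(availableNow=True)                       # process available data, then exit
    .toTable("gold.reporting"))                       # hypothetical target table
```

Because the checkpoint is preserved between runs, the next scheduled run picks up exactly where the previous one stopped.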

1 More Replies
Srajole
by New Contributor
  • 857 Views
  • 1 reply
  • 1 kudos

Write data issue

My Databricks job completes successfully but my data is not written into the target table. The source path is correct and everything else checks out, but I am not sure why data is not written into the Delta table.

Latest Reply
Vidhi_Khaitan
Databricks Employee
  • 1 kudos

Hi @Srajole, there are a number of possible reasons why the data is not being written into the table: you may be writing to a path different from the table’s storage location, or using a write mode that doesn’t replace data as expected. spark.sql("DESCR...
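A hedged sketch of the two checks the reply points at, with placeholder table and DataFrame names (assumes a running Databricks/Spark session):

```python
# 1. Confirm where the table actually stores its data, and compare it with the
#    path the job writes to. 2. Be explicit about the write mode.
# All names here are placeholders.
location = (spark.sql("DESCRIBE DETAIL my_catalog.my_schema.target_table")
                 .select("location")
                 .first()[0])
print(location)  # should match the path your job is writing to

(df.write
   .format("delta")
   .mode("append")   # explicit mode: "append" adds rows, "overwrite" replaces them
   .saveAsTable("my_catalog.my_schema.target_table"))
```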

dbr_data_engg
by New Contributor III
  • 2121 Views
  • 2 replies
  • 0 kudos

Using Databricks Bladebridge or Lakebridge for SQL Migration

Getting a transpile error while executing the command for Databricks Bladebridge or Lakebridge: databricks labs lakebridge transpile --source-dialect mssql --input-source "<Path>/sample.sql" --output-folder "<Path>\output" Error: TranspileError(code=FAILURE, ...

Latest Reply
Abhimanyu
Databricks Partner
  • 0 kudos

Did you find a solution?

1 More Replies
juanjomendez96
by Contributor
  • 1328 Views
  • 2 replies
  • 3 kudos

Resolved! Best practices for compute usage

Hello there! I am writing this open message to learn how you are using compute in your work cases. Currently, in my company, we have multiple compute instances that can be differentiated into two main types: clusters with a large instance for b...

Latest Reply
radothede
Valued Contributor II
  • 3 kudos

Hello @juanjomendez96, to the best of my knowledge and experience, an autoscaled shared cluster (using smaller instances) works well for most 2nd-case scenarios (clusters for ad-hoc/development team usage). This approach allows you to reuse the resources across t...

1 More Replies
VicS
by Databricks Partner
  • 1575 Views
  • 1 reply
  • 1 kudos

Resolved! How to install SAP JDBC on job cluster via asset bundles

I'm trying to use the SAP JDBC driver to read data in my Spark application, which I deploy via asset bundles with job computes. I was able to install the SAP JDBC driver on a general-purpose cluster by adding the jar (com.sap.cloud.db.jdbc:ngdbc:2.25.9...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @VicS, to add a Maven package to a job task definition, specify a maven mapping under libraries for each Maven package to be installed. For each mapping, specify the following: resources: jobs: my_job: # ... tasks: - task_...
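Filled out, a task-level Maven library mapping in the bundle's job resource might look like the following sketch; the task, cluster settings, and file paths are illustrative, and the SAP driver coordinate follows the version quoted in the question:

```yaml
resources:
  jobs:
    my_job:
      tasks:
        - task_key: read_sap
          spark_python_task:
            python_file: ../src/read_sap.py   # placeholder entry point
          new_cluster:
            spark_version: 15.4.x-scala2.12   # placeholder runtime
            node_type_id: i3.xlarge           # placeholder node type
            num_workers: 2
          libraries:
            - maven:
                coordinates: com.sap.cloud.db.jdbc:ngdbc:2.25.9   # pin the version you use
```

On deploy, the job cluster resolves the coordinate from Maven Central and installs the jar before the task runs.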

abueno
by Contributor
  • 1089 Views
  • 1 reply
  • 1 kudos

Resolved! Python If Statement with multiple "and" conditions, if not default column value

Python 3.10.12. I am trying to get these filter results, for example: if column1 = '2024' and column2 in ('DE','SC') then 'value1' else 'value2'; if column1 = '2023' and column2 in ('DE','SC') then 'value3' else 'value4'; if the row/record does not fit the cr...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @abueno, I'm assuming you're asking how to do this in PySpark. You can use the when and otherwise conditional functions to achieve your expected result: from pyspark.sql import SparkSession from pyspark.sql.types import StructType, StructField, StringT...
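A minimal sketch of the when/otherwise approach; the column names follow the question, while the sample rows and result values are illustrative (the exact fallback for each year depends on the full requirements):

```python
# Chain when() clauses for each (year, state) condition and use otherwise()
# for rows that match none of them.
from pyspark.sql import SparkSession
from pyspark.sql.functions import when, col

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [("2024", "DE"), ("2023", "SC"), ("2022", "NY")],   # illustrative rows
    ["column1", "column2"],
)

df = df.withColumn(
    "result",
    when((col("column1") == "2024") & col("column2").isin("DE", "SC"), "value1")
    .when((col("column1") == "2023") & col("column2").isin("DE", "SC"), "value3")
    .otherwise("default"),   # default when no condition matches
)
df.show()
```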

pogo
by New Contributor III
  • 577 Views
  • 1 reply
  • 1 kudos

Resolved! Cognito as IdP provider for Delta Share

I am trying to set up a Delta Sharing recipient using OIDC federation, with the issuer URL being the Cognito IdP endpoint. Are there any examples, other than EntraID, for the values of Subject Claim/Subject/Audiences in the OIDC policy for Cognito or Google...

Latest Reply
pogo
New Contributor III
  • 1 kudos

We managed to figure out how to make machine-to-machine authentication work. When you set up the Cognito pool for the m2m scenario, you add an App Client and then set the App Client as both `sub` and Audience in the Databricks recipient OIDC policy. 2. Set `aud` claim to the...
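For illustration only, the values discussed above might be laid out like this; the field names mirror the Subject Claim/Subject/Audiences settings from the question, not an exact policy schema, and all bracketed values are placeholders:

```json
{
  "issuer": "https://cognito-idp.<region>.amazonaws.com/<user-pool-id>",
  "subject_claim": "sub",
  "subject": "<app-client-id>",
  "audiences": ["<app-client-id>"]
}
```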

LakehouseOMG14
by New Contributor II
  • 4271 Views
  • 7 replies
  • 3 kudos

Resolved! Salesforce with Databricks connectivity

Can we connect Salesforce with Databricks? I want to do both push and pull activity between Databricks and Salesforce. Are there any challenges with using ODBC? Please help me with a detailed approach. Thanks a ton.

Latest Reply
ManojkMohan
Honored Contributor II
  • 3 kudos

Have resolved it. Step 1: Create a Service Principal. Log in to your Databricks workspace and navigate to the Admin Settings page by clicking your email in the bottom-left corner and selecting "Admin Settings". Go to the Identity and access tab and click...

6 More Replies
ChristianRRL
by Honored Contributor
  • 1456 Views
  • 2 replies
  • 1 kudos

Autoloader Error Loading and Displaying

Hi there, I'd appreciate some assistance with troubleshooting what is supposed to be a (somewhat) simple use of Auto Loader. Below are some screenshots highlighting my issue: when I attempt to create the dataframe via spark.readStream.format("cloudFiles...

Latest Reply
lingareddy_Alva
Esteemed Contributor
  • 1 kudos

Hi @ChristianRRL, this is a common issue with Spark Structured Streaming and the display() function. The error occurs because you're trying to display a streaming DataFrame, which requires special handling. Here are several solutions: 1. Use writeStrea...
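The first suggestion, sketched with placeholder paths (the cloudFiles source format and option values are assumptions, since the screenshots are not included here):

```python
# Instead of calling display() on the streaming DataFrame, write the stream to
# a Delta sink and inspect the target table. Paths and names are placeholders.
stream_df = (spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")                # assumed source format
    .option("cloudFiles.schemaLocation", "/chk/schema") # placeholder schema path
    .load("/landing/events"))                           # placeholder landing path

query = (stream_df.writeStream
    .format("delta")
    .option("checkpointLocation", "/chk/autoloader")
    .trigger(availableNow=True)   # bounded run, convenient for debugging
    .toTable("bronze.events"))
query.awaitTermination()
```

Once the run finishes, a plain `spark.table("bronze.events").display()` works, because it is no longer a streaming DataFrame.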

1 More Replies
Karl
by New Contributor II
  • 8733 Views
  • 2 replies
  • 0 kudos

Resolved! DB2 JDBC Connection from Databricks cluster

Has anyone successfully connected to a DB2 database on z/OS from a Databricks cluster using a JDBC connection? I also need to specify an SSL certificate path and am not sure if I need to use an init script on the cluster to do so. Any examples would be ver...

Latest Reply
Ayushi_Suthar
Databricks Employee
  • 0 kudos

Hi @Karl, greetings! I've outlined the steps below to connect from Databricks to IBM DB2 using JDBC. Step 1: Obtain the DB2 JDBC driver. Visit the IBM website to download the appropriate JDBC driver for DB2 on z/OS. Reference document: IBM DB2 JDBC Dr...
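Putting the later steps together, a hedged sketch of the JDBC read; the host, port, database, table, truststore path, and secret scope names are placeholders, and `sslConnection`/`sslTrustStoreLocation` are standard DB2 JCC URL properties (assumes the driver jar is already installed on the cluster):

```python
# Read from DB2 on z/OS over SSL via JDBC. All connection details below are
# placeholders; credentials come from a (hypothetical) Databricks secret scope.
df = (spark.read
    .format("jdbc")
    .option("url",
            "jdbc:db2://db2host:446/MYDB:"
            "sslConnection=true;"
            "sslTrustStoreLocation=/dbfs/certs/db2-truststore.jks;")
    .option("driver", "com.ibm.db2.jcc.DB2Driver")
    .option("dbtable", "SCHEMA1.MY_TABLE")
    .option("user", dbutils.secrets.get("db2", "user"))
    .option("password", dbutils.secrets.get("db2", "password"))
    .load())
```

Because the truststore can live on DBFS or a volume, an init script is not strictly required just to point the driver at the certificate.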

1 More Replies
pogo
by New Contributor III
  • 1058 Views
  • 2 replies
  • 2 kudos

Resolved! Delta sharing to pandas error

We are on a trial Databricks premium workspace (fully managed by Databricks). We are trying to test the Delta Sharing feature, where we are sharing a UC table with a recipient using a Python client (outside of Databricks). We are using the `delta-sharing` Python l...

Latest Reply
pogo
New Contributor III
  • 2 kudos

Yes, you are right, it works on trial. I was able to set up S3 as an external location, configured the UC schema to use this S3 external location, and was then able to query data from an external Python client.

1 More Replies
thiagoawstest
by Contributor
  • 4963 Views
  • 3 replies
  • 0 kudos

create databricks scope by reading AWS secrets manager

Hi, I have Databricks on AWS. I created some secrets in AWS Secrets Manager, and I would need to create the scopes based on AWS Secrets Manager. When I use Azure's Key Vault, creating the scope uses the option -scope-backend-type AZURE_KEYVAULT, bu...

Latest Reply
Yeshwanth
Databricks Employee
  • 0 kudos

Hi @thiagoawstest. Step 1: Create Secret Scope. You can create a secret scope using the Databricks REST API as shown below: python import requests import json # Define the endpoint and headers url = "https://<databricks-instance>/api/2.0/secrets/scope...
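Filled in a bit, the REST call from the reply might look like the following sketch; the workspace URL and token are placeholders, and note that this creates a Databricks-backed scope, since AWS Secrets Manager is not a scope backend in the way AZURE_KEYVAULT is:

```python
# Create a Databricks-backed secret scope via the REST API.
# <databricks-instance> and the token are placeholders.
import requests

resp = requests.post(
    "https://<databricks-instance>/api/2.0/secrets/scopes/create",
    headers={"Authorization": "Bearer <personal-access-token>"},
    json={"scope": "aws-secrets", "initial_manage_principal": "users"},
)
resp.raise_for_status()
```

Secrets fetched from AWS Secrets Manager (e.g. with boto3 in a job) would then be copied into this scope with the `/api/2.0/secrets/put` endpoint.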

2 More Replies
SusmithaBadam
by New Contributor II
  • 1065 Views
  • 1 reply
  • 0 kudos

Liquid clustering did not improve performance

Hi there, I have a table of 160 GB partitioned on the country and yearmonth columns. I maintain a history of 6 years and replace the latest 2 months of partitions to add the new data. I use overwrite mode to replace the affected partitio...

Latest Reply
Renu_
Valued Contributor II
  • 0 kudos

Hi @SusmithaBadam, based on your use case, partitioned tables are performing better because they work kind of like labeled folders: when you group by, the engine can go straight to the exact folder instead of scanning everything, so it's much faster. Liquid cl...

Suki
by New Contributor III
  • 1243 Views
  • 2 replies
  • 0 kudos

Issue with Resetting Checkpoint Metadata in DLT with Unity Catalog

Hi Community, hope someone can help with this DLT question. I am currently working in a Databricks environment using Delta Live Tables (DLT) with Unity Catalog enabled, and I'm encountering a blocker related to schema evolution and checkpoint metadata...

Latest Reply
T0M
Contributor
  • 0 kudos

I feel you. Probably not the way to go, but did you try to destroy and re-deploy your pipeline?

1 More Replies
Mohan_Baabu1
by New Contributor III
  • 4383 Views
  • 4 replies
  • 3 kudos

Resolved! Best Practices for Designing Bronze Layer with SQL Server Source in Medallion Architecture

Hi Databricks Experts, I'm working on a Medallion Architecture implementation in Databricks, where the source data is coming from SQL Server. I would like some advice on how to handle the bronze layer correctly and cost-effectively. Should I create a b...

Latest Reply
pgo
New Contributor III
  • 3 kudos

Create the bronze table using Auto Loader and store it in Delta format. Although it might seem like you'll only read from bronze once to populate the silver layer, in real-world production scenarios, you'll often need to re-read from bronze—for repro...
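A minimal sketch of that pattern, assuming SQL Server extracts land as Parquet files in cloud storage; all paths and table names are placeholders:

```python
# Ingest landed SQL Server extracts into a bronze Delta table with Auto Loader.
# Incremental file discovery means re-runs only pick up new files, while the
# bronze table keeps the full raw history for reprocessing.
bronze = (spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "parquet")               # assumed landing format
    .option("cloudFiles.schemaLocation", "/chk/bronze/schema")
    .load("/landing/sqlserver/orders"))                   # placeholder landing path

(bronze.writeStream
    .option("checkpointLocation", "/chk/bronze/orders")
    .trigger(availableNow=True)                           # bounded, schedulable runs
    .toTable("bronze.orders"))
```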

3 More Replies