Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Sadam97
by New Contributor III
  • 1013 Views
  • 1 reply
  • 0 kudos

Cancelling a running job kills the parent process and does not wait for streams to stop

Hi, we have created Databricks jobs, each with multiple tasks. Each task is a 24/7 streaming workload with checkpointing enabled. We want it to stay stateful across cancel and re-run, but it seems that when we cancel the job run it kills the parent proces...

Latest Reply
Sidhant07
Databricks Employee
  • 0 kudos

Hi @Sadam97 , This seems to be expected behaviour. If you are running the jobs in a job cluster: In job clusters, the Databricks job scheduler treats all streaming queries within a task as belonging to the same job execution context. If any query fai...

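For context, a task can shut its streams down cleanly instead of relying on the scheduler's hard cancel. This is only a minimal sketch, assuming a Databricks notebook/task where the `spark` session is provided by the runtime:

```python
# Sketch: ask every active streaming query to stop gracefully so its
# checkpoint is left in a consistent state before the task exits.
# Assumes `spark` is the session provided by the Databricks runtime.
for query in spark.streams.active:
    query.stop()              # request a graceful stop
    query.awaitTermination()  # wait until the stop completes
```

State recovery itself comes from the checkpoint location: on the next run, each query resumes from its last committed offsets.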
mkEngineer
by New Contributor III
  • 1190 Views
  • 6 replies
  • 2 kudos

How to preserve job run history when deploying with DABs

Hi, I’m having an issue when deploying jobs with DABs. Each time I deploy changes, the existing job gets overwritten: the job name stays the same, but a new job ID is created. This causes the history of past runs to be lost. Ideally, I’d like to update...

Latest Reply
Coffee77
Honored Contributor II
  • 2 kudos

Even with different keys but the same names, the original jobs should indeed remain unless you destroy them.

5 More Replies
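For reference, a job's identity in a bundle is its resource key under `resources.jobs`, not its display `name`. A sketch (the key `my_job` and paths here are placeholders):

```yaml
# databricks.yml fragment (hypothetical names): keep the resource key
# ("my_job") stable across deployments so the job ID and run history
# survive; renaming or removing the key is what recreates the job.
resources:
  jobs:
    my_job:              # <- stable key: job ID survives redeploys
      name: "My Job"     # <- display name: safe to change
      tasks:
        - task_key: main
          notebook_task:
            notebook_path: ./notebooks/main
```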
echozhuoocl
by New Contributor II
  • 630 Views
  • 2 replies
  • 0 kudos

Delta Sharing presigned URL was removed, what should I do?

Caused by: java.lang.IllegalStateException: table s3a://dmsa/tmp/the_credential_of_deltasharing/on_prem_deltasharing.share#on-prem-delta-sharing.dmsa_in_nrt.shp_rating_snapshot was removed   at org.apache.spark.delta.sharing.CachedTableManager.getPre...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @echozhuoocl, did you VACUUM your table? If you're not sure, run: DESCRIBE HISTORY catalog.schema.table

1 More Reply
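To check for a recent VACUUM (the operation that removes old data files and can invalidate cached presigned URLs on the sharing side), something like the following can be run on the provider's table (the table name is a placeholder):

```sql
-- Look for VACUUM operations in the Delta table's history
DESCRIBE HISTORY catalog.schema.table;

-- Or filter directly (a sketch; Databricks SQL allows querying
-- DESCRIBE HISTORY as a subquery):
SELECT version, timestamp, operation
FROM (DESCRIBE HISTORY catalog.schema.table)
WHERE operation LIKE 'VACUUM%';
```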
Puru20
by New Contributor III
  • 1193 Views
  • 3 replies
  • 6 kudos

Resolved! Pass the job even if specific task fails

Hi, I have multiple data pipelines, and each has a data quality check as its final task, which runs on dbt. There are 1,500 test cases altogether running every day, captured on a dashboard. Is there a way to pass the job even if this particular tal...

Latest Reply
Puru20
New Contributor III
  • 6 kudos

Hi @szymon_dybczak, the solution works perfectly when I set the leaf job to pass irrespective of the dbt test task status. Thanks much!

2 More Replies
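For the record, the usual way to let a downstream task run (and the job succeed) regardless of an upstream failure is the task-level `run_if` setting. A sketch with placeholder task keys:

```yaml
# Job YAML fragment (hypothetical keys): the final task runs whether
# the dbt test task succeeded or failed, so the job run can still pass.
tasks:
  - task_key: dbt_tests
    # ... dbt task definition ...
  - task_key: publish_results
    depends_on:
      - task_key: dbt_tests
    run_if: ALL_DONE   # run after dbt_tests regardless of its outcome
```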
ismaelhenzel
by Contributor III
  • 852 Views
  • 1 reply
  • 0 kudos

Schema Evolution/Type Widening in Materialized Views

My team is migrating pipelines from Spark to Delta Live Tables (DLT), but we've found that some important features, like schema evolution for tables with enforced schemas, seem to be missing. In DLT, we can define schemas, set primary and foreign key...

Latest Reply
nayan_wylde
Esteemed Contributor II
  • 0 kudos

DLT supports schema evolution, but changing column data types (like from DECIMAL(10,5) to DECIMAL(11,5)) is not automatically handled. Here's how you can manage it: Option 1: Full Refresh with Schema Update. If you're okay with refreshing the materializ...

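Where the underlying table is Delta and type widening is available on the runtime, a widening such as DECIMAL(10,5) to DECIMAL(11,5) can be opted into explicitly. A sketch (table and column names are placeholders; verify feature availability for your runtime):

```sql
-- Enable the type-widening table feature, then widen the column
ALTER TABLE my_table SET TBLPROPERTIES ('delta.enableTypeWidening' = 'true');
ALTER TABLE my_table ALTER COLUMN amount TYPE DECIMAL(11,5);
```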
zc
by New Contributor III
  • 7402 Views
  • 9 replies
  • 7 kudos

Resolved! Use Array in WHERE IN clause

This is what I'm trying to do using SQL: create table check1 as select * from dataA where IDs in ('12483258','12483871','12483883'); The list of IDs is much longer and may change, so I want to use a variable for that. This is what I have tried: decla...

Latest Reply
BS_THE_ANALYST
Databricks Partner
  • 7 kudos

Nice solutions! @ManojkMohan @WiliamRosa I love the use of the temp view for the intermediate result. The array_contains is also a really nice touch. @ManojkMohan when you write "SET VARIABLE ids = ARRAY('12483258','12483871','12483883');" ... can th...

8 More Replies
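The approach discussed in the thread combines a session variable with `array_contains`. A sketch using the IDs from the question:

```sql
-- Declare an array variable, then filter with array_contains
DECLARE OR REPLACE VARIABLE ids ARRAY<STRING>
  DEFAULT ARRAY('12483258','12483871','12483883');

CREATE TABLE check1 AS
SELECT * FROM dataA
WHERE array_contains(ids, IDs);
```

Changing the ID list then only requires a `SET VARIABLE ids = ARRAY(...);` before re-running the query.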
Rainier_dw
by Databricks Partner
  • 2503 Views
  • 6 replies
  • 6 kudos

Resolved! Rollbacks/deletes on streaming table

Hi all, I’m running a Medallion streaming pipeline on Databricks using DLT (bronze → staging silver view → silver table). I ran into an issue and would appreciate any advice or best practices. What I’m doing: ingesting streaming data into a streaming b...

Latest Reply
dalcuovidiu
New Contributor III
  • 6 kudos

I'm not entirely sure if I’m missing something here, but as far as I know there’s a golden rule in DWH applications: you never hard delete records, you use soft deletes instead. So I’m a bit puzzled why a hard delete is being used in this case.

5 More Replies
ChingizK
by New Contributor III
  • 3395 Views
  • 5 replies
  • 2 kudos

Exclude a job from bundle deployment in PROD

My question is regarding Databricks Asset Bundles. I have defined a databricks.yml file the following way: bundle: name: my_bundle_name include: - resources/jobs/*.yml targets: dev: mode: development default: true workspace: ...

Latest Reply
Coffee77
Honored Contributor II
  • 2 kudos

Me too, no clean solution yet. As a workaround, I first implemented an "extra" control in the specific jobs that should never run in PROD, blocking execution based on an environment variable set on all clusters (I don't like it much, but it was effective). As...

4 More Replies
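The environment-variable guard described above could look something like this. A sketch in plain Python; `DEPLOY_ENV` is a hypothetical variable name you would set on the PROD clusters:

```python
import os

def guard_not_prod(env_var: str = "DEPLOY_ENV") -> None:
    """Abort the job when it is accidentally run in PROD.

    Reads a cluster-level environment variable (hypothetical name
    DEPLOY_ENV) and raises SystemExit if it indicates PROD.
    """
    if os.environ.get(env_var, "").upper() == "PROD":
        raise SystemExit("This job must not run in PROD")

guard_not_prod()  # no-op anywhere except PROD
```

Calling it at the top of each protected job's entry point makes the block fail fast before any work starts.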
Datalight
by Contributor
  • 2066 Views
  • 10 replies
  • 3 kudos

Resolved! High-Level Design for Transferring Data from One Databricks Account to Another

Hi, could someone please help me with just the points that should be part of the High Level Design and Low Level Design when transferring data from one Databricks account to another using Unity Catalog? A first-time full data transfer and then ...

Latest Reply
Coffee77
Honored Contributor II
  • 3 kudos

Based on my previous reply, you can use DEEP CLONE to clone data incrementally between workspaces by including it in a scheduled job, but note that this will not work in real time.

9 More Replies
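The scheduled DEEP CLONE step could be sketched as follows (catalog, schema, and table names are placeholders; re-running the clone against an existing target copies only new or changed files):

```sql
-- First run copies everything; the clone is a full, independent copy
CREATE TABLE IF NOT EXISTS target_catalog.schema.my_table
  DEEP CLONE source_catalog.schema.my_table;

-- On later scheduled runs, the same clone syncs incrementally
CREATE OR REPLACE TABLE target_catalog.schema.my_table
  DEEP CLONE source_catalog.schema.my_table;
```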
dalcuovidiu
by New Contributor III
  • 3445 Views
  • 11 replies
  • 10 kudos

DLT - SCD 2 - detect deletes

Hello,I have a question related to APPLY AS DELETE WHEN...If the source table does not have a column that specifies whether a record was deleted, I am currently using a workaround by ingesting synthetic data with a soft_deletion flag. In the future, ...

Latest Reply
dalcuovidiu
New Contributor III
  • 10 kudos

OK. In my case I qualify for: incremental without a delete flag (classic case). Generate synthetic tombstones via an anti-join between the current set of keys and the target’s active keys. I don't want to use MERGE, that's why my question was for C...

10 More Replies
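The synthetic-tombstone approach feeds the soft-deletion flag into the CDC flow roughly like this. A sketch with placeholder table, key, and sequence names, following the DLT `APPLY CHANGES` SQL form:

```sql
-- SCD Type 2 target: rows whose change record carries the
-- soft_deletion flag are treated as deletes (closing the record).
APPLY CHANGES INTO live.target_scd2
FROM stream(live.source_with_tombstones)
KEYS (id)
APPLY AS DELETE WHEN soft_deletion = true
SEQUENCE BY event_ts
STORED AS SCD TYPE 2;
```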
BS_THE_ANALYST
by Databricks Partner
  • 3371 Views
  • 2 replies
  • 3 kudos

Resolved! Opinions/Thoughts: SQL Best Practices in Production .. DBT vs DLT ?

Hey everyone, I'd like to hear the experiences of the community on DLT (Lakeflow Declarative Pipelines) vs dbt. 1. Why would one choose one instead of the other? 2. How does picking one of these level up your SQL strategy? I am somebody who's well-ver...

Latest Reply
BS_THE_ANALYST
Databricks Partner
  • 3 kudos

That's a cracking write-up @szymon_dybczak. Thanks for that. That's certainly given me some food for thought. I think the safest option here, at least for me, is digging into both of them. I feel better informed moving forward with this. I'd love to ...

1 More Reply
SanneJansen564
by Contributor
  • 1829 Views
  • 11 replies
  • 5 kudos

Ensuring Row Order When Importing CSV with COPY INTO

Hi everyone,I have a CSV file stored in S3, and it's critical for my process that the rows are loaded in the exact order they appear in the file.Does the COPY INTO command preserve the original row order during the load? I need to make sure the bronz...

Latest Reply
WiliamRosa
Databricks Partner
  • 5 kudos

Thanks @SanneJansen564 

10 More Replies
kenmyers-8451
by Contributor II
  • 790 Views
  • 1 reply
  • 0 kudos

Suggestion: allow existing_cluster_id override from run_job_task and for_each_task

I'm not sure if this feature exists in newer versions of the Databricks CLI (doubtful, because this doesn't seem possible in the UI either; if it helps, my team has been on 0.222.0 for a while because it has been stable enough for us), and maybe this is ...

Latest Reply
WiliamRosa
Databricks Partner
  • 0 kudos

Hi @kenmyers-8451, I’m also not sure if this feature exists in newer versions. Since no one else has replied, I’d suggest raising a ticket with the Databricks Support Team — they’ll be able to provide clarity on this topic:http://help.databricks.com/...

Jothia
by New Contributor III
  • 702 Views
  • 4 replies
  • 1 kudos

Spark Excel: reading custom-format cells

Hi all, can someone help me read an Excel file with custom-format cells _(* #,##0_);_(* (#,##0);_(* "-"??_);_(@_) from Databricks using the spark.excel reader?

Latest Reply
BS_THE_ANALYST
Databricks Partner
  • 1 kudos

I don't think this should be too hard to handle with a Python library @SebastianRowan. I'm happy to take a look if you could provide an example of where it doesn't work. All the best, BS

3 More Replies
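For what it's worth, when cells with that accounting format arrive as already-formatted strings, a small normalizer in plain Python can undo the rendering (parentheses for negatives, a dash for zero). A sketch, independent of any Spark or Excel library:

```python
def parse_accounting(value: str) -> int:
    """Normalize a value rendered with the accounting format
    _(* #,##0_);_(* (#,##0);_(* "-"??_);_(@_):
    '1,234' -> 1234, '(1,234)' -> -1234, '-' -> 0."""
    s = value.strip()
    if s == "-":                      # zeros render as a dash
        return 0
    negative = s.startswith("(") and s.endswith(")")
    if negative:                      # negatives render in parentheses
        s = s[1:-1]
    n = int(s.replace(",", ""))       # drop thousands separators
    return -n if negative else n
```

Applied as a UDF (or to a pandas column) after reading the sheet, this turns the formatted strings back into integers, e.g. `parse_accounting("(2,500)")` gives `-2500`.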
cnjrules
by New Contributor III
  • 5726 Views
  • 4 replies
  • 0 kudos

Resolved! Reference file name when using COPY INTO?

When using the COPY INTO statement, is it possible to reference the current file name in the SELECT statement? A generic example is shown below; I'm hoping I can log the file name in the target table. COPY INTO my_table FROM (SELECT key, index, textData, ...

Latest Reply
dalcuovidiu
New Contributor III
  • 0 kudos

In SQL, a subselect works just fine: %sql COPY INTO tabele_copy_into_test FROM (SELECT user_id, email, _metadata.file_name AS source_file FROM "/Volumes/dbacademy_ecommerce/v01/raw/users-historical/") FILEFORMAT = PARQUET COPY_OPTIONS ('m...

3 More Replies
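Cleaned up, the pattern from the reply looks like this. A sketch: the target table name is a placeholder, the volume path is the one from the reply, and the copy option is an assumption (the preview is truncated, `mergeSchema` being the common choice), so verify it against your own settings:

```sql
-- Capture the source file name per row via the _metadata column
COPY INTO my_target_table
FROM (
  SELECT user_id, email, _metadata.file_name AS source_file
  FROM '/Volumes/dbacademy_ecommerce/v01/raw/users-historical/'
)
FILEFORMAT = PARQUET
COPY_OPTIONS ('mergeSchema' = 'true');  -- assumed option; truncated in the preview
```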