Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Forum Posts

aonurdemir
by Contributor
  • 48 Views
  • 1 reply
  • 1 kudos

Broken S3 file paths in file notifications for Auto Loader

Suddenly, at "2025-10-23T14:12:48.409+00:00", file paths coming from the file notification queue started to be URL-encoded. As a result, our pipeline gets a file-not-found exception. I think something changed suddenly and broke the notification system. Here are th...

Latest Reply
K_Anudeep
Databricks Employee

Hello @aonurdemir, could you please re-run your pipeline now and check? This issue should be mitigated now. It was caused by a recent internal bug that led to unexpected handling of file paths with special characters. You should set ignoreMissingFile...

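A minimal sketch of the suggested mitigation, assuming the truncated option in the reply is Auto Loader's ignoreMissingFiles; bucket paths, format, and table names are placeholders:

# Placeholders throughout; ignoreMissingFiles is assumed from the truncated reply above.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

stream = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.useNotifications", "true")   # file-notification mode, as in the question
    .option("ignoreMissingFiles", "true")            # skip files the stream can no longer find
    .load("s3://my-bucket/landing/")
)

(
    stream.writeStream
    .option("checkpointLocation", "s3://my-bucket/_checkpoints/landing/")
    .toTable("main.bronze.landing_events")
)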
EricCournarie
by New Contributor III
  • 21 Views
  • 2 replies
  • 0 kudos

Retrieving OBJECT values with the JDBC driver may lead to invalid JSON

Hello, using the JDBC driver, I try to retrieve values in the ResultSet for an OBJECT type. Sadly, it returns invalid JSON. Given the SQL: CREATE OR REPLACE TABLE main.eric.eric_complex_team (`id` INT, `nom` STRING, `infos` STRUCT<`age`: INT, `ville`: STRIN...

Latest Reply
EricCournarie
New Contributor III

Hello, thanks for the quick response. Sadly, I don't have control over the SQL request, so there's no way for me to modify it ...

1 More Reply
DylanStout
by Contributor
  • 3029 Views
  • 1 reply
  • 0 kudos

PySpark ML tools

Cluster policies are not letting us use PySpark ML tools. Issue details: We have clusters available in our Databricks environment, and our plan was to use functions and classes from "pyspark.ml" to process data and train our model in parallel across cores/n...

Latest Reply
Louis_Frolio
Databricks Employee

Hey @DylanStout, thanks for laying out the symptoms clearly. This is a classic clash between Safe Spark (shared/high-concurrency) protections and multi-threaded/driver-mutating code paths. What's happening: on clusters with the Shared/Safe Spark a...

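For context, a minimal pyspark.ml pipeline of the kind the question describes; the table and column names are hypothetical, and whether it runs on a shared (Safe Spark) cluster depends on the access mode and runtime version discussed in the reply above:

from pyspark.ml import Pipeline
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.feature import VectorAssembler

# `spark` is the ambient SparkSession in a Databricks notebook or job.
df = spark.table("main.demo.training_data")  # hypothetical table with numeric features and a 'label' column

assembler = VectorAssembler(inputCols=["f1", "f2", "f3"], outputCol="features")
lr = LogisticRegression(featuresCol="features", labelCol="label")

model = Pipeline(stages=[assembler, lr]).fit(df)
model.transform(df).select("label", "prediction").show(5)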
AvneeshSingh
by New Contributor
  • 3224 Views
  • 1 reply
  • 0 kudos

Auto Loader Data Reprocess

Hi, if possible, can anyone please help me with some Auto Loader options? I have 2 open queries: (i) Let's assume I am running an Auto Loader stream and my job fails; instead of resetting the whole checkpoint, I want to run the stream from a specified timest...

Data Engineering
autoloader
Latest Reply
AbhaySingh
Databricks Employee

Have you reviewed the following doc already? Please let me know the specifics and we can go from there, but I'd start with this doc: https://docs.databricks.com/aws/en/ingestion/cloud-object-storage/auto-loader/options

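On query (i), one hedged option from the Auto Loader options doc linked above is modifiedAfter, which limits ingestion to files modified after a given timestamp and is typically paired with a fresh checkpoint location. Paths and the timestamp are placeholders:

# Placeholders; modifiedAfter only applies to files discovered by this (new) stream.
stream = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "csv")
    .option("modifiedAfter", "2025-10-01 00:00:00.000000 UTC+0")
    .load("s3://my-bucket/landing/")
)

(
    stream.writeStream
    .option("checkpointLocation", "s3://my-bucket/_checkpoints/landing_reprocess/")  # new checkpoint
    .toTable("main.bronze.landing_events")
)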
dhruvs2
by New Contributor
  • 75 Views
  • 3 replies
  • 4 kudos

How to trigger a Databricks job only after multiple other jobs have completed

We have a use case where Job C should start only after both Job A and Job B have successfully completed. In Airflow, we achieve this using an ExternalTaskSensor to set dependencies across different DAGs. Is there a way to configure something similar in...

Latest Reply
BS_THE_ANALYST
Esteemed Contributor II

Hi @dhruvs2. A Lakeflow Job consists of tasks. The tasks can be things like notebooks or other jobs. If you want to orchestrate many jobs, I'd agree that having a job to do this is your best bet. Then you can set up the dependencies as you require. I...

2 More Replies
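A hedged sketch of the wrapper-job approach described above, using the Databricks SDK for Python to create a job whose Run Job task for Job C depends on the tasks that run Jobs A and B; the job IDs and names are placeholders:

# Placeholders throughout; authentication comes from the standard SDK config
# (environment variables or ~/.databrickscfg).
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()

w.jobs.create(
    name="orchestrate_a_b_then_c",
    tasks=[
        jobs.Task(task_key="run_job_a", run_job_task=jobs.RunJobTask(job_id=111)),
        jobs.Task(task_key="run_job_b", run_job_task=jobs.RunJobTask(job_id=222)),
        jobs.Task(
            task_key="run_job_c",
            run_job_task=jobs.RunJobTask(job_id=333),
            depends_on=[
                jobs.TaskDependency(task_key="run_job_a"),
                jobs.TaskDependency(task_key="run_job_b"),
            ],
        ),
    ],
)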
Adam_Borlase
by New Contributor III
  • 77 Views
  • 3 replies
  • 1 kudos

Error trying to edit Job Cluster via Databricks CLI

Good day all, after having issues with cloud resources allocated to Lakeflow jobs and gateways, I am trying to apply a policy to the cluster that is allocated to the job. I am very new to a lot of the Databricks platform and the administration, so all h...

Latest Reply
Louis_Frolio
Databricks Employee

@Adam_Borlase, thanks, this is helpful context. The key point is that the SQL Server connector's ingestion pipeline runs on serverless, while the ingestion "gateway" runs on classic compute in your cloud account, so vCPU family quotas can block gateway c...

2 More Replies
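A hedged sketch, under the assumption that the goal is to attach a cluster policy to an existing job's job cluster; it uses the Databricks SDK for Python rather than the raw CLI, and the job ID, policy ID, and cluster sizing are placeholders:

# Placeholders throughout; jobs.update() overwrites the fields passed in new_settings
# (here, the whole job_clusters list), so include every job cluster the job needs.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import compute, jobs

w = WorkspaceClient()

w.jobs.update(
    job_id=123456,
    new_settings=jobs.JobSettings(
        job_clusters=[
            jobs.JobCluster(
                job_cluster_key="main_cluster",
                new_cluster=compute.ClusterSpec(
                    policy_id="ABC123DEF456",          # the cluster policy to apply
                    apply_policy_default_values=True,
                    spark_version="15.4.x-scala2.12",
                    node_type_id="Standard_DS3_v2",
                    num_workers=2,
                ),
            )
        ]
    ),
)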
akeel-rehman
by New Contributor
  • 2885 Views
  • 1 reply
  • 0 kudos

Best Practices for Reusable Workflows & Cluster Management Across Repos.

Hi everyone, I am looking for best practices around reusable workflows in Databricks, particularly in these areas: Reusable Workflows Instead of Repetition: How can we define reusable workflows rather than repeating the same steps across multiple jobs?...

Latest Reply
AbhaySingh
Databricks Employee

Here are my recommendations: 1. Databricks Asset Bundles (DABs) for reusable workflows; 2. API-based triggering and Run Job Tasks for cross-repo workflows; 3. Instance Pools as the #1 game-changer for cluster optimization (5-10 seconds vs 5-10 minu...

jorperort
by Contributor
  • 99 Views
  • 5 replies
  • 4 kudos

Resolved! Spark JDBC Write Fails for Record Not Present - PK error

Good afternoon everyone, I'm writing this post to see if anyone has encountered this problem and if there is a way to resolve it or understand why it happens. I'm working in a Databricks Runtime 15.4 LTS environment, which includes Apache Spark 3.5.0 ...

Latest Reply
ManojkMohan
Honored Contributor

@jorperort When writing to SQL Server tables with composite primary keys from Databricks using JDBC, unique constraint violations are often caused by Spark's distributed retry logic (https://docs.databricks.com/aws/en/archive/connectors/jdbc). Solution...

4 More Replies
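One common mitigation pattern for this class of error (not necessarily the accepted answer in the thread): de-duplicate on the composite key, write to a staging table, and make the final insert idempotent on the SQL Server side. Connection details, secret names, and table names are placeholders:

# Placeholders throughout; `spark` and `dbutils` are the ambient Databricks objects.
jdbc_url = "jdbc:sqlserver://myserver.database.windows.net:1433;database=mydb"

source = spark.table("main.silver.orders")                  # hypothetical source table
deduped = source.dropDuplicates(["order_id", "line_id"])    # composite primary-key columns

(
    deduped.write.format("jdbc")
    .option("url", jdbc_url)
    .option("dbtable", "dbo.orders_staging")
    .option("user", dbutils.secrets.get("jdbc", "user"))
    .option("password", dbutils.secrets.get("jdbc", "password"))
    .mode("overwrite")   # staging table is rebuilt each run, so task retries are harmless
    .save()
)
# Then MERGE dbo.orders_staging into dbo.orders on the SQL Server side (e.g. via a stored
# procedure), so the load into the PK-constrained table is idempotent.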
Dimitry
by Contributor III
  • 64 Views
  • 4 replies
  • 0 kudos

Databricks notebook parameter works in interactive mode but not in the job

Hi guys, I've added a parameter "files_mask " to a notebook, with a default value. The job running this notebook broke with the error: com.databricks.dbutils_v1.InputWidgetNotDefined: No input widget named files_mask is defined. Code: mask = dbutils.widgets....

Latest Reply
szymon_dybczak
Esteemed Contributor III

Hi @Dimitry, do you use Python or Scala in your notebook?

3 More Replies
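A small pattern that usually avoids InputWidgetNotDefined in both interactive and job runs: declare the widget with a default at the top of the notebook and pass a task parameter with the same name from the job. The default value below is hypothetical:

# Declare the widget so it exists whether the notebook runs interactively or as a job task;
# a job/task parameter named "files_mask" overrides the default.
dbutils.widgets.text("files_mask", "*.csv")   # hypothetical default
files_mask = dbutils.widgets.get("files_mask")

# Defensive variant if the widget may not exist in some execution contexts:
try:
    files_mask = dbutils.widgets.get("files_mask")
except Exception:
    files_mask = "*.csv"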
janm2
by New Contributor II
  • 1172 Views
  • 5 replies
  • 1 kudos

Auto Loader cleanSource option does not take effect

Hello everyone, I was very keen to try out Auto Loader's new cleanSource option so we can clean up our landing folder easily. However, I found it does not have any effect whatsoever. As I cannot create a support case, I am creating this post. A sim...

Latest Reply
SanthoshU
New Contributor II

Any solution?

4 More Replies
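A hedged sketch, assuming the documented option names cloudFiles.cleanSource and cloudFiles.cleanSource.retentionDuration: files are only cleaned up once they are older than the retention duration (which defaults to a much longer period), a common reason the option appears to do nothing in short tests. Paths are placeholders:

# Placeholders; DELETE removes processed files from the landing folder after the retention period.
stream = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.cleanSource", "DELETE")
    .option("cloudFiles.cleanSource.retentionDuration", "7 days")  # assumed option name
    .load("s3://my-bucket/landing/")
)

(
    stream.writeStream
    .option("checkpointLocation", "s3://my-bucket/_checkpoints/landing/")
    .toTable("main.bronze.landing_events")
)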
ashraf1395
by Honored Contributor
  • 2764 Views
  • 4 replies
  • 1 kudos

Resolved! How to capture dlt pipeline id / name using dynamic value reference

Hi there, I have a use case where I want to set the DLT pipeline id in the configuration parameters of that DLT pipeline. The same way we can use workspace ids or task ids in a notebook task (task_id = {{task.id}} / {{task.name}}) and can save them as parameters a...

Latest Reply
CaptainJack
New Contributor III

Was anyone able to get the pipeline_id programmatically?

3 More Replies
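One approach that gets suggested for this (an assumption on my part, not confirmed against the current docs): read the pipelines.id Spark conf key from inside the running pipeline, with a fallback value if the key is absent:

import dlt

# Assumption: a running DLT/Lakeflow pipeline exposes its id under the Spark conf key
# "pipelines.id"; "unknown" is returned if the key is not set.
pipeline_id = spark.conf.get("pipelines.id", "unknown")

@dlt.table(name="pipeline_metadata")
def pipeline_metadata():
    return spark.createDataFrame([(pipeline_id,)], ["pipeline_id"])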
shadowinc
by New Contributor III
  • 3274 Views
  • 1 reply
  • 0 kudos

Call SQL Function via API

Background: I created a SQL function with the name schema.function_name, which returns a table. In a notebook, the function works perfectly; however, I want to execute it via API using a SQL endpoint. In the API, I got an insufficient privileges error, so gr...

Latest Reply
AbhaySingh
Databricks Employee

Do you know if the API service principal / user has USAGE on the database itself? This seems like the most likely issue based on the information in the question. Quick fix checklist: run these commands in order (replace api_user with the actual user from ...

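A sketch of the grants typically needed before a principal can call a Unity Catalog SQL table function through a SQL warehouse, plus the call itself via the SQL Statement Execution API (Databricks SDK). The principal, warehouse id, and object names are placeholders:

# Placeholders throughout; run the GRANTs as an owner/admin of the objects.
from databricks.sdk import WorkspaceClient

principal = "api-service-principal"
for stmt in [
    f"GRANT USE CATALOG ON CATALOG my_catalog TO `{principal}`",
    f"GRANT USE SCHEMA ON SCHEMA my_catalog.my_schema TO `{principal}`",
    f"GRANT EXECUTE ON FUNCTION my_catalog.my_schema.my_function TO `{principal}`",
    f"GRANT SELECT ON SCHEMA my_catalog.my_schema TO `{principal}`",  # tables the function reads
]:
    spark.sql(stmt)

w = WorkspaceClient()
resp = w.statement_execution.execute_statement(
    warehouse_id="1234567890abcdef",
    statement="SELECT * FROM my_catalog.my_schema.my_function()",
)
print(resp.result)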
Pw76
by New Contributor II
  • 2287 Views
  • 4 replies
  • 1 kudos

CDC with Snapshot - next_snapshot_and_version() function

I am trying to use create_auto_cdc_from_snapshot_flow (formerly apply_changes_from_snapshot()) (see: https://docs.databricks.com/aws/en/dlt/cdc#cdc-from-snapshot). I am attempting to do SCD type 2 changes using historic snapshot data. In the first coup...

Data Engineering
CDC
dlt
Snapshot
Latest Reply
fabdsp
New Contributor

I have the same issue: this is a very big limitation of create_auto_cdc_from_snapshot_flow, and there is no solution.

3 More Replies
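A hedged sketch of the snapshot-CDC pattern discussed in the thread; the parameter names are assumed to match the apply_changes_from_snapshot() API it replaces, and the snapshot paths, keys, and versioning scheme are placeholders:

import dlt

dlt.create_streaming_table("customers_scd2")

def next_snapshot_and_version(latest_snapshot_version):
    # Return (snapshot_df, version) for the next unprocessed snapshot, or None when done.
    next_version = 1 if latest_snapshot_version is None else latest_snapshot_version + 1
    path = f"s3://my-bucket/snapshots/v{next_version}/"   # placeholder layout
    try:
        return (spark.read.format("parquet").load(path), next_version)
    except Exception:
        return None  # no newer snapshot available

dlt.create_auto_cdc_from_snapshot_flow(
    target="customers_scd2",
    source=next_snapshot_and_version,
    keys=["customer_id"],
    stored_as_scd_type=2,
)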
jeremy98
by Honored Contributor
  • 854 Views
  • 3 replies
  • 0 kudos

How to pass secret keys using a spark_python_task

Hello community, I was searching for a way to pass secrets to a spark_python_task. Using a notebook file is easy, you just use dbutils.secrets.get(...), but how do you do the same thing with a spark_python_task set to use serverless compute? Kind regards,

Latest Reply
analytics_eng
New Contributor III

@Renu_ But passing them as spark_env will not work with serverless, I guess? See also the limitations in the docs: Serverless compute limitations | Databricks on AWS

2 More Replies
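A sketch for a Python file run as a spark_python_task: dbutils is not predefined in a .py file, but it can be obtained from the SDK runtime helper (or a WorkspaceClient), and as far as I know secrets access works this way on serverless too. The scope and key names are placeholders:

# Placeholders; requires the databricks-sdk package, which is available on Databricks compute.
from databricks.sdk.runtime import dbutils

api_token = dbutils.secrets.get(scope="my_scope", key="my_api_token")

# Alternative without the runtime helper:
# from databricks.sdk import WorkspaceClient
# api_token = WorkspaceClient().dbutils.secrets.get(scope="my_scope", key="my_api_token")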
dpc
by Contributor
  • 146 Views
  • 5 replies
  • 3 kudos

Resolved! Pass parameters between jobs

Hello, I have a job. In that job, it runs a task (GetGid) that executes a notebook and obtains some value using dbutils.jobs.taskValues.set, e.g. dbutils.jobs.taskValues.set(key = "gid", value = gid). As a result, I can use this and pass it to another task for ...

Latest Reply
dpc
Contributor

Thanks @Hubert-Dudek and @ilir_nuredini, I see this now. I'm setting the value using dbutils.jobs.taskValues.set(), passing it to the job task using Key - gid; Value - {{tasks.GetGid.values.gid}}, then reading it using pid = dbutils.widgets.get()

4 More Replies
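To recap the working pattern from this thread as a short sketch (the key name gid and task name GetGid come from the posts above; the example value is a placeholder):

# In the upstream task's notebook (task key: GetGid):
gid = "example-gid-123"                              # placeholder value
dbutils.jobs.taskValues.set(key="gid", value=gid)

# In the downstream task, define a task parameter named "gid" with the value
# {{tasks.GetGid.values.gid}}, then read it in that notebook:
dbutils.widgets.text("gid", "")
gid = dbutils.widgets.get("gid")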
