Data Engineering

Forum Posts

by ChrisLawford_n1 • Contributor

Monday

68 Views
1 reply
0 kudos

Network error on subsequent runs using serverless compute in DLT

Hello, when running on a serverless cluster in DLT, our notebook first tries to install some Python wheels (.whl files) onto the cluster. We have noticed that, when in development and running a pipeline many times over with a short space of time between runs, the pi...

Latest Reply

mark_ott
Databricks Employee

yesterday

0 kudos

The error you’re seeing (“Network is unreachable” repeated during pip installs) on a DLT (Delta Live Tables) serverless cluster, especially after the first successful run, is a common issue that appears to affect Databricks pipelines run repeatedly on...
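The truncated reply does not show its recommended fix. As a generic mitigation, which is an assumption here and not the reply's advice, a transient "Network is unreachable" error during the wheel install can be retried with exponential backoff. A minimal Python sketch; `retry_with_backoff` is a hypothetical helper and the wheel path in the usage comment is a placeholder:

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(fn: Callable[[], T], attempts: int = 4,
                       base_delay: float = 2.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `fn`, retrying on any exception with exponential backoff.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between attempts
    and re-raises the last exception if every attempt fails.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            sleep(base_delay * (2 ** attempt))
    raise ValueError("attempts must be at least 1")


# Hypothetical usage in the pipeline notebook (placeholder wheel path):
# import subprocess, sys
# retry_with_backoff(lambda: subprocess.run(
#     [sys.executable, "-m", "pip", "install", "/path/to/package.whl"],
#     check=True))
```

This only helps if the outage is short-lived; if the network stays unreachable for the whole run, the retries simply delay the same error.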

by abhirupa7 • New Contributor

Tuesday

74 Views
2 replies
0 kudos

Resolved! Databricks Workflow

I have a question. I have multiple jobs (workflows) in my workspace. Those jobs run regularly, and each contains multiple tasks. A few tasks have notebooks that contain for-each code. Now, when a job runs, that particular task executes the for ...

Latest Reply

mark_ott
Databricks Employee

yesterday

0 kudos

To programmatically capture iteration-level information for tasks running inside a Databricks Workflow Job that uses the "for each" loop construct, you will primarily rely on the Databricks Jobs REST API (v2.1) and possibly the Databricks Python SDK....

by nefflev1 • New Contributor

Tuesday

71 Views
1 reply
1 kudos

VS Code Python file execution

Hi everyone, I'm using the Databricks VS Code Extension to develop and deploy Asset Bundles. Usually we work with notebooks and use the "Run File as Workflow" function. Now I'm trying to use a pure Python file for a new use case and tried to use the "Up...

Latest Reply

mark_ott
Databricks Employee

yesterday

1 kudos

You're encountering a common issue when using the Databricks VS Code Extension's "Upload and Run File" with pure Python files, especially in a secure, VNet-injected Azure Databricks deployment. Here’s a direct summary of what’s happening and how you ...

by Akshay_Petkar • Valued Contributor

Tuesday

51 Views
2 replies
0 kudos

%run notebook fails in Job mode with Py4JJavaError (None.get), but works in interactive notebook

Hi everyone, I’m facing an issue when executing a Databricks job where my notebook uses %run to include other notebooks. I have a final notebook added as a task in a job, and inside that notebook I use %run to call another notebook that contains all ...

Latest Reply

mark_ott
Databricks Employee

yesterday

0 kudos

This issue with %run in Databricks notebooks, where everything works interactively in the UI but fails in a job context with java.util.NoSuchElementException: None.get, is a relatively common pain point for users leveraging notebook modularization. Th...

by Sergecom • New Contributor III

Tuesday

51 Views
1 reply
1 kudos

Migrating from on-premises HDFS to Unity Catalog - Looking for advice on on-prem options

Hi, we’re currently running a Databricks installation with an on-premises HDFS file system. As we’re looking to adopt Unity Catalog, we’ve realized that our current HDFS setup has limited support and compatibility with Unity Catalog. Our requirement: W...

Latest Reply

mark_ott
Databricks Employee

yesterday

1 kudos

Unity Catalog does not natively support HDFS, and its primary design assumes cloud object storage (such as S3, ADLS, or GCS) as the backing store for both managed and external tables. For organizations restricted to on-premises storage, the situation...

by Anoora • New Contributor II

yesterday

57 Views
2 replies
0 kudos

Scheduling and triggering jobs based on time and frequency precedence

I have a table in Databricks that stores job information, including fields such as job_name, job_id, frequency, scheduled_time, and last_run_time. I want to run a query every 10 minutes that checks this table and triggers a job if the scheduled_time i...

Tags: data engineering, jobs, scheduling

Latest Reply

SamAdams
Contributor

yesterday

0 kudos

You could add a job with a schedule-based trigger that runs every 10 minutes. The task at the start of the job runs a SQL query against the job information table and uses the logic you described above to output a boolean value. Then feed that boolea...
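The due check that the first task computes could look like the plain-Python sketch below. The original question is truncated, so the frequency values (`hourly`, `daily`, `weekly`) and the rule used here (a job is due once its most recent scheduled occurrence has passed and it has not run since) are assumptions, and `is_due` is a hypothetical helper:

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical mapping from the table's `frequency` values to a period length.
FREQUENCY_PERIODS = {
    "hourly": timedelta(hours=1),
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
}


def is_due(scheduled_time: datetime, frequency: str,
           last_run_time: Optional[datetime], now: datetime) -> bool:
    """Return True if the job should be triggered at `now`.

    Treats `scheduled_time` as the first occurrence, repeating every period.
    Raises KeyError for a frequency not in FREQUENCY_PERIODS.
    """
    if now < scheduled_time:
        return False  # first occurrence not reached yet
    period = FREQUENCY_PERIODS[frequency]
    # Whole periods elapsed since the first occurrence (timedelta // timedelta -> int).
    elapsed = (now - scheduled_time) // period
    latest_occurrence = scheduled_time + elapsed * period
    # Due if the job never ran, or last ran before the latest occurrence.
    return last_run_time is None or last_run_time < latest_occurrence


now = datetime(2025, 1, 3, 8, 5)
print(is_due(datetime(2025, 1, 1, 8, 0), "daily", datetime(2025, 1, 2, 8, 1), now))  # True
```

Because the check only runs every 10 minutes, a due job may start up to about 10 minutes after its scheduled time.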

by SamAdams • Contributor

yesterday

17 Views
0 replies
0 kudos

Time window for "All tables are updated" option in job Table Update Trigger

I've been using the Table Update Trigger for some SQL alert workflows. I have a job that uses 3 tables with an "All tables updated" trigger: Table 1 was updated at 07:20 UTC, Table 2 was updated at 16:48 UTC, Table 3 was updated at 16:50 UTC -> Job is trig...

Tags: jobs, TableUpdateTrigger

by aonurdemir • Contributor

Monday

55 Views
1 reply
1 kudos

Broken S3 file paths in File Notifications for Auto Loader

Suddenly, at "2025-10-23T14:12:48.409+00:00", file paths coming from the file notification queue started to be URL-encoded. As a result, our pipeline gets a file-not-found exception. I think something changed suddenly and broke the notification system. Here are th...

Latest Reply

K_Anudeep
Databricks Employee

yesterday

1 kudos

Hello @aonurdemir, could you please re-run your pipeline and check? This issue should now be mitigated. It was caused by a recent internal bug that led to unexpected handling of file paths with special characters. You should set ignoreMissingFile...
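To see what "URL-encoded" means for these paths, the encoded and decoded forms can be compared with Python's standard `urllib.parse.unquote`. A minimal diagnostic sketch; the bucket and key are made up, `looks_percent_encoded` and `decode_path` are hypothetical helpers, and this is not a fix, since the reply states the underlying bug has been mitigated:

```python
from urllib.parse import unquote


def looks_percent_encoded(path: str) -> bool:
    # Treat a path as encoded if decoding changes it (e.g. "%20" -> " ").
    return unquote(path) != path


def decode_path(path: str) -> str:
    # Decode percent-escapes such as %20 (space) or %3D ("="); a literal "+"
    # is left as-is. urllib.parse.unquote_plus would be needed instead if
    # "+" had been used to encode spaces.
    return unquote(path)


# Hypothetical key containing special characters.
encoded = "s3://my-bucket/landing/date%3D2025-10-23/file%20name.json"
print(looks_percent_encoded(encoded))  # True
print(decode_path(encoded))  # s3://my-bucket/landing/date=2025-10-23/file name.json
```

Decoding twice, or decoding a key that legitimately contains `%`, would corrupt it, so this should not be applied blindly to every path.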

by EricCournarie • New Contributor III

yesterday

35 Views
2 replies
1 kudos

Retrieving OBJECT values with the JDBC driver may lead to invalid JSON

Hello, using the JDBC driver, I try to retrieve values in the ResultSet for an OBJECT type. Sadly, it returns invalid JSON. Given the SQL: CREATE OR REPLACE TABLE main.eric.eric_complex_team (`id` INT, `nom` STRING, `infos` STRUCT<`age`: INT, `ville`: STRIN...

Latest Reply

EricCournarie
New Contributor III

yesterday

1 kudos

Hello, thanks for the quick response. Sadly, I have no control over the SQL query, so there is no way for me to modify it ...

by DylanStout • Contributor

03-27-2025 6:48:33 AM

3039 Views
1 reply
0 kudos

PySpark ML tools

Cluster policies not letting us use PySpark ML tools. Issue details: We have clusters available in our Databricks environment and our plan was to use functions and classes from "pyspark.ml" to process data and train our model in parallel across cores/n...

Latest Reply

Louis_Frolio
Databricks Employee

yesterday

0 kudos

Hey @DylanStout, thanks for laying out the symptoms clearly; this is a classic clash between Safe Spark (shared/high-concurrency) protections and multi-threaded/driver-mutating code paths. What’s happening: On clusters with the Shared/Safe Spark a...

by AvneeshSingh • New Contributor

02-05-2025 11:27:29 PM

3232 Views
1 reply
1 kudos

Auto Loader Data Reprocess

Hi, if possible, can anyone please help me with some Auto Loader options? I have 2 open queries: (i) Let's assume I am running an Auto Loader stream and my job fails. Instead of resetting the whole checkpoint, I want to run the stream from a specified timest...

Tags: autoloader

Latest Reply

AbhaySingh
Databricks Employee

yesterday

1 kudos

Have you already reviewed the following doc? Please let me know the specifics and we can go from there, but I'd start with this doc: https://docs.databricks.com/aws/en/ingestion/cloud-object-storage/auto-loader/options

by dhruvs2 • New Contributor

Monday

108 Views
3 replies
4 kudos

How to trigger a Databricks job only after multiple other jobs have completed

We have a use case where Job C should start only after both Job A and Job B have successfully completed. In Airflow, we achieve this using an ExternalTaskSensor to set dependencies across different DAGs. Is there a way to configure something similar in...

Latest Reply

BS_THE_ANALYST
Esteemed Contributor II

yesterday

4 kudos

Hi @dhruvs2. A Lakeflow Job consists of tasks, which can be things like notebooks or other jobs. If you want to orchestrate many jobs, I'd agree that having a job to do this is your best bet. Then you can set up the dependencies as you require. I...
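A minimal sketch of such an orchestrator job, expressed as a Jobs API 2.1 `jobs/create` payload built in Python. It assumes the `run_job_task` and `depends_on` task fields; the job name and job IDs are hypothetical. `run_a` and `run_b` have no dependency on each other, so they can run in parallel, while `run_c` waits for both:

```python
import json


def orchestrator_job_spec(job_a_id: int, job_b_id: int, job_c_id: int) -> dict:
    """Build a jobs/create payload: run jobs A and B, then job C after both."""
    return {
        "name": "orchestrate-a-b-then-c",  # hypothetical job name
        "tasks": [
            {"task_key": "run_a", "run_job_task": {"job_id": job_a_id}},
            {"task_key": "run_b", "run_job_task": {"job_id": job_b_id}},
            {
                "task_key": "run_c",
                # run_c starts only after both upstream tasks have finished.
                "depends_on": [{"task_key": "run_a"}, {"task_key": "run_b"}],
                "run_job_task": {"job_id": job_c_id},
            },
        ],
    }


# Hypothetical job IDs; the payload would be sent to POST /api/2.1/jobs/create.
print(json.dumps(orchestrator_job_spec(111, 222, 333), indent=2))
```

With the default run condition, `run_c` runs only if both upstream tasks succeed. One design consequence: Jobs A and B are now triggered by the orchestrator, so any schedules they had on their own would likely need to be removed to avoid double runs.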

by akeel-rehman • New Contributor

03-20-2025 7:05:10 AM

2889 Views
1 reply
0 kudos

Best Practices for Reusable Workflows & Cluster Management Across Repos.

Hi everyone, I am looking for best practices around reusable workflows in Databricks, particularly in these areas: Reusable Workflows Instead of Repetition: How can we define reusable workflows rather than repeating the same steps across multiple jobs?...

Latest Reply

AbhaySingh
Databricks Employee

yesterday

0 kudos

Here are my recommendations:
1. Databricks Asset Bundles (DABs) for reusable workflows
2. API-based triggering and Run Job Tasks for cross-repo workflows
3. Instance Pools as the #1 game-changer for cluster optimization (5-10 seconds vs 5-10 minu...

by jorperort • Contributor

Tuesday

111 Views
5 replies
4 kudos

Resolved! Spark JDBC Write Fails for Record Not Present - PK error

Good afternoon everyone, I’m writing this post to see if anyone has encountered this problem and if there is a way to resolve it or understand why it happens. I’m working in a Databricks Runtime 15.4 LTS environment, which includes Apache Spark 3.5.0 ...

Latest Reply

ManojkMohan
Honored Contributor

Tuesday

4 kudos

@jorperort, when writing to SQL Server tables with composite primary keys from Databricks using JDBC, unique constraint violations are often caused by Spark’s distributed retry logic (see https://docs.databricks.com/aws/en/archive/connectors/jdbc). Solution...
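Separately from retries, and as an assumption not taken from the thread, it may be worth ruling out that the batch itself contains several rows sharing the same composite key: the second such row would violate the primary key within a single write. A plain-Python sketch of that check; `duplicate_keys` and the column names are hypothetical:

```python
from collections import Counter
from typing import Dict, Iterable, Mapping, Sequence, Tuple


def duplicate_keys(rows: Iterable[Mapping], key_cols: Sequence[str]) -> Dict[Tuple, int]:
    """Return composite-key values that occur more than once, with their counts."""
    counts = Counter(tuple(row[c] for c in key_cols) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


# Hypothetical batch with a composite key (order_id, line_no).
batch = [
    {"order_id": 1, "line_no": 1, "qty": 5},
    {"order_id": 1, "line_no": 2, "qty": 3},
    {"order_id": 1, "line_no": 1, "qty": 7},  # same key as the first row
]
print(duplicate_keys(batch, ["order_id", "line_no"]))  # {(1, 1): 2}
```

On a Spark DataFrame, the equivalent check would be a `groupBy` on the key columns followed by a filter on `count > 1`.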

by Dimitry • Contributor III

Monday

78 Views
4 replies
0 kudos

Databricks notebook parameter works in interactive mode but not in the job

Hi guys, I've added a parameter "files_mask " to a notebook, with a default value. The job running this notebook broke with the error: com.databricks.dbutils_v1.InputWidgetNotDefined: No input widget named files_mask is defined. Code: mask = dbutils.widgets....

Latest Reply

szymon_dybczak
Esteemed Contributor III

Monday

0 kudos

Hi @Dimitry, do you use Python or Scala in your notebook?
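Assuming a Python notebook (the language question above is still open in the thread), a common pattern is to declare the widget with its default before reading it, so the read does not fail when the job does not pass the parameter. A minimal sketch; `get_notebook_param` and `FakeWidgets` are hypothetical, `FakeWidgets` only stands in for `dbutils.widgets` so the sketch runs outside Databricks, and `*.csv` is an example default:

```python
from typing import Dict, Optional


def get_notebook_param(widgets, name: str, default: str) -> str:
    """Declare a text widget with a default value, then read it.

    In a Databricks Python notebook, `widgets` would be `dbutils.widgets`.
    """
    widgets.text(name, default)  # label argument omitted
    return widgets.get(name)


class FakeWidgets:
    """Hypothetical local stand-in for dbutils.widgets.

    `passed` plays the role of parameters supplied by a job run; a passed
    value takes precedence over the widget's default.
    """

    def __init__(self, passed: Optional[Dict[str, str]] = None):
        self._passed = dict(passed or {})
        self._defaults: Dict[str, str] = {}

    def text(self, name: str, default: str, label: Optional[str] = None) -> None:
        self._defaults[name] = default

    def get(self, name: str) -> str:
        if name in self._passed:
            return self._passed[name]
        if name in self._defaults:
            return self._defaults[name]
        raise KeyError(f"No input widget named {name} is defined")


print(get_notebook_param(FakeWidgets(), "files_mask", "*.csv"))  # *.csv
print(get_notebook_param(FakeWidgets({"files_mask": "*.json"}), "files_mask", "*.csv"))  # *.json
```

The post quotes the parameter as "files_mask " with a trailing space; if the name is defined that way in the job or notebook, it would not match `files_mask` exactly, which may be worth ruling out too.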

Databricks Community
