Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Malthe
by Contributor III
  • 1536 Views
  • 1 reply
  • 2 kudos

Driver terminated abnormally due to FORCE_KILL

We have a job running on a job cluster where sometimes the driver dies: "The spark driver has stopped unexpectedly and is restarting. Your notebook will be automatically reattached." But the metrics don't suggest an explanation for this situation. In th...

Latest Reply
cgrant
Databricks Employee
  • 2 kudos

That error is usually related to driver load. Try upsizing the driver one size and see if it still happens. Otherwise, for troubleshooting, driver problems are surfaced to the cluster's event log, like DRIVER_NOT_RESPONDING and DRIVER_UNAVAILABLE. Yo...
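For anyone following along, those event types can also be pulled programmatically from the cluster event log. Here is a rough sketch using the Clusters API events endpoint; the workspace host, token, and cluster ID are placeholders you would supply yourself:

```python
import json
import urllib.request

def get_driver_events(host, token, cluster_id):
    """Fetch driver-health events from a cluster's event log.
    `host` is the workspace URL, e.g. "https://<workspace>.cloud.databricks.com".
    """
    payload = json.dumps({
        "cluster_id": cluster_id,
        # Filter to the driver events mentioned above.
        "event_types": ["DRIVER_NOT_RESPONDING", "DRIVER_UNAVAILABLE"],
        "limit": 50,
    }).encode()
    req = urllib.request.Request(
        f"{host}/api/2.0/clusters/events",
        data=payload,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read()).get("events", [])
```

This is a sketch, not a drop-in tool: check the Clusters API reference for the exact event-type names and pagination behavior in your workspace.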

nopal1
by New Contributor II
  • 728 Views
  • 2 replies
  • 2 kudos

Resolved! Python os.listdir() behavior difference between 15.4LTS and 16.4LTS DBRs

We found that when using os.listdir() in Databricks notebooks to list files stored in the Workspace (i.e., alongside the notebook, not in DBFS), file extensions were missing in Databricks Runtime 14.3 LTS and 15.4 LTS, but appeared correctly in 16.4 ...

Latest Reply
cgrant
Databricks Employee
  • 2 kudos

This is expected: the behavior changed in DBR 16.2. In Databricks Runtime 16.2 and above, notebooks are supported as workspace files.
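If you need code that works on both sides of that change, one option is to match directory entries with or without an extension. A minimal local sketch (the helper name is made up; a plain filesystem always keeps extensions, so this only matters against the Workspace path on older runtimes):

```python
import os
import tempfile

def find_entry(dirpath, stem):
    """Return the first entry in dirpath whose name matches `stem`,
    with or without its extension - tolerant of the pre-16.2 behavior
    where notebook names could come back extensionless."""
    for name in sorted(os.listdir(dirpath)):
        if name == stem or os.path.splitext(name)[0] == stem:
            return name
    return None

# Local illustration on an ordinary filesystem:
with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, "etl_notebook.py"), "w").close()
    print(find_entry(d, "etl_notebook"))  # etl_notebook.py
```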

1 More Replies
r_g_s_cn
by New Contributor II
  • 932 Views
  • 2 replies
  • 0 kudos

Databricks Workflow Automatically Marked as Failed When Autoloader Stream Fails in a Task

Issue: I want my Databricks Task/Workflow, which is running a pytest test, to not be automatically marked as "Failed" when an Autoloader stream shuts down due to an issue. It seems that if an Autoloader / Structured Streaming stream fails, it will au...

Latest Reply
SP_6721
Honored Contributor II
  • 0 kudos

Hi @r_g_s_cn, when a streaming query (like Auto Loader) fails in Databricks, especially due to a schema mismatch, the job or task is automatically marked as FAILED, even if you catch the exception in your code. That’s because the failure is detected ...

1 More Replies
ChristianRRL
by Honored Contributor
  • 694 Views
  • 2 replies
  • 1 kudos

Resolved! Autoloader Functionality Question: Pull API data directly?

Hi there, when referencing Common data loading patterns > Enable flexible semi-structured data pipelines, I noticed this interesting code snippet:

spark.readStream.format("cloudFiles") \
    .option("cloudFiles.format", "json") \
    # will ensure that t...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @ChristianRRL, unfortunately they chose a quite confusing name. Auto Loader only supports one type of source: cloudFiles. And cloudFiles is nothing but your cloud object storage. So in this example they have a datalake directory /api/request where t...
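To make the pattern from this thread concrete, here is a sketch: some upstream process dumps raw API responses as JSON files into a cloud-storage directory, and Auto Loader streams them in from there. The paths and function name are hypothetical:

```python
def read_api_dumps(spark, source_dir, schema_dir):
    """Stream JSON files (e.g. API response dumps) from an object-storage
    directory using Auto Loader. Auto Loader does not call the API itself;
    something else must land the files in `source_dir` first."""
    return (
        spark.readStream
        .format("cloudFiles")                        # Auto Loader source
        .option("cloudFiles.format", "json")
        .option("cloudFiles.schemaLocation", schema_dir)
        .load(source_dir)
    )
```

Usage would look like `df = read_api_dumps(spark, "/api/request", "/tmp/_schema")` followed by a `writeStream` sink of your choice.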

1 More Replies
ChristianRRL
by Honored Contributor
  • 835 Views
  • 4 replies
  • 1 kudos

Autoloader Console Output Issue

In reference to prior post: Re: Autoloader Error Loading and Displaying - Databricks Community - 122579. I am attempting to output results to the console (notebook cell), but am not seeing anything (other than the dataframe schema). Is this expected? I am...

[attachment: ChristianRRL_2-1754599656677.png]
Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @ChristianRRL, did you run this code before? Maybe all your source files have already been recorded in the checkpoint. Try uploading a new JSON file and running it again. Also, check the driver logs; sometimes you can find error messages there.
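For debugging this kind of "no output" situation, a console-sink sketch like the following can help (names and paths are hypothetical): with `availableNow`, the query drains whatever is pending and stops, so if every source file is already tracked in the checkpoint, zero rows print.

```python
def debug_to_console(stream_df, checkpoint_dir):
    """Write a streaming DataFrame to the console sink for debugging.
    If no new files exist beyond the checkpoint, nothing is printed."""
    return (
        stream_df.writeStream
        .format("console")
        .option("checkpointLocation", checkpoint_dir)
        .trigger(availableNow=True)
        .start()
    )
```

Pointing `checkpoint_dir` at a fresh path is a quick way to force reprocessing of all files while testing.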

3 More Replies
yit
by Contributor III
  • 679 Views
  • 1 reply
  • 1 kudos

Resolved! Considering Autoloader for Bronze to Silver transformations

I’m currently implementing Auto Loader to ingest data from the source into the Bronze layer—essentially mapping the raw data into Delta tables. Now, I’ve also been considering using Auto Loader for Bronze-to-Silver transformations.Are there any pros ...

Latest Reply
cgrant
Databricks Employee
  • 1 kudos

Auto Loader is for loading raw files, not loading Delta Lake or Apache Iceberg tables, see more here. Instead, stream from a Delta Lake table.
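A sketch of the suggested bronze-to-silver pattern, streaming from the Delta table rather than using Auto Loader (table names, checkpoint path, and the `transform` hook are all placeholders):

```python
def bronze_to_silver(spark, bronze_table, silver_table, checkpoint_dir, transform):
    """Incrementally move rows from a bronze Delta table to a silver one.
    `transform` is a DataFrame -> DataFrame function holding your
    cleansing/shaping logic."""
    bronze = spark.readStream.table(bronze_table)   # Delta table as source
    return (
        transform(bronze)
        .writeStream
        .option("checkpointLocation", checkpoint_dir)
        .trigger(availableNow=True)                  # batch-style incremental run
        .toTable(silver_table)
    )
```

The checkpoint gives you the same exactly-once incremental behavior you would expect from Auto Loader, just with the Delta transaction log as the source of truth.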

joggiri
by New Contributor II
  • 972 Views
  • 1 reply
  • 1 kudos

PySpark Lazy Evaluation

PySpark Lazy Evaluation - Why does my logging function seem to execute without an explicit action in Databricks? Hello everyone, I was scrolling and found a Medium post on PySpark (https://medium.com/@sudeepwrites/pyspark-secrets-no-one-talks-abou...

Latest Reply
cgrant
Databricks Employee
  • 1 kudos

I don't have full access to that article, but here's something that might help clarify things! While Spark uses lazy evaluation (meaning it waits to execute until absolutely necessary), Python works with eager evaluation. This means that when you ru...
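A pure-Python analogy (not Spark itself) of the eager-vs-lazy distinction described above: a list comprehension evaluates immediately, like plain Python code around your Spark calls, while a generator defers work until consumed, like a Spark transformation waiting for an action.

```python
calls = []

def log_row(x):
    """Side-effecting helper so we can observe when evaluation happens."""
    calls.append(x)
    return x

# Eager: building the list runs log_row immediately.
eager = [log_row(i) for i in range(3)]
assert calls == [0, 1, 2]

# Lazy: creating the generator runs nothing yet...
lazy = (log_row(i) for i in range(3, 6))
assert calls == [0, 1, 2]

# ...until something consumes it (the "action").
list(lazy)
assert calls == [0, 1, 2, 3, 4, 5]
```

So a logging call placed in ordinary Python code fires as soon as the line executes, even though the Spark transformations around it have not run yet.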

EAnthemNHC1
by New Contributor III
  • 1173 Views
  • 2 replies
  • 0 kudos

Resolved! Streaming Failure on Full Refresh Tables while using Serverless

On the afternoon of 2025-07-30, my team began to experience issues with pipeline tasks that were set to full refresh and full refresh only. These pipelines were defined to use serverless, and the only way we were able to get them back online was t...

Latest Reply
EAnthemNHC1
New Contributor III
  • 0 kudos

Thanks for the reply - after consulting with our Databricks rep we determined it was a bug released by Databricks with a recent update to serverless. The Databricks team has resolved the issue and we have switched back to serverless. 

1 More Replies
yit
by Contributor III
  • 653 Views
  • 2 replies
  • 4 kudos

Resolved! Autoloader fails when creating external Delta table in same notebook

Hi everyone, I’ve set up Databricks Autoloader to ingest data from ADLS into a Delta table. The table is defined as an external Delta table, with its location pointing to a path in ADLS. Here’s the flow I’m using: on the first run for a given data sourc...

Latest Reply
yit
Contributor III
  • 4 kudos

Thank you for your response! I've tried something similar, adding time.sleep(10) between table creation and autoloader initialization, but it did not work. What worked was separating the table creation and the autoloader initialization into different ce...

1 More Replies
Kutbuddin
by New Contributor III
  • 1048 Views
  • 1 reply
  • 0 kudos

DBT jobs fail due to SQL warehouse stopping mid-execution

+ dbt seed
15:49:17 Running with dbt=1.10.6
15:49:19 Registered adapter: databricks=1.10.8
15:49:20 Unable to do partial parsing because saved manifest not found. Starting full parse.
15:49:25 Found 27 models, 73 data tests, 1 seed, 7 sources, 68...

Latest Reply
SP_6721
Honored Contributor II
  • 0 kudos

Hi @Kutbuddin, looks like Databricks SQL Warehouse downtime. Are you still seeing failures on reruns, or was it just on that day?

ManojkMohan
by Honored Contributor II
  • 515 Views
  • 2 replies
  • 1 kudos

Resolved! ML-specific computes in Databricks Free Edition

Given that Databricks Free Edition has serverless compute only, is there any workaround to choose ML-specific computes like below? Is paying for it the only option?

[attachment: ManojkMohan_0-1754653497247.png]
Latest Reply
FedeRaimondi
Contributor II
  • 1 kudos

Hi @ManojkMohan, as part of Databricks Free Edition you have access to serverless compute resources only. Databricks Runtime for Machine Learning and Apache Spark MLlib are not supported. Resources: Databricks Free Edition limitations | Databricks Docu...

1 More Replies
dawn-dot-py
by New Contributor II
  • 481 Views
  • 1 reply
  • 1 kudos

Resolved! Testing Databricks Auto Loader File Notification (File Event) in Public Preview - Spark Termination

I tried to test the Databricks Auto Loader file notification (file event) feature, which is currently in public preview, using a notebook for work purposes. However, when I ran display(df), Spark terminated and threw the error shown in the attached i...

[attachment: dawndotpy_0-1754542620496.png]
Latest Reply
Advika
Community Manager
  • 1 kudos

Hello @dawn-dot-py! Auto Loader’s managed file events are indeed in Public Preview, but they’re available only to allowlisted workspaces. The error you encountered means your workspace hasn’t been enrolled in the preview, which is expected unless you’ve b...

Ovasheli
by New Contributor
  • 1492 Views
  • 1 reply
  • 1 kudos

DLT Incremental Load and Metadata Capture

Hello, I'm building a Delta Live Tables (DLT) pipeline to load data from a cloud source into an on-premises warehouse. My source tables have Change Data Feed (CDF) enabled, and my pipeline code is complex, involving joins of multiple Slowly Changing Di...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @Ovasheli, the thing is, with Declarative Pipelines (formerly DLT) you can't always force an incremental load. For example, if you're using materialized views in your pipeline, there is an optimizer called Enzyme that can selectively incrementally load m...

Nick_Pacey
by New Contributor III
  • 4721 Views
  • 5 replies
  • 1 kudos

Resolved! Connecting to a On Prem SQL Server Instance using JDBC

Hi, we are trying to connect to an on-prem SQL Server instance using JDBC (we really want to use a Federated connection, testing JDBC first). We have successfully done this for one of the SQL Servers we have, but cannot get it to work for the other....

Data Engineering
Lakehouse
sql
Latest Reply
ajinaniyan
New Contributor II
  • 1 kudos

The main difference was that the failing one was a named instance (hostname\instancename) instead of just hostname. After trying different connection string variations and confirming traffic hit the server through the firewall, we found the root caus...
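For others hitting the named-instance case, a small URL-building sketch may help. With the Microsoft JDBC driver you can either pass `instanceName` (which relies on the SQL Browser service resolving the instance over UDP 1434) or skip name resolution entirely by supplying the instance's TCP port. The helper below is illustrative, not a Databricks API:

```python
def sqlserver_jdbc_url(host, database, instance=None, port=None):
    """Build a SQL Server JDBC URL. Prefer an explicit port when the
    firewall blocks SQL Browser traffic; instanceName is ignored if a
    port is given, since the port already identifies the instance."""
    url = f"jdbc:sqlserver://{host}"
    if port is not None:
        url += f":{port}"
    url += f";databaseName={database}"
    if instance is not None and port is None:
        url += f";instanceName={instance}"
    return url

print(sqlserver_jdbc_url("dbhost", "Sales", instance="INST01"))
# jdbc:sqlserver://dbhost;databaseName=Sales;instanceName=INST01
```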

4 More Replies
Pratikmsbsvm
by Contributor
  • 940 Views
  • 1 reply
  • 1 kudos

Resolved! Create a Data Pipeline Between Two Databricks Instances, One Using Unity Catalog, the Other Not

I have two Databricks instances: Databricks A and Databricks B. The application Hightouch is consuming data from Databricks B through Unity Catalog. I have to create a data pipeline to push data from Databricks A to B without using Delta Sharing. Diagram: ...

[attachment: Pratikmsbsvm_1-1754562995401.png]
Latest Reply
Stefan-Koch
Valued Contributor II
  • 1 kudos

Hi @Pratikmsbsvm, you could use Lakehouse Federation: attach one instance to the other as a connection. https://docs.databricks.com/aws/en/query-federation/databricks
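A rough sketch of registering the other workspace as a federation connection; the connection name, host, HTTP path, and token are placeholders, and the exact option names should be checked against the query-federation docs linked above:

```python
def create_databricks_connection(spark, name, host, http_path, token):
    """Register another Databricks workspace as a Lakehouse Federation
    connection so its catalogs can be queried from this workspace."""
    spark.sql(f"""
        CREATE CONNECTION IF NOT EXISTS {name} TYPE databricks
        OPTIONS (
          host '{host}',
          httpPath '{http_path}',
          personalAccessToken '{token}'
        )
    """)
```

After creating the connection you would typically create a foreign catalog on top of it and read the remote tables directly.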
