Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

DLT Autoloader stuck in reading Avro files from Azure blob storage

kurokaj
New Contributor

I have a DLT pipeline that joins data from streaming tables with metadata from Avro files stored in Azure Blob Storage. The Avro files are loaded with Auto Loader. Until 25.3. (about 20:00 UTC) the pipeline worked fine, but then it suddenly got stuck while initializing the Avro stream. From the compute metrics it is clear that the read process is stuck in some sort of loop.

The issue is also reproducible in a basic notebook when running Auto Loader or a batch load on any number of Avro files. The runs do not throw any errors, which makes the issue hard to diagnose.

We have also noticed that when running Auto Loader or a batch load in a notebook on runtime 13.3 LTS, the process succeeds. However, on 14.3 LTS or 14.1 the process gets stuck. Any advice on the situation is appreciated.
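For anyone trying to reproduce this, a minimal sketch of the two read paths described above might look like the following. It assumes a Databricks notebook where `spark` is already defined; the helper names and the paths in the usage note are hypothetical placeholders, not the actual pipeline code.

```python
def autoloader_options(schema_location):
    """Build the Auto Loader options for reading Avro files.

    schema_location is a hypothetical path where Auto Loader can
    store the inferred schema.
    """
    return {
        "cloudFiles.format": "avro",
        "cloudFiles.schemaLocation": schema_location,
    }


def read_avro_stream(spark, source_path, schema_location):
    """Streaming read of Avro files through Auto Loader (cloudFiles)."""
    return (
        spark.readStream.format("cloudFiles")
        .options(**autoloader_options(schema_location))
        .load(source_path)
    )


def read_avro_batch(spark, source_path):
    """Plain batch read of the same Avro files, without Auto Loader."""
    return spark.read.format("avro").load(source_path)
```

Calling `read_avro_batch(spark, "<path to the Avro files>")` followed by an action such as `.count()` on a 13.3 LTS cluster and again on a 14.3 LTS cluster should show whether the hang depends on the runtime rather than on the DLT pipeline itself.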

(image.png: screenshot of the frozen process.)

 


cgrant
Databricks Employee

Based on your screenshot, a Spark job has started and 33 of 34 tasks have completed. This usually indicates some kind of skewed processing. Please refer to this documentation for help identifying and resolving skew.
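One quick way to check for that kind of skew is to compare the input file sizes and see whether a single file is far larger than the rest. The sketch below is only an illustration: `find_stragglers`, the factor of 5, and the example file names and sizes are all made up for the example. On Databricks, the sizes could probably be collected from a directory listing such as `dbutils.fs.ls`.

```python
from statistics import median


def find_stragglers(sizes_by_name, factor=5.0):
    """Return (name, size) pairs whose size exceeds factor x the median size.

    The result is sorted from largest to smallest. The factor is an
    arbitrary illustrative threshold, not a Databricks recommendation.
    """
    if not sizes_by_name:
        return []
    threshold = factor * median(sizes_by_name.values())
    oversized = [
        (name, size) for name, size in sizes_by_name.items() if size > threshold
    ]
    return sorted(oversized, key=lambda item: item[1], reverse=True)


# Hypothetical input: 33 files of about 1 MB and one file of 250 MB,
# mirroring a job where 33 of 34 tasks finish quickly.
example_sizes = {f"part-{i:04d}.avro": 1_000_000 for i in range(33)}
example_sizes["part-0033.avro"] = 250_000_000

for name, size in find_stragglers(example_sizes):
    print(f"{name}: {size} bytes")
```

If one file dominates like this, the last task is likely still working through it. If all files are of similar size, skew in the input is a less likely explanation and the runtime difference deserves a closer look.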
