
DLT Autoloader stuck in reading Avro files from Azure blob storage

kurokaj
New Contributor

I have a DLT pipeline that joins data from streaming tables with metadata from Avro files located in Azure Blob Storage. The Avro files are loaded using Auto Loader. The pipeline worked fine until 25.3. (about 20:00 UTC), when it suddenly got stuck initializing the Avro stream. From the compute metrics it's clear that the read process is stuck in some sort of loop.

The issue is also reproducible in a basic notebook when running Auto Loader or a batch load on any number of Avro files. The runs do not throw any errors, which makes the issue quite puzzling to solve.

We have also noticed that running Auto Loader or a batch load in a notebook on runtime 13.3 LTS succeeds, but on 14.3 LTS or 14.1 the process gets stuck. Any advice on the situation is appreciated.

[Image: screenshot of the frozen process]
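For anyone trying to reproduce this outside the pipeline, the notebook comparison described above might look roughly like the sketch below. The function names and the `path` / `schema_location` arguments are hypothetical placeholders, not taken from the original pipeline; `spark` is assumed to be the session Databricks provides in a notebook. Running the same two reads on 13.3 LTS and on 14.x would show whether the hang follows the runtime.

```python
def avro_reader_options(schema_location):
    """Auto Loader (cloudFiles) options for reading Avro files.

    schema_location is a hypothetical path where Auto Loader stores
    the inferred schema.
    """
    return {
        "cloudFiles.format": "avro",
        "cloudFiles.schemaLocation": schema_location,
    }


def read_avro_batch(spark, path):
    # Plain batch read: no Auto Loader and no checkpoint state involved,
    # so a hang here points at the Avro reader or the storage access.
    return spark.read.format("avro").load(path)


def read_avro_stream(spark, path, schema_location):
    # Streaming read through Auto Loader, with the same file format.
    return (
        spark.readStream.format("cloudFiles")
        .options(**avro_reader_options(schema_location))
        .load(path)
    )
```

In a notebook, `read_avro_batch(spark, path).count()` would force the batch read to actually run; the streaming variant only starts reading once a `writeStream` sink is attached and started.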

 

1 REPLY

Kaniz_Fatma
Community Manager

Hi @kurokaj

  • If the schema of the input data changes while an update is running, the update may be logged as CANC...1. Ensure that there haven’t been any unexpected schema changes in your Avro files during the problematic period.
  • Consider monitoring schema changes and handling them appropriately within your pipeline.
  • Verify that your DLT pipeline components (including the autoloader) are compatible with the specific runtime versions you’re using. Check for any known issues or updates related to these versions.
  • Since the pipeline isn’t throwing errors, it’s essential to gather additional diagnostic information.
  • Review the compute metrics to understand where the read process is getting stuck. Look for any unusual patterns or bottlenecks.
  • Consider enabling detailed logging or debugging options to capture more information during the initialization process.
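As one way to gather diagnostics when a stream hangs without raising an error, a notebook reproduction could poll the streaming query handle and report its status if no progress event arrives in time. This is only a sketch: `wait_for_first_progress` and its timeout values are hypothetical, and it assumes `query` is a PySpark `StreamingQuery` returned by `writeStream.start()` in a notebook. It does not apply directly inside a DLT pipeline, where no query handle is exposed, and it cannot interrupt a blocking batch read.

```python
import time


def wait_for_first_progress(query, timeout_s=300.0, poll_s=5.0):
    """Return True once the query reports a progress event.

    Returns False if the query stops, or stays silent for timeout_s
    seconds, before reporting any progress.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        # lastProgress stays None until the first progress update is reported.
        if query.lastProgress is not None:
            return True
        if not query.isActive:
            # The query ended without ever reporting progress.
            return False
        time.sleep(poll_s)
    # Still active but silent: print the status message for diagnosis.
    print("No progress within timeout; status:", query.status)
    return False
```

If this returns False while the query is still active, the printed status message together with the driver logs would be a reasonable starting point for a support ticket.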
