DLT Pipeline Error Handling

dashawn
New Contributor

Hello all.

We are a new team implementing DLT and have set up a number of tables in a pipeline loading from S3 with UC as the target. I'm noticing that if any of the 20 or so tables fail to load, the entire pipeline fails, even when there are no dependencies between the tables. In our case, a new table was added to the DLT notebook but the source S3 directory is empty. This caused the pipeline to fail with the error "org.apache.spark.sql.catalyst.ExtendedAnalysisException: Unable to process statement for Table 'table_name'."
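
To illustrate, the new table is defined the same way as the others, roughly along these lines (simplified, with placeholder names and paths rather than our actual code):

```python
import dlt

# Simplified table definition with placeholder names/paths.
@dlt.table(name="table_name")
def table_name():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")       # format is just an example; ours varies per table
        .load("s3://our-bucket/raw/table_name/")   # this S3 directory is currently empty
    )
```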

Is there a way to change this behavior in the pipeline configuration so that one table failing doesn't impact the rest of the pipeline?

2 REPLIES

Kaniz
Community Manager

Hi @dashawn

  • When data processing fails, manual investigation of logs to understand the failures, data cleanup, and determining the restart point can be time-consuming and costly. DLT provides features to handle errors more intelligently.
  • By default, if any table fails to load, the entire pipeline fails. However, you can customize this behavior to allow other tables to continue processing even if one table encounters an error; see the sketch below for one way to approach it.
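
One possible approach (a minimal sketch, not a built-in DLT setting) is to only define a table when its source directory actually contains files, so an empty S3 prefix simply skips that table for the update instead of failing the whole pipeline. The bucket paths, table names, and helper functions below are placeholders, and this assumes dbutils.fs.ls is usable in your pipeline environment; any other listing mechanism (for example the Hadoop FileSystem API or boto3) would work the same way:

```python
import dlt

def source_has_files(path):
    # Returns True only if the path exists and contains at least one object.
    # Assumes dbutils.fs.ls is available; swap in another listing method if needed.
    try:
        return len(dbutils.fs.ls(path)) > 0
    except Exception:
        return False

def define_table(table_name, source_path):
    # Only register the DLT table if its source directory has data;
    # otherwise the table is skipped for this update instead of failing the pipeline.
    if not source_has_files(source_path):
        print(f"Skipping {table_name}: no files found at {source_path}")
        return

    @dlt.table(name=table_name)
    def _load():
        return (
            spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            .load(source_path)
        )

# Placeholder table names and bucket paths for illustration.
for name in ["table_a", "table_b", "new_table"]:
    define_table(name, f"s3://our-bucket/raw/{name}/")
```

Each call to define_table registers its own table with DLT, so tables whose sources have data still load normally; only the tables with empty source prefixes are skipped until files arrive.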

jose_gonzalez
Moderator

Thank you for sharing this, @Kaniz. @dashawn, were you able to check Kaniz's docs? Do you still need help, or can you accept Kaniz's solution?
