Issue while restarting streaming DLT pipeline - Pub/Sub

Ajay-Pandey
Esteemed Contributor III

I am not able to restart our streaming DLT pipeline; it fails with a "__tmp_path_dir" number format error. I am using Pub/Sub as the source for the stream.

Any solutions would help.

(screenshot attached: fff.jpg)

#pubsub #databricks

 


Amine
New Contributor III

Hello,

It seems there is an exception within the pipeline code.

Can you provide the relevant portion of the pipeline's source code?

Ajay-Pandey
Esteemed Contributor III

Hi @Amine ,

I don't think there is an issue with the code, but I am attaching a snapshot of it below:

(screenshot attached: hjhj.jpg)

 

Kaniz
Community Manager

Hi @Ajay-Pandey, the "__tmp_path_dir" number format error you're encountering might be due to a stale disk cache or missing underlying files. You can try to resolve this issue by performing the following steps:

1. Invalidate Disk Cache: You can invalidate the disk cache manually by restarting the cluster. This will refresh the stale disk cache and might solve your issue.

2. Check for Missing Files: The error might be due to missing files in the streaming source checkpoint directory. This directory contains important default options for the stream; if files are missing, the stream might not restart. Verify if all necessary files are present.

3. Check for Schema Changes: If the schema of your source data has changed, you might need to enable schema inference with Auto Loader. This will automatically restart your stream when the schema of your source data changes. 
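Step 3 above can be sketched as follows. This is a minimal, hypothetical example of enabling Auto Loader schema inference and evolution; the helper function, paths, and table name are placeholders (not from this thread), and it applies only if the source were files rather than Pub/Sub:

```python
# Hypothetical sketch: options that turn on Auto Loader schema inference
# and evolution. Paths and names below are placeholders, not from the thread.
def autoloader_options(schema_location: str) -> dict:
    """Build cloudFiles options enabling schema inference and evolution."""
    return {
        "cloudFiles.format": "json",
        # Auto Loader persists the inferred schema here and evolves it
        # when new columns appear in the source data.
        "cloudFiles.schemaLocation": schema_location,
        "cloudFiles.schemaEvolutionMode": "addNewColumns",
    }

# In a DLT notebook this would be used roughly like (not runnable here):
# import dlt
# @dlt.table
# def raw_events():
#     return (spark.readStream.format("cloudFiles")
#             .options(**autoloader_options("/tmp/schemas/raw_events"))
#             .load("/tmp/landing/raw_events"))
```

With `addNewColumns`, the stream stops when new columns are detected and picks up the evolved schema on restart.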

Ajay-Pandey
Esteemed Contributor III

Hi @Kaniz ,

I am using DLT to fetch data from Pub/Sub streaming. The issue is not related to the checkpoint, since DLT maintains all checkpoints automatically, nor to a stale disk cache, because restarting the DLT pipeline creates a new cluster every time. I have also checked and found no schema changes in my data.

I am running the same code on a normal cluster with Pub/Sub streaming and it works fine there; the issue occurs only when I run it through DLT.
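For context, the two setups being compared could be sketched as below. This is a hypothetical minimal example assuming the Databricks Pub/Sub structured streaming connector (`format("pubsub")`); the project, subscription, and topic IDs and the helper function are placeholders, not the actual pipeline code:

```python
# Hypothetical sketch: the same Pub/Sub read, once as a plain structured
# stream and once wrapped in a DLT table. All IDs are placeholders.
def pubsub_options(project_id: str, subscription_id: str, topic_id: str) -> dict:
    """Build the core options for the Databricks Pub/Sub connector."""
    return {
        "projectId": project_id,
        "subscriptionId": subscription_id,
        "topicId": topic_id,
    }

# On a normal cluster (reported working), roughly:
# df = (spark.readStream.format("pubsub")
#       .options(**pubsub_options("my-project", "my-sub", "my-topic"))
#       .load())

# In a DLT pipeline (reported failing with the "__tmp_path_dir" error):
# import dlt
# @dlt.table
# def pubsub_raw():
#     return (spark.readStream.format("pubsub")
#             .options(**pubsub_options("my-project", "my-sub", "my-topic"))
#             .load())
```

Since the reader options are identical in both cases, the difference would lie in how the DLT runtime manages the stream, not in the read itself.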

 

Ajay-Pandey
Esteemed Contributor III

(screenshots attached: temp_s1.jpg, temp_s2.jpg)

@Kaniz @Amine 
