
Job Data Pipeline Runtime Increased Significantly

iptkrisna
New Contributor III

Hi,

I am facing an issue where one of my jobs has been taking much longer since a certain date. Previously it needed less than 1 hour to run a batch job that loads JSON data and does a truncate-and-load into a Delta table, but since June 2nd it takes more than 2 hours (sometimes even 3) to finish.

I'm curious how this can happen, because I have not changed anything in the code and the data is only increasing by around 5% per day. One thing I suspect is that the number of columns (90+) in that data may not fit the columnar approach of a Delta table. Correct me if I'm wrong.
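As a rough sanity check on the growth figure above, here is a minimal Python sketch of what "around 5% per day" implies if it compounds daily. Both the compounding and the idea that runtime scales roughly with data volume are assumptions, not something measured on this job, and the helper names `growth_factor` and `doubling_time_days` are made up for illustration.

```python
import math


def growth_factor(daily_rate: float, days: int) -> float:
    """Multiplicative size increase after `days` of compounding daily growth."""
    return (1.0 + daily_rate) ** days


def doubling_time_days(daily_rate: float) -> float:
    """Days needed for the volume to double at a constant daily growth rate."""
    return math.log(2.0) / math.log(1.0 + daily_rate)


if __name__ == "__main__":
    rate = 0.05  # "around 5% per day", taken from the post; assumed to compound
    print(f"doubling time: {doubling_time_days(rate):.1f} days")
    for days in (7, 14, 30):
        print(f"after {days:2d} days: x{growth_factor(rate, days):.2f}")
```

Under those assumptions the volume doubles in about two weeks, so 5% per day is not a small increase over a month. It would still produce a gradual slowdown rather than a sudden jump on one specific date, so it may not be the whole explanation.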

I've attached the images: the first 2 show the typical runtime before June 2nd, and the last 2 show the typical runtime since June 2nd.

Please let me know if you have any ideas.

Thank you!

1 REPLY

Anonymous
Not applicable

Hi @krisna math​ 

Great to meet you, and thanks for your question!

Let's see if your peers in the community have an answer to your question. Thanks.
