Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Garbage Collection on AutoLoader

nolanlavender00
New Contributor

Once a week, I get very long run times with AutoLoader. The Spark job reports that it is done, but garbage collection time keeps rising on the driver. I assume this is caused by the backfill interval I am using with the file notification mode, which I have set to run every week.

Is my suspicion right? If so, how could I handle this, and is the backfill interval necessary?

2 REPLIES

Kaniz_Fatma
Community Manager

Hi @nolanlavender008, you can shorten the backfill interval so that backfills run more frequently, allowing timely processing of incoming files. You could experiment with different intervals to balance processing efficiency and resource utilization.
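
As a rough illustration of changing the interval, here is a minimal sketch that builds the Auto Loader options with a shorter backfill. The helper names (`auto_loader_options`, `build_stream`), the `"1 day"` value, and the JSON format are assumptions for the example, not details from the original post; `cloudFiles.backfillInterval` and `cloudFiles.useNotifications` are the documented option names.

```python
from typing import Any, Dict


def auto_loader_options(backfill_interval: str = "1 day",
                        file_format: str = "json") -> Dict[str, str]:
    """Return Auto Loader options for file notification mode.

    The default interval and format are illustrative assumptions;
    pick values that match your own workload.
    """
    return {
        "cloudFiles.format": file_format,
        # Use file notification mode instead of directory listing.
        "cloudFiles.useNotifications": "true",
        # Interval string such as "1 day" or "1 week".
        "cloudFiles.backfillInterval": backfill_interval,
    }


def build_stream(spark: Any, source_path: str, options: Dict[str, str]):
    """Create the streaming DataFrame (hypothetical helper).

    `spark` is an existing SparkSession. A schema or a
    cloudFiles.schemaLocation option is typically also required.
    """
    return (
        spark.readStream
        .format("cloudFiles")
        .options(**options)
        .load(source_path)
    )
```

Moving from `"1 week"` to something like `"1 day"` should spread the backfill work over smaller, more frequent runs, though the right value depends on your file volume.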

Another approach could be to implement a mechanism that prevents the backlog of files from building up in the first place. For example, you could consider setting up a monitoring system that alerts you when the number of unprocessed files reaches a certain threshold and then trigger the AutoLoader job manually to process the files.
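
One possible shape for such a check, as a sketch only: it assumes the streaming query progress (for example `query.lastProgress`) exposes a `numFilesOutstanding` metric per source, as Databricks documents for Auto Loader. The function names and the threshold of 1000 are hypothetical.

```python
from typing import Any, Mapping, Optional


def outstanding_files(progress: Optional[Mapping[str, Any]]) -> int:
    """Sum numFilesOutstanding across all sources in a progress record.

    Assumes the metric lives under sources[i]["metrics"]; returns 0
    when no progress has been reported yet.
    """
    if not progress:
        return 0
    total = 0
    for source in progress.get("sources", []):
        metrics = source.get("metrics") or {}
        # Metric values may arrive as strings, so convert explicitly.
        total += int(metrics.get("numFilesOutstanding", 0))
    return total


def backlog_exceeds(progress: Optional[Mapping[str, Any]],
                    threshold: int = 1000) -> bool:
    """Return True when the backlog is larger than the alert threshold."""
    return outstanding_files(progress) > threshold
```

You could call `backlog_exceeds(query.lastProgress)` from a scheduled job and send an alert or start the AutoLoader job when it returns `True`.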

Anonymous
Not applicable

Hi @nolanlavender008

Hope everything is going great.

Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we can help you. 

Cheers!
