Data Engineering

Forum Posts

nolanlavender00
by New Contributor
  • 723 Views
  • 2 replies
  • 0 kudos

Garbage Collection on AutoLoader

Once a week, I get very long run times with AutoLoader. The Spark job reports that it is done, but garbage collection time keeps rising on the driver. I assume this is because of the backfill interval that I am using with file notification mode. I have this set to...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @nolanlavender008, hope everything is going great. Just wanted to check in to see if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us...

  • 0 kudos
1 More Replies
nolanlavender00
by New Contributor
  • 2085 Views
  • 2 replies
  • 0 kudos

How to control garbage collection while using Autoloader File Notification?

I am using Autoloader to load files from a directory. I have set up File Notification with the Event Subscription. I have a backfill interval set to 1 day and have not run the stream for a week. There should only be about 100 new files to pick up an...
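The setup described in this post can be sketched as a set of Auto Loader options. This is only an illustration, not the poster's actual code: `cloudFiles.useNotifications` and `cloudFiles.backfillInterval` are Auto Loader options, but the file format and the `apply_options` helper are assumptions made for the example.

```python
# Auto Loader options matching the setup described in the post.
autoloader_options = {
    "cloudFiles.format": "json",            # assumption: the post does not state the file format
    "cloudFiles.useNotifications": "true",  # file notification mode instead of directory listing
    "cloudFiles.backfillInterval": "1 day", # backfill interval mentioned in the post
}


def apply_options(reader, options):
    """Hypothetical helper: apply each option to a reader that exposes .option(key, value)."""
    for key, value in options.items():
        reader = reader.option(key, value)
    return reader


# On a cluster, usage would look roughly like this (the path is a placeholder):
# stream = apply_options(spark.readStream.format("cloudFiles"), autoloader_options).load("<input-path>")
```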

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @nolanlavender008, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answ...

  • 0 kudos
1 More Replies
aschiff
by Contributor II
  • 160434 Views
  • 32 replies
  • 1 kudos

GC Driver Error

I am using a cluster in Databricks to connect to a Tableau workbook through the JDBC connector. My Tableau workbook has been unable to load because resources are not available through the data connection. I went to look at the driver log for my clus...

Latest Reply
aschiff
Contributor II
  • 1 kudos

I recreated the problematic workbook, connecting to the same cluster and using the same data, and its three sheets/charts all loaded properly. I then went to Databricks to look at the Spark UI and the SQL tab to find out th...

  • 1 kudos
31 More Replies
HariharaSam
by Contributor
  • 994 Views
  • 3 replies
  • 0 kudos

DRIVER Garbage Collection

Does anyone know how to fix this?

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Hariharan Sambath, hope all is well! Just wanted to check in to see if you were able to resolve your issue, and whether you would be happy to share the solution or mark an answer as best. Otherwise, please let us know if you need more help. We'd love to hear from you....

  • 0 kudos
2 More Replies
sanchit_popli
by New Contributor II
  • 655 Views
  • 0 replies
  • 0 kudos

How can I process 3.5 GB GZ (~90 GB) nested JSON files and convert them to tabular format with less processing time and optimized cost in Azure Databricks?

I have a total of 5000 files (nested JSON, ~3.5 GB). I have written code that converts the JSON to a table in minutes (for JSON sizes up to 1 GB), but when I try to process the 3.5 GB GZ JSON, it mostly fails because of garbage collection. ...

User16826994223
by Honored Contributor III
  • 2220 Views
  • 2 replies
  • 0 kudos

Resolved! Garbage Collection optimization

I have a case where garbage collection is taking a long time, and I want to optimize it for better performance.

Latest Reply
sean_owen
Honored Contributor II
  • 0 kudos

You can also tune the JVM's GC parameters directly, if you mean the pauses are too long. Set "spark.executor.extraJavaOptions", but it does require knowing a thing or two about how to tune for what performance goal.
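As a rough sketch of what the reply describes, the snippet below assembles a value for `spark.executor.extraJavaOptions` from a list of JVM GC flags. The `build_gc_conf` helper is hypothetical, and the G1 flags and their values are illustrative starting points only, not recommendations from the thread. JVM options of this kind generally have to be in place before the executors start, so on Databricks they would normally go in the cluster's Spark config rather than in notebook code.

```python
def build_gc_conf(flags, target="executor"):
    """Hypothetical helper: return (config key, value) for JVM options on the executor or driver."""
    if target not in ("executor", "driver"):
        raise ValueError("target must be 'executor' or 'driver'")
    key = f"spark.{target}.extraJavaOptions"
    # Spark expects all JVM flags in a single space-separated string.
    return key, " ".join(flags)


# Illustrative G1 flags; the numbers are assumptions and need tuning per workload.
G1_FLAGS = [
    "-XX:+UseG1GC",                           # use the G1 collector
    "-XX:MaxGCPauseMillis=200",               # pause-time goal in milliseconds
    "-XX:InitiatingHeapOccupancyPercent=35",  # start concurrent marking earlier
]

executor_key, executor_value = build_gc_conf(G1_FLAGS, "executor")

# One "key value" line, as entered in a cluster's Spark config box.
conf_line = f"{executor_key} {executor_value}"
print(conf_line)
```

Passing `target="driver"` produces the matching `spark.driver.extraJavaOptions` entry, which is the relevant one when the long pauses are on the driver.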

  • 0 kudos
1 More Replies