Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

aschiff
by Contributor II
  • 378149 Views
  • 33 replies
  • 3 kudos

GC Driver Error

I am using a Databricks cluster to connect to a Tableau workbook through the JDBC connector. The workbook has been unable to load because resources are not available through the data connection. I went to look at the driver log for my clus...

Latest Reply
galang123
New Contributor II
  • 3 kudos

yesasd

32 More Replies
nolanlavender00
by New Contributor
  • 1216 Views
  • 1 replies
  • 0 kudos

Garbage Collection on AutoLoader

Once a week, I get very long run times with AutoLoader. The spark job says it is done, but garbage collection keeps rising on the driver. I assume this is because of the backfill interval that I am using with FileNotification Type. I have this set to...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @nolanlavender008, hope everything is going great. Just wanted to check in to see if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us...

nolanlavender00
by New Contributor
  • 4102 Views
  • 2 replies
  • 0 kudos

How to control garbage collection while using Autoloader File Notification?

I am using Autoloader to load files from a directory. I have set up File Notification with the Event Subscription. I have a backfill interval set to 1 day and have not run the stream for a week. There should only be about ~100 new files to pick up an...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @nolanlavender008, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answ...

1 More Replies
HariharaSam
by Contributor
  • 1848 Views
  • 3 replies
  • 0 kudos

DRIVER Garbage Collection

Does anyone know how to fix this?

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Hariharan Sambath, hope all is well! Just wanted to check in: were you able to resolve your issue, and if so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you....

2 More Replies
sanchit_popli
by New Contributor II
  • 1187 Views
  • 0 replies
  • 0 kudos

How can I process 3.5 GB GZ (~90 GB) nested JSON files and convert them to tabular format with less processing time and optimized cost in Azure Databricks?

I have a total of 5000 files (nested JSON, ~3.5 GB). I have written code that converts the JSON to a table in minutes (for JSON sizes up to 1 GB), but when I try to process the 3.5 GB GZ JSON, it mostly fails because of garbage collection. ...

Data frame structure Code Reading Code
User16826994223
by Honored Contributor III
  • 3496 Views
  • 2 replies
  • 0 kudos

Resolved! Garbage Collection optimization

I have a case where garbage collection is taking a long time, and I want to optimize it for better performance.

Latest Reply
sean_owen
Databricks Employee
  • 0 kudos

If you mean the GC pauses are too long, you can also tune the JVM's GC parameters directly by setting "spark.executor.extraJavaOptions". It does require knowing a thing or two about how to tune for a given performance goal.
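
To make the "spark.executor.extraJavaOptions" suggestion concrete, here is a minimal sketch that assembles a set of G1 garbage collector flags into that setting. The helper name `build_gc_conf` and the specific values (a 200 ms pause target, 35% heap occupancy trigger) are assumptions chosen for illustration, not recommendations from the reply; the right values depend on your workload and should come from reading the GC logs.

```python
def build_gc_conf(max_pause_ms=200, ihop_percent=35, log_gc=True):
    """Return Spark conf entries that pass G1 GC flags to executor JVMs.

    Hypothetical helper for illustration; the default values are
    placeholders, not tuned settings.
    """
    flags = [
        "-XX:+UseG1GC",                                       # use the G1 collector
        f"-XX:MaxGCPauseMillis={max_pause_ms}",               # soft pause-time target
        f"-XX:InitiatingHeapOccupancyPercent={ihop_percent}", # start concurrent cycle earlier
    ]
    if log_gc:
        flags.append("-verbose:gc")  # basic GC logging, useful before tuning further
    return {"spark.executor.extraJavaOptions": " ".join(flags)}


conf = build_gc_conf()

# Print one "key value" pair per line, which can be pasted into a
# cluster's Spark config.
for key, value in conf.items():
    print(f"{key} {value}")
```

JVM options like these are read when the executor JVMs start, so they would normally go in the cluster's Spark config (or be passed at submit time) rather than be set from a running notebook. If the long pauses are on the driver, the analogous setting is "spark.driver.extraJavaOptions".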

1 More Replies