Data Engineering

Forum Posts

wojciech_jakubo
by New Contributor III
  • 4654 Views
  • 7 replies
  • 2 kudos

Question about monitoring driver memory utilization

Hi Databricks/Spark experts! I have a piece of pandas-based 3rd-party code that I need to execute as part of a bigger Spark pipeline. By nature, pandas-based code is executed on the driver node. I ran into out-of-memory problems and started exploring th...

[Attached screenshot: Driver memory cycles, busy cluster]
Latest Reply
Tharun-Kumar
Honored Contributor II
  • 2 kudos

Hi @wojciech_jakubo, 1. JVM memory will not be utilized for Python-related activities. 2. In the image we can only see the storage memory. We also have execution memory, which would be the same size. Hence I came up with the executor memory to be of ...
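
Editor's note: not from the thread, but a minimal sketch of how one might watch driver-side memory around a pandas-heavy step, since pandas allocations live in the Python process and never show up in the JVM heap metrics. It assumes the psutil package is available on the driver; the helper name and the pandas step are placeholders.

```python
# Report driver-side memory before/after a pandas-heavy step. psutil reads OS-level
# numbers, so it captures Python/pandas allocations that the JVM heap metrics miss.
import os
import psutil

def report_driver_memory(label: str) -> None:
    proc = psutil.Process(os.getpid())           # the Python process on the driver
    rss_gb = proc.memory_info().rss / 1024**3    # resident memory of this process
    avail_gb = psutil.virtual_memory().available / 1024**3
    print(f"[{label}] Python RSS: {rss_gb:.2f} GB, driver available: {avail_gb:.2f} GB")

report_driver_memory("before pandas step")
# result = run_third_party_pandas_code(...)      # placeholder for the pandas-based code
report_driver_memory("after pandas step")
```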

6 More Replies
GC-James
by Contributor II
  • 5999 Views
  • 17 replies
  • 5 kudos

Resolved! Lost memory when using dbutils

Why does copying a 9 GB file from a container to /dbfs lose me 50 GB of memory? (Which doesn't come back until I restart the cluster.)

Latest Reply
AdrianP
New Contributor II
  • 5 kudos

Hi James, did you get to the bottom of this? We are experiencing the same issue, and all the suggested solutions don't seem to work. Thanks, Adrian
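
Editor's note: not a resolution from the thread, just a hedged way to check whether the "lost" memory is really held by a process or is only the OS page cache filled by the copy. It assumes a Databricks notebook (where dbutils is predefined) and psutil on the driver; the file paths are hypothetical.

```python
# Intended to run in a Databricks notebook (dbutils is predefined there).
# Paths below are hypothetical placeholders, not the poster's actual files.
import psutil

def mem_gb():
    vm = psutil.virtual_memory()
    return vm.free / 1024**3, vm.available / 1024**3

free_before, avail_before = mem_gb()
dbutils.fs.cp("dbfs:/mnt/source-container/big_file.bin", "dbfs:/tmp/big_file.bin")
free_after, avail_after = mem_gb()

# If "free" drops sharply while "available" barely moves, the memory is being used
# by the OS page cache for the copied file and is reclaimable, not truly lost.
print(f"free:      {free_before:.1f} GB -> {free_after:.1f} GB")
print(f"available: {avail_before:.1f} GB -> {avail_after:.1f} GB")
```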

16 More Replies
Abhijeet
by New Contributor III
  • 1598 Views
  • 5 replies
  • 5 kudos

How to Read Terabytes of data in Databricks

I want to read 1000 GB of data. Since Spark does in-memory transformations, do I need worker nodes with a combined memory of 1000 GB? Also, I just want to understand: when reading, do we store the 1000 GB in memory? And how is a cached DataFrame different from the a...

Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 5 kudos

Hi @Abhijeet Singh, the blog below might help you: Link
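
Editor's note: the linked blog is not reproduced here, but as a rough sketch of the point in question: Spark processes data partition by partition, so transforming 1000 GB does not require 1000 GB of combined worker memory; only an explicit cache/persist tries to keep the data resident. Paths, table and column names below are placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Lazy read: nothing is pinned in memory yet; partitions are processed and released,
# spilling/shuffling to disk as needed.
df = spark.read.parquet("/mnt/data/large_dataset")        # hypothetical ~1000 GB source
agg = df.groupBy("some_key").count()                      # hypothetical column
agg.write.mode("overwrite").parquet("/mnt/data/output")

# Only an explicit cache tries to keep the DataFrame in memory (with the default
# storage level it spills to disk if it does not fit).
df_cached = df.cache()
df_cached.count()  # materializes the cache
```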

4 More Replies
gpzz
by New Contributor II
  • 819 Views
  • 2 replies
  • 1 kudos

MEMORY_ONLY not working

val doubledAmount = premiumCustomers.map(x => (x._1, x._2 * 2)).persist(StorageLevel.MEMORY_ONLY)
error: not found: value StorageLevel

Latest Reply
Chaitanya_Raju
Honored Contributor
  • 1 kudos

Hi @Gaurav Poojary, can you please try the below as displayed in the image? It is working for me without any issues. Happy learning!!
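
Editor's note: the image is not preserved here. In the Scala snippet above, "not found: value StorageLevel" typically just means the import is missing: import org.apache.spark.storage.StorageLevel. For reference, a hedged PySpark equivalent (the RDD contents are placeholders):

```python
from pyspark import StorageLevel
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

# Hypothetical stand-in for the premiumCustomers pair RDD from the question.
premium_customers = sc.parallelize([("a", 10.0), ("b", 25.0)])

# Double the amount and keep the result in memory only (no disk spill).
doubled_amount = premium_customers.map(lambda x: (x[0], x[1] * 2)) \
                                  .persist(StorageLevel.MEMORY_ONLY)
print(doubled_amount.collect())
```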

1 More Replies
Bujji
by New Contributor II
  • 3165 Views
  • 2 replies
  • 3 kudos

How to resolve an out-of-memory error?

Hi, I am working as an Azure support engineer. I found this error while checking a pipeline failure; it shows the error below: "org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 72403.0 failed 4 times, most recent fail...

Latest Reply
Kaniz
Community Manager
  • 3 kudos

Hi @mahesh bmk, we haven't heard from you since the last response from @Pat Sienkiewicz, and I was checking back to see if their suggestions helped you. Otherwise, if you have any solution, please share it with the community, as it can be helpful to...

1 More Replies
pjp94
by Contributor
  • 1852 Views
  • 1 reply
  • 0 kudos

ERROR - Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.

I get the error below when trying to run multi-threading; it fails towards the end of the run. My guess is that it's related to memory/worker config. I've seen some solutions involving modifying the number of workers or CPUs on the cluster; however, that's n...

Latest Reply
pjp94
Contributor
  • 0 kudos

Since I don't have permissions to change cluster configurations, the only solution that ended up working was setting a max thread count to about half of the actual max so I don't overload the containers. However, open to any other optimization ideas!
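
Editor's note: a small sketch of the workaround described above (capping concurrency rather than changing the cluster); the task function, inputs, and the "half the cores" heuristic are illustrative only.

```python
# Bound the number of concurrent threads so notebook-side work does not exhaust the
# driver container. Task function and inputs are placeholders.
import os
from concurrent.futures import ThreadPoolExecutor

def run_task(item):
    return item  # placeholder for the per-item work

items = list(range(100))
max_threads = max(1, (os.cpu_count() or 2) // 2)  # roughly half the driver cores

with ThreadPoolExecutor(max_workers=max_threads) as pool:
    results = list(pool.map(run_task, items))
```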

chandan_a_v
by Valued Contributor
  • 10007 Views
  • 7 replies
  • 6 kudos

Resolved! Spark Driver Out of Memory Issue

Hi, I am executing a simple job in Databricks for which I am getting the error below. I increased the driver size but still faced the same issue. Spark config: from pyspark.sql import SparkSession; spark_session = SparkSession.builder.appName("Demand Forecasting...

Latest Reply
Kaniz
Community Manager
  • 6 kudos

Hi @Chandan Angadi, just a friendly follow-up. Do you still need help, or did @Hubert Dudek (Customer)'s and @Werner Stinckens's responses help you find the solution? Please let us know.
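
Editor's note: for readers hitting the same error, a hedged sketch of where driver memory is configured when the SparkSession itself launches the JVM (local mode / spark-submit). On a Databricks cluster the driver heap is governed by the chosen driver node type, so these session configs are illustrative, and the sizes are placeholders.

```python
from pyspark.sql import SparkSession

# Illustrative only: these settings take effect when this SparkSession launches the
# JVM (e.g. local mode or spark-submit), not on an already-running Databricks cluster.
spark_session = (
    SparkSession.builder
    .appName("Demand Forecasting")               # app name taken from the post's snippet
    .config("spark.driver.memory", "16g")        # hypothetical driver heap size
    .config("spark.driver.maxResultSize", "4g")  # hypothetical cap on results collected to the driver
    .getOrCreate()
)
```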

6 More Replies
pavanb
by New Contributor II
  • 7371 Views
  • 3 replies
  • 3 kudos

Resolved! Memory issues - Databricks

Hi all, all of a sudden in our Databricks dev environment we are getting memory-related exceptions such as out of memory, result too large, etc. Also, the error message is not helping to identify the issue. Can someone please advise on what would be...

Latest Reply
pavanb
New Contributor II
  • 3 kudos

Thanks for the response, @Hubert Dudek. If I run the same code in the test environment, it completes successfully, while in dev it gives an out-of-memory issue. Also, the configuration of the test and dev environments is exactly the same.

2 More Replies
User16869510359
by Esteemed Contributor
  • 4658 Views
  • 1 reply
  • 0 kudos

Resolved! Does Ganglia report incorrect memory stats?

I am looking at the memory utilization of the executors, and I see that the heap utilization of the executor is far less than what is reported in Ganglia. Why does Ganglia report incorrect memory details?

Latest Reply
User16869510359
Esteemed Contributor
  • 0 kudos

Ganglia reports memory utilization at the system level. Say, for example, the JVM has an Xmx value of 100 GB. At some point it will occupy 100 GB, and then a garbage collection will clear off the heap. Once the GC frees up the memory, th...
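
Editor's note: to see the gap described above, one can compare the JVM heap figures with the system-level numbers Ganglia reports. The sketch below uses spark.sparkContext._jvm, an internal Py4J handle, so treat it as a diagnostic hack rather than a supported API; psutil is assumed to be available.

```python
# Compare JVM heap usage on the driver with the system-level memory Ganglia sees.
import psutil
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
runtime = spark.sparkContext._jvm.java.lang.Runtime.getRuntime()

heap_used_gb = (runtime.totalMemory() - runtime.freeMemory()) / 1024**3
heap_max_gb = runtime.maxMemory() / 1024**3          # roughly the -Xmx value
system_used_gb = psutil.virtual_memory().used / 1024**3

print(f"JVM heap used: {heap_used_gb:.1f} GB of {heap_max_gb:.1f} GB max")
print(f"System memory used (what Ganglia reports): {system_used_gb:.1f} GB")
```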

Juan_MiguelTrin
by New Contributor
  • 6208 Views
  • 1 reply
  • 0 kudos

How to resolve an out-of-memory error?

I have a Databricks notebook hosted on Azure. I am having this problem when doing an INNER JOIN. I tried creating a much larger cluster configuration, but it is still producing an OutOfMemoryError: org.apache.spark.memory.SparkOutOfMemoryError: Unable to acquir...

Latest Reply
shyam_9
Valued Contributor
  • 0 kudos

Hi @Juan Miguel Trinidad, can you please try the suggestions below: http://apache-spark-developers-list.1001551.n3.nabble.com/java-lang-OutOfMemoryError-Unable-to-acquire-bytes-of-memory-td16773.html
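
Editor's note: beyond the linked mailing-list thread, two commonly tried mitigations for join-time memory pressure are broadcasting the smaller side and raising the shuffle parallelism so each task holds less data. These are general techniques, not advice from this thread; table names, the join key, and the partition count are placeholders.

```python
# Common mitigations for join-time memory pressure (illustrative only).
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.getOrCreate()

# More, smaller shuffle partitions so each join task holds less data at once.
spark.conf.set("spark.sql.shuffle.partitions", "800")  # hypothetical value

large_df = spark.table("large_table")
small_df = spark.table("dimension_table")

# If one side is small enough, broadcasting it avoids shuffling the large side.
joined = large_df.join(broadcast(small_df), on="join_key", how="inner")
```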
