Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

wojciech_jakubo (New Contributor III)
  • 10385 Views
  • 7 replies
  • 2 kudos

Question about monitoring driver memory utilization

Hi Databricks/Spark experts! I have a piece of pandas-based 3rd-party code that I need to execute as part of a bigger Spark pipeline. By nature, pandas-based code is executed on the driver node. I ran into out-of-memory problems and started exploring th...

[Attached image: driver memory cycles on a busy cluster]
Latest Reply
Tharun-Kumar (Databricks Employee)

Hi @wojciech_jakubo 1. JVM memory will not be utilized for Python-related activities. 2. In the image we can only see the storage memory; there is also execution memory, which would be the same size. Hence I came up with the executor memory to be of ...
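A minimal sketch of logging driver-side memory around a pandas step, assuming psutil is installed on the driver (the names here are illustrative, not from the thread):

```python
import psutil

def log_driver_memory(tag: str) -> None:
    # virtual_memory() reports system-wide usage on the node this runs on,
    # which for notebook/pandas code is the driver.
    mem = psutil.virtual_memory()
    print(f"[{tag}] used={mem.used / 1e9:.1f} GB, "
          f"available={mem.available / 1e9:.1f} GB ({mem.percent}% used)")

log_driver_memory("before pandas step")
# ... run the pandas-based third-party code here ...
log_driver_memory("after pandas step")
```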

6 More Replies
GC-James (Contributor II)
  • 14412 Views
  • 15 replies
  • 5 kudos

Resolved! Lost memory when using dbutils

Why does copying a 9 GB file from a container to /dbfs cost me 50 GB of memory? (Which doesn't come back until I restart the cluster.)
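For reference, the copy described looks roughly like this (container and paths are hypothetical). A common explanation for the "lost" memory is the Linux page cache filling up during the copy, which system-level monitoring counts as used:

```python
# Copy a large file from cloud storage into DBFS; dbutils is available
# by default in Databricks notebooks.
dbutils.fs.cp(
    "abfss://container@account.dfs.core.windows.net/data/bigfile.bin",
    "dbfs:/tmp/bigfile.bin",
)
```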

Latest Reply
AdrianP (New Contributor II)

Hi James, did you get to the bottom of this? We are experiencing the same issue, and none of the suggested solutions seem to work. Thanks, Adrian

14 More Replies
Abhijeet (New Contributor III)
  • 3450 Views
  • 4 replies
  • 5 kudos

How to Read Terabytes of data in Databricks

I want to read 1000 GB of data. Since Spark does in-memory transformations, do I need worker nodes with a combined memory of 1000 GB? Also, I just want to understand: when reading, do we store 1000 GB in memory? And how is the cached DataFrame different from the a...
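A short sketch of why all 1000 GB does not have to fit in memory: Spark plans reads lazily and streams through partitions, and only caching pins data (the path is hypothetical):

```python
# Nothing is read yet; Spark only records the plan.
df = spark.read.parquet("s3://my-bucket/terabyte-dataset/")

# Executors stream through partitions; the full 1000 GB is never
# resident at once.
print(df.count())

# cache() marks the DataFrame for storage on its next computation; with
# the default MEMORY_AND_DISK level, partitions that don't fit in memory
# spill to disk instead of failing.
df.cache()
```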

Latest Reply
Ajay-Pandey (Esteemed Contributor III)

Hi @Abhijeet Singh, the below blog might help you: Link

3 More Replies
gpzz (New Contributor II)
  • 1684 Views
  • 2 replies
  • 1 kudos

MEMORY_ONLY not working

val doubledAmount = premiumCustomers.map(x => (x._1, x._2 * 2)).persist(StorageLevel.MEMORY_ONLY)
error: not found: value StorageLevel

Latest Reply
Chaitanya_Raju (Honored Contributor)

Hi @Gaurav Poojary, can you please try the below, as displayed in the image? It is working for me without any issues. Happy learning!!
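The referenced image is not included here; the usual fix for this particular compile error is importing StorageLevel before using it, e.g.:

```scala
import org.apache.spark.storage.StorageLevel

val doubledAmount = premiumCustomers
  .map(x => (x._1, x._2 * 2))
  .persist(StorageLevel.MEMORY_ONLY)
```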

1 More Replies
Bujji (New Contributor II)
  • 4889 Views
  • 1 reply
  • 3 kudos

How to resolve out of memory error?

Hi, I am working as an Azure support engineer. I found this error while checking a pipeline failure; it shows the below error: "org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 72403.0 failed 4 times, most recent fail...

Latest Reply
Pat (Honored Contributor III)

Hi @mahesh bmk, it would be nice to see the sql_query. Is there some window function used? You might try to run this on a bigger cluster.
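To illustrate the window-function hint: a window with ORDER BY but no PARTITION BY pulls every row into a single task, a classic cause of repeated task failures on large tables. A sketch with hypothetical column names:

```python
from pyspark.sql import Window, functions as F

# No partitionBy: all rows go to one task, which can run out of memory.
w_global = Window.orderBy("event_time")

# Partitioned window: each task only holds the rows for one key.
w_bounded = Window.partitionBy("customer_id").orderBy("event_time")

df = df.withColumn("rn", F.row_number().over(w_bounded))
```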

pjp94 (Contributor)
  • 2717 Views
  • 1 reply
  • 0 kudos

ERROR - Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.

I get the below error when running multi-threaded code; it fails towards the end of the run. My guess is that it's related to memory/worker config. I've seen some solutions involving modifying the number of workers or CPUs on the cluster; however, that's n...

Latest Reply
pjp94 (Contributor)

Since I don't have permissions to change cluster configurations, the only solution that ended up working was setting a max thread count to about half of the actual max so I don't overload the containers. However, open to any other optimization ideas!
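A minimal sketch of that workaround, assuming the work is driven from the notebook with a thread pool (run_one_task and tasks are hypothetical):

```python
import os
from concurrent.futures import ThreadPoolExecutor

# Cap concurrency at roughly half the available cores so the containers
# are not overloaded.
max_workers = max(1, (os.cpu_count() or 2) // 2)

with ThreadPoolExecutor(max_workers=max_workers) as pool:
    results = list(pool.map(run_one_task, tasks))
```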

chandan_a_v (Valued Contributor)
  • 14599 Views
  • 6 replies
  • 6 kudos

Resolved! Spark Driver Out of Memory Issue

Hi, I am executing a simple job in Databricks, for which I am getting the below error. I increased the driver size but still faced the same issue. Spark config:
from pyspark.sql import SparkSession
spark_session = SparkSession.builder.appName("Demand Forecasting...

Latest Reply
chandan_a_v (Valued Contributor)

I am getting the above issue while writing a Spark DF as a parquet file to AWS S3. Not doing any broadcast join actually.
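Not from the thread, but a common mitigation when a parquet write runs out of memory is to bound how much data each write task handles; a sketch with hypothetical values:

```python
# Repartitioning before the write caps each task's working set.
(df.repartition(200)
   .write
   .mode("overwrite")
   .parquet("s3://my-bucket/forecast-output/"))
```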

5 More Replies
pavanb (New Contributor II)
  • 10199 Views
  • 3 replies
  • 3 kudos

Resolved! Memory issues - Databricks

Hi all, all of a sudden in our Databricks dev environment we are getting memory-related exceptions such as out of memory, result too large, etc. Also, the error message is not helping to identify the issue. Can someone please guide on what would be...

Latest Reply
pavanb (New Contributor II)

Thanks for the response @Hubert Dudek. If I run the same code in the test environment, it completes successfully, while in dev it gives an out-of-memory issue. Also, the configuration of the test and dev environments is exactly the same.
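One way to test the "exactly the same" assumption is to dump the effective Spark configuration in both workspaces and diff the output; a minimal sketch:

```python
# Print every effective Spark conf key/value so dev and test can be compared.
for key, value in sorted(spark.sparkContext.getConf().getAll()):
    print(f"{key} = {value}")
```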

2 More Replies
brickster_2018 (Databricks Employee)
  • 5171 Views
  • 1 reply
  • 0 kudos

Resolved! Does Ganglia report incorrect memory stats?

I am looking at the memory utilization of the executors, and I see that the heap utilization of the executors is far less than what is reported in Ganglia. Why does Ganglia report incorrect memory details?

Latest Reply
brickster_2018 (Databricks Employee)

Ganglia reports memory utilization at the system level. Say, for example, the JVM has an Xmx value of 100 GB. At some point it will occupy 100 GB, and then a garbage collection will clear off the heap. Once the GC frees up the memory, th...
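A sketch of the comparison being described, reading the JVM's own heap figures next to the system-level figure Ganglia sees (this goes through PySpark's internal _jvm handle, so treat it as illustrative only):

```python
import psutil

# Heap as the JVM sees it (what the Spark UI reports).
runtime = spark._jvm.java.lang.Runtime.getRuntime()
heap_used_gb = (runtime.totalMemory() - runtime.freeMemory()) / 1e9
print(f"JVM heap in use: {heap_used_gb:.1f} GB")

# Memory as the OS sees it (what Ganglia reports). After a GC the JVM
# usually keeps the freed pages, so the OS still shows them as used.
print(f"System memory used: {psutil.virtual_memory().used / 1e9:.1f} GB")
```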

Juan_MiguelTrin (New Contributor)
  • 6966 Views
  • 1 reply
  • 0 kudos

How to resolve out of memory error?

I have a Databricks notebook hosted on Azure. I am having this problem when doing an INNER JOIN. I tried a much larger cluster configuration, but it still produces an OutOfMemoryError: org.apache.spark.memory.SparkOutOfMemoryError: Unable to acquir...

Latest Reply
shyam_9 (Databricks Employee)

Hi @Juan Miguel Trinidad, can you please check the below suggestions: http://apache-spark-developers-list.1001551.n3.nabble.com/java-lang-OutOfMemoryError-Unable-to-acquire-bytes-of-memory-td16773.html
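Alongside that link, two common mitigations for this error on an INNER JOIN, sketched with hypothetical DataFrame and column names:

```python
from pyspark.sql import functions as F

# 1. If one side is small, broadcast it to avoid a memory-heavy shuffle.
joined = large_df.join(F.broadcast(small_df), on="id", how="inner")

# 2. Otherwise, raise shuffle parallelism so each task's slice shrinks.
spark.conf.set("spark.sql.shuffle.partitions", "800")
```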
