02-25-2022 04:07 AM
Hello,
I would like to override the default "spark.driver.maxResultSize" from a notebook on my cluster. I know I can do that in the cluster settings, but is there a way to set it in code?
I also know how to set it when starting a Spark session, but in my case I load directly from the Feature Store and want to convert my PySpark DataFrame to pandas.
from databricks import feature_store

# Load the feature table from the Feature Store
fs = feature_store.FeatureStoreClient()
prediction_data = fs.read_table(name=NAME)

# Collecting to pandas pulls the full result to the driver, which is
# where the spark.driver.maxResultSize limit kicks in
prediction_data_pd = prediction_data.toPandas()
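What I have in mind is something like the sketch below, run from the notebook right before the conversion (this is exactly the part I can't figure out; spark here is the notebook's built-in session):
# Sketch of the intent: raise the driver result-size limit from code,
# before collecting, instead of editing the cluster settings
spark.conf.set("spark.driver.maxResultSize", "4g")
prediction_data_pd = prediction_data.toPandas()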
02-25-2022 04:31 AM
Hi @Maximilian Hansinger,
Please try this:
from pyspark import SparkConf, SparkContext

conf = (
    SparkConf()
    .setMaster('yarn')  # depends on the cluster manager of your choice
    .setAppName('xyz')
    .set('spark.driver.extraClassPath', '/usr/local/bin/postgresql-42.2.5.jar')
    .set('spark.executor.instances', 4)
    .set('spark.executor.cores', 4)
    .set('spark.executor.memory', '10g')
    .set('spark.driver.memory', '15g')
    .set('spark.memory.offHeap.enabled', True)
    .set('spark.memory.offHeap.size', '20g')
    # note the property name: "driver", not "dirver"; also, a bare number
    # like '4096' is read as bytes, so a sized value such as '4g' is safer
    .set('spark.driver.maxResultSize', '4096')
)
spark_context = SparkContext(conf=conf)
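To confirm whether the value actually took effect, you can read it back from the context's configuration (a quick sanity check, assuming the context above was created successfully):
spark_context.getConf().get('spark.driver.maxResultSize')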
02-25-2022 06:09 AM
Hi @Kaniz Fatma, thanks for your reply.
Not sure that helps. When I check after executing your code with
spark.conf.get("spark.driver.maxResultSize")
I still get the default value for "spark.driver.maxResultSize" instead of 4096.
02-25-2022 06:14 AM
Hi @Maximilian Hansinger, alternatively try this:
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master('yarn')  # depends on the cluster manager of your choice
    .appName('xyz')
    .config('spark.driver.extraClassPath', '/usr/local/bin/postgresql-42.2.5.jar')
    .config('spark.executor.instances', 4)
    .config('spark.executor.cores', 4)
    .config('spark.executor.memory', '10g')
    .config('spark.driver.memory', '15g')
    .config('spark.memory.offHeap.enabled', True)
    .config('spark.memory.offHeap.size', '20g')
    .config('spark.driver.maxResultSize', '4096')  # "driver", not "dirver"
    .getOrCreate()  # was missing; without it, spark is a Builder, not a session
)
sc = spark.sparkContext
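One caveat: in a Databricks notebook a SparkSession already exists, so getOrCreate() will just return that session and driver-side settings such as spark.driver.maxResultSize are not reapplied; Spark only reads them when the driver JVM starts. A minimal sketch of what happens if you then try to change it at runtime (the exact error text may vary by Spark version):
# On recent Spark versions, changing a core driver config at runtime
# raises an AnalysisException along the lines of
# "Cannot modify the value of a Spark config: spark.driver.maxResultSize"
try:
    spark.conf.set("spark.driver.maxResultSize", "4g")
except Exception as e:
    print(e)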
03-03-2022 05:41 AM
@Maximilian Hansinger, maybe you can follow this:
https://kb.databricks.com/jobs/job-fails-maxresultsize-exception.html
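For reference, the approach in that KB article boils down to raising the limit in the cluster's Spark config, so it is applied when the driver starts, rather than from notebook code. In the cluster settings the entry would look something like this (the 4g value is only an example, sized to your workload):
spark.driver.maxResultSize 4g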
03-08-2022 01:56 PM
@Maximilian Hansinger - Would you let us know how it goes, please?
04-28-2022 09:37 AM
Hi @Maximilian Hansinger,
Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark the answer as best? If not, please tell us so we can help you.
Thanks!