02-25-2022 04:07 AM
Hello,
I would like to override the default "spark.driver.maxResultSize" from a notebook on my cluster. I know I can do that in the cluster settings, but is there a way to set it in code?
I also know how to set it when starting a Spark session, but in my case I load directly from the Feature Store and want to convert my PySpark DataFrame to pandas.
from databricks import feature_store

# Load the feature table from the Feature Store
fs = feature_store.FeatureStoreClient()
prediction_data = fs.read_table(name=NAME)

# Collecting to pandas pulls the full result to the driver, which is
# where the spark.driver.maxResultSize limit kicks in
prediction_data_pd = prediction_data.toPandas()
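What I have in mind is something like the sketch below, run from the notebook right before the conversion (this is exactly the part I can't figure out; spark here is the notebook's built-in session):
# Sketch of the intent: raise the driver result-size limit from code,
# before collecting, instead of editing the cluster settings
spark.conf.set("spark.driver.maxResultSize", "4g")
prediction_data_pd = prediction_data.toPandas()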
02-25-2022 04:31 AM
Hi @Maximilian Hansinger,
Please try this:
from pyspark import SparkConf, SparkContext

conf = (
    SparkConf()
    .setMaster('yarn')  # depends on the cluster manager of your choice
    .setAppName('xyz')
    .set('spark.driver.extraClassPath', '/usr/local/bin/postgresql-42.2.5.jar')
    .set('spark.executor.instances', 4)
    .set('spark.executor.cores', 4)
    .set('spark.executor.memory', '10g')
    .set('spark.driver.memory', '15g')
    .set('spark.memory.offHeap.enabled', True)
    .set('spark.memory.offHeap.size', '20g')
    # note the property name: "driver", not "dirver"; also, a bare number
    # like '4096' is read as bytes, so a sized value such as '4g' is safer
    .set('spark.driver.maxResultSize', '4096')
)
spark_context = SparkContext(conf=conf)
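To confirm whether the value actually took effect, you can read it back from the context's configuration (a quick sanity check, assuming the context above was created successfully):
spark_context.getConf().get('spark.driver.maxResultSize')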
02-25-2022 06:09 AM
Hi @Kaniz Fatma, thanks for your reply.
Not sure that helps. When I check after executing your code with
spark.conf.get("spark.driver.maxResultSize")
I still get the default value for "spark.driver.maxResultSize" instead of 4096.
02-25-2022 06:14 AM
Hi @Maximilian Hansinger, alternatively try this:
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master('yarn')  # depends on the cluster manager of your choice
    .appName('xyz')
    .config('spark.driver.extraClassPath', '/usr/local/bin/postgresql-42.2.5.jar')
    .config('spark.executor.instances', 4)
    .config('spark.executor.cores', 4)
    .config('spark.executor.memory', '10g')
    .config('spark.driver.memory', '15g')
    .config('spark.memory.offHeap.enabled', True)
    .config('spark.memory.offHeap.size', '20g')
    .config('spark.driver.maxResultSize', '4096')  # "driver", not "dirver"
    .getOrCreate()  # was missing; without it, spark is a Builder, not a session
)
sc = spark.sparkContext
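One caveat: in a Databricks notebook a SparkSession already exists, so getOrCreate() will just return that session and driver-side settings such as spark.driver.maxResultSize are not reapplied; Spark only reads them when the driver JVM starts. A minimal sketch of what happens if you then try to change it at runtime (the exact error text may vary by Spark version):
# On recent Spark versions, changing a core driver config at runtime
# raises an AnalysisException along the lines of
# "Cannot modify the value of a Spark config: spark.driver.maxResultSize"
try:
    spark.conf.set("spark.driver.maxResultSize", "4g")
except Exception as e:
    print(e)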
03-03-2022 05:41 AM
@Maximilian Hansinger, maybe you can follow this:
https://kb.databricks.com/jobs/job-fails-maxresultsize-exception.html
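For reference, the approach in that KB article boils down to raising the limit in the cluster's Spark config, so it is applied when the driver starts, rather than from notebook code. In the cluster settings the entry would look something like this (the 4g value is only an example, sized to your workload):
spark.driver.maxResultSize 4g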
03-08-2022 01:56 PM
@Maximilian Hansinger - Would you let us know how it goes, please?
04-28-2022 09:37 AM
Hi @Maximilian Hansinger,
Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark the answer as best? If not, please tell us so we can help you.
Thanks!