I need to convert a Spark DataFrame to a pandas DataFrame with Arrow optimization:
spark.conf.set("spark.sql.execution.arrow.enabled", "true")
data_df = df.toPandas()
but I randomly get one of the errors below while doing so:
Exception: arrow is not supported when using file-based collect
OR
/databricks/spark/python/pyspark/sql/pandas/conversion.py:340: UserWarning: createDataFrame attempted Arrow optimization because 'spark.sql.execution.arrow.pyspark.enabled' is set to true; however, failed by the reason below:
[Errno 13] Permission denied: '/local_disk0/spark-*/pyspark-*'
Attempting non-optimization as 'spark.sql.execution.arrow.pyspark.fallback.enabled' is set to true.
Note: I am using a high-concurrency passthrough cluster with the 10.0 ML runtime.
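For reference, here is a minimal sketch of the conversion I am attempting, using the Spark 3.x config key (`spark.sql.execution.arrow.pyspark.enabled`) that the warning message refers to; the old `spark.sql.execution.arrow.enabled` key is deprecated but still honored. The helper name is mine, and it assumes an active SparkSession:

```python
# Arrow-related confs for Spark 3.x / DBR 7+; the old key
# "spark.sql.execution.arrow.enabled" is deprecated but still works.
ARROW_CONFS = {
    # enable Arrow-based columnar transfer for toPandas()
    "spark.sql.execution.arrow.pyspark.enabled": "true",
    # fall back to the slower non-Arrow path instead of failing outright
    "spark.sql.execution.arrow.pyspark.fallback.enabled": "true",
}

def to_pandas_with_arrow(spark, df):
    """Apply the Arrow confs, then collect df into a pandas DataFrame."""
    for key, value in ARROW_CONFS.items():
        spark.conf.set(key, value)
    return df.toPandas()
```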
Another problem with the passthrough cluster: I am not able to load the registered model and make predictions using Spark, and I have to fall back to pandas mode instead. I get the error below when loading the model as a UDF. Is this a limitation of high-concurrency passthrough clusters? It works on a standard cluster.
predict = mlflow.pyfunc.spark_udf(spark, model_uri)
Exception:
PermissionError: [Errno 13] Permission denied: '/databricks/driver'
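The pandas-mode workaround I mentioned looks roughly like this: load the pyfunc model on the driver and score a collected pandas copy of the DataFrame, avoiding `spark_udf` and its executor-side model distribution entirely. The function name is mine, and this is only viable when the collected data fits in driver memory:

```python
def predict_in_pandas_mode(spark_df, model_uri):
    """Load the registered model on the driver and score in plain pandas.

    Sketch of a workaround for the spark_udf PermissionError: no model
    files are written on the executors. `model_uri` is whatever URI the
    model was registered under (e.g. "models:/<name>/<version>").
    """
    import mlflow.pyfunc  # imported here so the sketch stays self-contained

    pdf = spark_df.toPandas()                    # collect to the driver
    model = mlflow.pyfunc.load_model(model_uri)  # load the pyfunc flavor locally
    pdf["prediction"] = model.predict(pdf)       # plain pandas scoring
    return pdf
```

This obviously loses Spark's parallelism, which is why I would prefer `spark_udf` if the permission issue can be resolved.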