Re: High Concurrency Pass Through Cluster : pyarro...

AlexanderBij · ‎08-09-2022

Can you confirm this is a known issue?

I'm running into the same issue. Here is an example you can test in a single cell:

# using Arrow fails on HighConcurrency-cluster with PassThrough in runtime 10.4 (and 10.5 and 11.0)
 
spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")   # toggle to see difference
df = spark.createDataFrame(sc.parallelize(range(0, 100)), schema="int")
df.toPandas()  # << error here
 
# Msg: arrow is not supported when using file-based collect

It does work on a Personal cluster (Standard / SingleNode) with PassthroughAuth.
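
Until this is confirmed, one possible stopgap is to try the Arrow path and retry without Arrow if it raises. The following is only a rough sketch, not a confirmed fix: `to_pandas_with_fallback` is a hypothetical helper name (not part of PySpark), and it assumes the "file-based collect" error surfaces as a regular Python exception from `toPandas()` and that `spark.conf.get` / `spark.conf.set` are permitted on the cluster.

```python
ARROW_KEY = "spark.sql.execution.arrow.pyspark.enabled"


def to_pandas_with_fallback(spark, df):
    """Hypothetical helper: try df.toPandas() with Arrow, retry without it.

    Returns a tuple (result, used_arrow). Assumes the Arrow failure is
    raised as an ordinary exception from toPandas().
    """
    previous = spark.conf.get(ARROW_KEY, "false")  # remember the session setting
    try:
        spark.conf.set(ARROW_KEY, "true")
        try:
            return df.toPandas(), True
        except Exception as exc:
            # e.g. "arrow is not supported when using file-based collect"
            print(f"Arrow path failed, retrying without Arrow: {exc}")
        spark.conf.set(ARROW_KEY, "false")
        return df.toPandas(), False
    finally:
        spark.conf.set(ARROW_KEY, previous)  # restore the original setting
```

In a notebook this would be called as `pdf, used_arrow = to_pandas_with_fallback(spark, df)`. On the affected High Concurrency cluster, `used_arrow` would be expected to come back `False` if the assumptions above hold.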