
count or toPandas taking too long

jimcast
New Contributor

Hi,

I am fetching data from Unity Catalog in a notebook using spark.sql(). The query takes just a few seconds (I am actually only trying to retrieve 2 rows), but some operations like count() or toPandas() take forever. I wonder why they take so long and whether there is a way to speed them up.

Compute: personal compute m5d.2xlarge (Databricks Runtime 14.1, includes Apache Spark 3.5.0, Scala 2.12)

Thanks!

2 replies

Hkesharwani
Contributor II

Hi, it is quite normal for the conversion of a DataFrame from Spark to pandas to take time, but there is a way to optimize it.
Enable Arrow optimization: starting from Spark 3.0.0, you can enable Arrow optimization with the setting below. It speeds up the conversion by using Apache Arrow for faster data transfer between Spark and Python.

spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")

Harshit Kesharwani

anardinelli
New Contributor III

Hey @jimcast, how are you?

You can inspect what is happening internally with the Spark UI. Find the jobs that take the longest, select them, and check the SQL / DataFrame tab to see which queries they run, along with their execution plans.

If your data is public, please also share more details (such as logs, prints and dumps) so we can help you better.

Best,

Alessandro
