Older Spark version loaded into the Spark notebook

dbu_spark
New Contributor III

I have the Databricks Runtime for a job set to the latest, 10.0 Beta (includes Apache Spark 3.2.0, Scala 2.12).

When I check the Spark version in the notebook, I see version 3.1.0 instead of version 3.2.0.
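For reference, this is how I'm checking the version in the notebook, using the spark session Databricks provides (a minimal check; both calls should report 3.2.0 on a DBR 10.0 cluster):

println(spark.version)              // reports "3.1.0" here instead of "3.2.0"
println(spark.sparkContext.version) // same value via the SparkContext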

I need Spark 3.2 to process my workloads, as that version has the fix for https://github.com/apache/spark/pull/32788

A screenshot with the cluster configuration and the older Spark version in the notebook is attached.

[Attached screenshot: Screen Shot 2021-10-20 at 11.45.10 AM]

10 REPLIES

Hubert-Dudek
Esteemed Contributor III

If you use an instance pool, please check which preloaded runtime version is set on the pool as well.

If that is not the problem, I can't help, as I don't even see 10.0 yet (and after all, it is a Beta).

Anonymous
Not applicable

I got the same thing when I tested it out. I guess that's why it's a Beta; it should get fixed soon, I imagine.

dbu_spark
New Contributor III

I am not using a pool. Thanks for the update though. Hopefully this gets fixed soon.

Dan_Z
Honored Contributor

This is due to some legalese related to open-source Spark and Databricks' Spark. Since open-source Spark has not released v3.2 yet, we are not allowed to call the version on DBR 10 "v3.2" yet. It may already have that patch in it, though.

dbu_spark
New Contributor III

Thanks for the update @Dan Zafar.

I just ran the job again and am still seeing Spark version 3.1.0. Should I be using spark32 (or something similar) when invoking the Spark session in order to pick up the correct Spark version?
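For what it's worth, my understanding is that the reported version is fixed by the cluster's runtime rather than by how the session is invoked; a minimal sketch, assuming the standard SparkSession builder:

import org.apache.spark.sql.SparkSession

// getOrCreate() returns the notebook's existing session, so this reports
// the runtime's Spark version regardless of how the builder is called.
val session = SparkSession.builder().getOrCreate()
println(session.version)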

Any ETA on Spark 3.2 availability would be great.

Thanks

Dan_Z
Honored Contributor

It should have all the features you need. Check it out. Legally we can't call it Spark 3.2 yet.

dbu_spark
New Contributor III

I do not think it is loading Spark 3.2. I am still seeing the issue with writeUTF, which was fixed in Spark 3.2: https://github.com/apache/spark/pull/32788

Caused by: java.io.UTFDataFormatException: encoded string too long: 97548 bytes
	at java.io.DataOutputStream.writeUTF(DataOutputStream.java:364)
	at java.io.DataOutputStream.writeUTF(DataOutputStream.java:323)
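For context, that limit is a plain JVM one and is reproducible outside Spark; a minimal sketch of just the underlying writeUTF behavior (not Spark's own code path):

import java.io.{ByteArrayOutputStream, DataOutputStream}

// DataOutputStream.writeUTF uses a two-byte length prefix for the modified
// UTF-8 encoding, so any string whose encoding exceeds 65535 bytes fails.
val out = new DataOutputStream(new ByteArrayOutputStream())
out.writeUTF("x" * 97548) // java.io.UTFDataFormatException: encoded string too long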

Anyway, I will wait for the Databricks runtime to reflect the correct version.

Dan_Z
Honored Contributor

Yes, this version probably has the Databricks-internal features slated for Spark 3.2, but the features and patches contributed by the open-source community may still be coming. Sorry this isn't available yet. I'm sure it will be very soon. Happy coding!

-werners-
Esteemed Contributor III

I just noticed that (on Azure anyway) 10.0 is NOT in beta anymore.

So 'very soon' was indeed very soon.

jose_gonzalez
Moderator

Hi @Dhaivat Upadhyay,

Good news: DBR 10 was released yesterday, October 20th. You can find more details on the release notes website.
