Data Engineering

Older Spark version loaded into the Spark notebook

dbu_spark
New Contributor III

I have the Databricks runtime for a job set to the latest, 10.0 Beta (includes Apache Spark 3.2.0, Scala 2.12).

In the notebook, when I check the Spark version, I see version 3.1.0 instead of version 3.2.0.
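
To show exactly what I am checking, a minimal sketch (assuming the spark and sc handles that Databricks notebooks pre-create):

    // Both handles should report the Spark version the runtime actually loaded
    println(spark.version)  // prints "3.1.0" here, while "3.2.0" was expected
    println(sc.version)     // matches spark.version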

I need Spark 3.2 to process my workloads, as that version has the fix for https://github.com/apache/spark/pull/32788

A screenshot with the cluster configuration and the older Spark version shown in the notebook is attached.


10 REPLIES

Hubert-Dudek
Esteemed Contributor III

If you use a pool, please also check which preloaded Spark/runtime version is set on the pool.
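
You can also double-check from inside the notebook which runtime the cluster actually attached (a sketch; the Databricks cluster-tag key below is an assumption and may vary between runtime releases):

    // Databricks exposes cluster metadata through Spark conf tags; the exact
    // key name here is an assumption and may differ across runtime releases
    println(spark.conf.get("spark.databricks.clusterUsageTags.sparkVersion"))
    // e.g. "10.0.x-scala2.12" for DBR 10.0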

If that is not the problem, I can't help further, as I don't even see 10.0 yet (and after all, it is a Beta).

Anonymous
Not applicable

I got the same thing when I tested it out. I guess that's why it's a Beta; it should get fixed soon, I imagine.

dbu_spark
New Contributor III

I am not using the pool. Thanks for the update though. Hopefully this gets fixed soon.

Dan_Z
Databricks Employee

This is due to some legalese related to open-source Spark and Databricks' Spark. Since open-source Spark has not released v3.2 yet, we are not allowed to call the one on DBR 10 v3.2 yet. It may already include that patch, though.

dbu_spark
New Contributor III

Thanks for the update @Dan Zafar.

I just ran the job again and I'm still seeing Spark version 3.1.0. Should I be using spark32 (or something similar) when creating the Spark session in order to pick up the correct version?
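
For reference, a minimal sketch of what I mean (using the standard SparkSession builder API; I could not find any separate spark32 entry point):

    import org.apache.spark.sql.SparkSession

    // In a notebook, getOrCreate() should just return the pre-created session
    // rather than building a new one, so it cannot switch the Spark version
    val s = SparkSession.builder().getOrCreate()
    println(s eq spark)  // expected: true (same session)
    println(s.version)   // still reports "3.1.0"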

An ETA on Spark 3.2 availability would be great.

Thanks

Dan_Z
Databricks Employee

It should have all the features you need. Check it out. Legally we can't call it Spark 3.2 yet.

dbu_spark
New Contributor III

I do not think it is loading Spark 3.2. I am still seeing the issue with writeUTF, which was fixed in Spark 3.2: https://github.com/apache/spark/pull/32788

Caused by: java.io.UTFDataFormatException: encoded string too long: 97548 bytes
	at java.io.DataOutputStream.writeUTF(DataOutputStream.java:364)
	at java.io.DataOutputStream.writeUTF(DataOutputStream.java:323)
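
For context, the exception itself comes from a JDK limit rather than Spark: DataOutputStream.writeUTF stores the encoded length in two bytes, so any string whose encoding exceeds 65535 bytes throws. A minimal reproduction of just the JDK behavior (the 97548-byte size is copied from the trace above):

    import java.io.{ByteArrayOutputStream, DataOutputStream}

    val out = new DataOutputStream(new ByteArrayOutputStream())
    val big = "x" * 97548  // same size as in the stack trace
    out.writeUTF(big)      // throws UTFDataFormatException: encoded string too long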

Anyway, I will wait for the Databricks runtime to reflect the correct version.

Dan_Z
Databricks Employee

Yes, this version probably has the Databricks-internal features slated for Spark 3.2, but the features/patches contributed by the open-source community may still be coming. Sorry this isn't available yet; I'm sure it will be very soon. Happy coding!

-werners-
Esteemed Contributor III

I just noticed that (on Azure anyway) 10.0 is NOT in beta anymore.

So 'very soon' was indeed very soon.

jose_gonzalez
Databricks Employee

Hi @Dhaivat Upadhyay,

Good news: DBR 10 was released yesterday, October 20th. You can find more details on the release notes website.
