Older Spark version loaded into the Spark notebook

dbu_spark
New Contributor III

I have the Databricks Runtime for a job set to the latest, 10.0 Beta (includes Apache Spark 3.2.0, Scala 2.12).

When I check the Spark version in the notebook, I see version 3.1.0 instead of version 3.2.0.
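For reference, this is how I'm checking the version in the notebook, using the spark session Databricks provides (a minimal check; both calls should report 3.2.0 on a DBR 10.0 cluster):

println(spark.version)              // reports "3.1.0" here instead of "3.2.0"
println(spark.sparkContext.version) // same value via the SparkContext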

I need Spark 3.2 to process my workloads, as that version has the fix for https://github.com/apache/spark/pull/32788

A screenshot with the cluster configuration and the older Spark version in the notebook is attached.

[Attached screenshot: Screen Shot 2021-10-20 at 11.45.10 AM]

10 REPLIES

Hubert-Dudek
Esteemed Contributor III

If you use an instance pool, please check which preloaded runtime version is set on the pool as well.

If that is not the problem, I can't help, as I don't even see 10.0 yet (and after all, it is a Beta).

Anonymous
Not applicable

I got the same thing when I tested it out. I guess that's why it's a Beta; it should get fixed soon, I imagine.

dbu_spark
New Contributor III

I am not using a pool. Thanks for the update though. Hopefully this gets fixed soon.

Dan_Z
Honored Contributor

This is due to some legalese related to open-source Spark and Databricks' Spark. Since open-source Spark has not released v3.2 yet, we are not allowed to call the version on DBR 10 "v3.2" yet. It may already have that patch in it, though.

dbu_spark
New Contributor III

Thanks for the update @Dan Zafar.

I just ran the job again and am still seeing Spark version 3.1.0. Should I be using spark32 (or something similar) when invoking the Spark session in order to pick up the correct Spark version?
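For what it's worth, my understanding is that the reported version is fixed by the cluster's runtime rather than by how the session is invoked; a minimal sketch, assuming the standard SparkSession builder:

import org.apache.spark.sql.SparkSession

// getOrCreate() returns the notebook's existing session, so this reports
// the runtime's Spark version regardless of how the builder is called.
val session = SparkSession.builder().getOrCreate()
println(session.version)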

Any ETA on Spark 3.2 availability would be great.

Thanks

Dan_Z
Honored Contributor

It should have all the features you need. Check it out. Legally we can't call it Spark 3.2 yet.

dbu_spark
New Contributor III

I do not think it is loading Spark 3.2. I am still seeing the issue with writeUTF, which was fixed in Spark 3.2: https://github.com/apache/spark/pull/32788

Caused by: java.io.UTFDataFormatException: encoded string too long: 97548 bytes
	at java.io.DataOutputStream.writeUTF(DataOutputStream.java:364)
	at java.io.DataOutputStream.writeUTF(DataOutputStream.java:323)
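For context, that limit is a plain JVM one and is reproducible outside Spark; a minimal sketch of just the underlying writeUTF behavior (not Spark's own code path):

import java.io.{ByteArrayOutputStream, DataOutputStream}

// DataOutputStream.writeUTF uses a two-byte length prefix for the modified
// UTF-8 encoding, so any string whose encoding exceeds 65535 bytes fails.
val out = new DataOutputStream(new ByteArrayOutputStream())
out.writeUTF("x" * 97548) // java.io.UTFDataFormatException: encoded string too long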

Anyway, I will wait for the Databricks runtime to reflect the correct version.

Dan_Z
Honored Contributor

Yes, this version probably has the Databricks-internal features slated for Spark 3.2, but the features and patches contributed by the open-source community may still be coming. Sorry this isn't available yet. I'm sure it will be very soon. Happy coding!

-werners-
Esteemed Contributor III

I just noticed that (on Azure anyway) 10.0 is NOT in beta anymore.

So 'very soon' was indeed very soon.

jose_gonzalez
Moderator

Hi @Dhaivat Upadhyay,

Good news: DBR 10 was released yesterday, October 20th. You can find more details on the release notes website.
