Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Getting Spark & Scala version in Cluster node initialization script

ahuarte
New Contributor III

Hi there,

I am developing a Cluster node initialization script (https://docs.gcp.databricks.com/clusters/init-scripts.html#environment-variables) in order to install some custom libraries.

Reading the Databricks docs, we can get some environment variables with data related to the current running cluster node.

But I need to figure out which Spark & Scala versions are currently deployed. Is this possible?

Thanks in advance

Regards

1 ACCEPTED SOLUTION

sean_owen
Databricks Employee

Hm, this is a hacky idea, maybe there is a better way, but you could

ls /databricks/jars/spark*

and parse the results to get the version of Spark and Scala. You'll see files like spark--command--command-spark_3.1_2.12_deploy.jar containing the versions.
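To illustrate the suggestion above, here is a minimal sketch of parsing such a jar name with sed. The filename is the example from this reply; the exact naming pattern on any given DBR is an assumption to verify on a real cluster:

```shell
# Example jar name from the reply above; on a real cluster node you
# would take this from the output of: ls /databricks/jars/spark*
jar="spark--command--command-spark_3.1_2.12_deploy.jar"

# Pull the Spark and Scala versions out of the "_spark_X.Y_A.B_" pattern
spark_version=$(echo "$jar" | sed -n 's/.*spark_\([0-9.]*\)_\([0-9.]*\)_.*/\1/p')
scala_version=$(echo "$jar" | sed -n 's/.*spark_\([0-9.]*\)_\([0-9.]*\)_.*/\2/p')

echo "Spark: $spark_version"
echo "Scala: $scala_version"
```

Note this only yields the major.minor Spark version (3.1 here), not the patch version, which becomes relevant later in the thread.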


17 REPLIES

ahuarte
New Contributor III

Hi Kaniz, thank you very much. I'm sure I will learn a lot in this forum.

Prabakar
Databricks Employee

Hi @A Huarte​, you can get the Spark and Scala versions from the DBR that you will be using on the cluster.



ahuarte
New Contributor III

Hi @Prabakar Ammeappin​, thank you very much for your response,

but I mean: how can I get this info in a script? I am trying to develop this sh init script for several clusters with different Databricks runtimes.

I tried searching for files in that script, but I did not find any "*spark*.jar" file from which to extract the current runtime versions (Spark & Scala).

When the cluster has already started, files with this pattern exist, but at the moment the init script is executed it seems that pyspark is not installed yet.

ahuarte
New Contributor III

I know that the Databricks CLI tool is available, but it is not configured when the init script is running.
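For what it's worth, the init-scripts page linked at the top of the thread documents a few environment variables that are set before the script runs. A minimal sketch (variable names per those docs; availability on any given DBR is an assumption to verify, and note they expose the DBR version, not the Spark version):

```shell
# Read cluster metadata from the documented init-script environment
# variables, falling back to "unknown" outside a Databricks cluster node
dbr_version="${DATABRICKS_RUNTIME_VERSION:-unknown}"
cluster_id="${DB_CLUSTER_ID:-unknown}"
is_driver="${DB_IS_DRIVER:-unknown}"

echo "DBR version: $dbr_version"
echo "Cluster id:  $cluster_id"
echo "Is driver:   $is_driver"
```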


ahuarte
New Contributor III

Hi @Sean Owen​, thanks for your reply,

your idea can work, but unfortunately there isn't any filename with the full version number. I am missing the minor part:

yyyyyy_spark_3.2_2.12_xxxxx.jar -> the Spark version is really 3.2.0

I have configured databricks CLI to get metadata of the cluster and I get this output:

{
  "cluster_id": "XXXXXXXXX",
  "spark_context_id": YYYYYYYYYYYY,
  "cluster_name": "Devel - Geospatial",
  "spark_version": "10.1.x-cpu-ml-scala2.12",  ## <------!!!!
  ...
}

The "spark_version" property contains the DBR version, not the Spark version :-(. Any thoughts?

Thanks in advance

regards

Alvaro
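Since the cluster metadata only exposes the DBR version string, one workaround is a hand-maintained lookup from DBR to Spark version inside the script. A sketch, under the assumption that the mapping below (as I read it from the DBR release notes, and consistent with the 10.1 -> 3.2.0 example in this thread) is kept up to date by hand:

```shell
# DBR version string as returned by the cluster metadata above
dbr="10.1.x-cpu-ml-scala2.12"

# First two dotted components, e.g. "10.1"
dbr_ver=$(echo "$dbr" | cut -d. -f1,2)

# Hand-maintained DBR -> Spark mapping (verify against the release notes)
case "$dbr_ver" in
  9.0)  spark_full="3.1.2" ;;
  10.1) spark_full="3.2.0" ;;
  *)    spark_full="unknown" ;;
esac

echo "DBR $dbr_ver -> Spark $spark_full"
```

The obvious downside is that the table must be extended whenever a new DBR is adopted, which is why pinning a script per cluster type (as suggested below) may be simpler.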

sean_owen
Databricks Employee

Do you need such specific Spark version info? Why? It should not matter for user applications.


sean_owen
Databricks Employee

I doubt it's sensitive to a minor release; why?

But you also control which DBR/Spark version you launch the cluster with.

ahuarte
New Contributor III

Many thanks @Sean Owen​. I am going to follow your advice: rather than writing a generic init script that figures everything out, I will write a specific version of it for each cluster type; really, we only have 3 DBR types.

Thank you very much for your support

Regards

Anonymous
Not applicable

@A Huarte​ - How did it go?

ahuarte
New Contributor III

Hi,

My idea was to deploy Geomesa or Rasterframes on Databricks in order to provide spatial capabilities to this platform. Finally, following some advice in the Rasterframes Gitter chat, I selected DBR 9.0, where I am installing pyrasterframes 0.10.0 via "pip" and not getting any errors.

I hope this info can help.

Regards

Anonymous
Not applicable

Thank you so much! Would you be happy to mark whichever answer is best in your mind? That will help new members know which is the most effective.
