
Getting Spark & Scala version in Cluster node initialization script

ahuarte
New Contributor III

Hi there,

I am developing a Cluster node initialization script (https://docs.gcp.databricks.com/clusters/init-scripts.html#environment-variables) in order to install some custom libraries.

Reading the Databricks docs, I see we can get some environment variables with data related to the current running cluster node.
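For example, the script can already read things like this (a minimal sketch using a couple of the variables that the init-scripts page documents as available on each node):

#!/bin/bash
# DB_CLUSTER_ID and DB_IS_DRIVER are among the environment variables described
# in the init-scripts documentation linked above.
echo "Running on cluster ${DB_CLUSTER_ID}, is driver: ${DB_IS_DRIVER}"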

But I need to figure out which Spark & Scala versions are currently being deployed. Is this possible?

Thanks in advance

Regards


18 REPLIES

Kaniz
Community Manager

Hi @A Huarte​! My name is Kaniz, and I'm the technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question first; otherwise I will get back to you soon. Thanks.

ahuarte
New Contributor III

Hi Kaniz, thank you very much. I'm sure I will learn a lot in this forum.

Prabakar
Esteemed Contributor III

Hi @A Huarte​, you can get the Spark and Scala versions from the DBR that you will be using on the cluster.

[Screenshot: the cluster creation UI's Databricks Runtime version selector, which lists the Spark and Scala versions bundled with each DBR]

ahuarte
New Contributor III

Hi @Prabakar Ammeappin​, thank you very much for your response,

but I mean: how can I get this info in a script? I am trying to develop this sh init script for several clusters with different Databricks runtimes.

I tried searching for files in that script, but I did not find any "*spark*.jar" file from which to extract the current version of the runtime (Spark & Scala versions).

Once the cluster has started there are files with this pattern, but at the moment the init script is executed it seems that pyspark is not installed yet.

ahuarte
New Contributor III

I know that the Databricks CLI tool is available, but it is not configured when the init script is running.

sean_owen
Honored Contributor II

Hm, this is a hacky idea, maybe there is a better way, but you could

ls /databricks/jars/spark*

and parse the results to get the version of Spark and Scala. You'll see files like spark--command--command-spark_3.1_2.12_deploy.jar containing the versions.
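For example, something along these lines (just a sketch; the jar directory and filename layout are assumptions based on what I see on a running cluster, so adjust the pattern if your DBR lays things out differently):

#!/bin/bash
# Find one of the Spark jars, e.g. spark--command--command-spark_3.1_2.12_deploy.jar,
# and pull the Spark and Scala versions out of its name.
JAR_NAME=$(ls /databricks/jars/ | grep -m1 'spark_.*_deploy\.jar$')
SPARK_VERSION=$(echo "$JAR_NAME" | sed -E 's/.*spark_([0-9]+\.[0-9]+)_([0-9]+\.[0-9]+)_.*/\1/')
SCALA_VERSION=$(echo "$JAR_NAME" | sed -E 's/.*spark_([0-9]+\.[0-9]+)_([0-9]+\.[0-9]+)_.*/\2/')
echo "Spark ${SPARK_VERSION}, Scala ${SCALA_VERSION}"

Note this only gives you the major.minor numbers that appear in the filename.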

ahuarte
New Contributor III

Hi @Sean Owen​, thanks for your reply,

your idea can work, but unfortunately there isn't any filename with the full version name. I am missing the minor part:

yyyyyy_spark_3.2_2.12_xxxxx.jar -> Spark version is really 3.2.0

I have configured the Databricks CLI to get the cluster metadata, and I get this output:

{
  "cluster_id": "XXXXXXXXX",
  "spark_context_id": YYYYYYYYYYYY,
  "cluster_name": "Devel - Geospatial",
  "spark_version": "10.1.x-cpu-ml-scala2.12",  ## <------ !!!!
  ....
}

"spark_version" property does not contain info about the spark version but about the DBR :-(, any thoughts?

Thanks in advance

regards

Alvaro

sean_owen
Honored Contributor II

Do you need such specific Spark version info? Why? It should not matter for user applications.

ahuarte
New Contributor III

sean_owen
Honored Contributor II

I doubt it's sensitive to a minor release, why?

But you also control which DBR/Spark version you launch the cluster with.

ahuarte
New Contributor III

Many thanks @Sean Owen​, I am going to follow your advice: I am not going to write a generic init script that figures everything out, but a specific version of it for each cluster type; really, we only have 3 DBR types.
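For the record, each of those per-cluster scripts could end up looking roughly like this (a sketch; the library names are placeholders and the versions are hard-coded to match the DBR the script is attached to):

#!/bin/bash
# Init script for our DBR 10.1 ML clusters only (Spark 3.2, Scala 2.12 hard-coded).
SCALA_VERSION=2.12
# Hypothetical custom libraries built for this Spark/Scala combination.
# (pip of the cluster's Python environment; adjust the path for your DBR)
/databricks/python/bin/pip install "my-geo-lib==1.0.0"
# Hypothetical Scala jar staged in DBFS, copied into the cluster's classpath:
cp "/dbfs/FileStore/jars/my-geo-lib_${SCALA_VERSION}-1.0.0.jar" /databricks/jars/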

Thank you very much for your support

Regards

Anonymous
Not applicable

@A Huarte​ - How did it go?

ahuarte
New Contributor III

Hi,

My idea was to deploy GeoMesa or RasterFrames on Databricks in order to provide spatial capabilities to this platform. Finally, following some advice in the RasterFrames Gitter chat, I selected DBR 9.0, where I am installing pyrasterframes 0.10.0 via "pip" and not getting any errors.
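In case it is useful, the relevant part of that init script is essentially just this (a simplified sketch; the pip path is an assumption that matches what I see on my clusters):

#!/bin/bash
# Install the RasterFrames Python bindings into the cluster's Python environment.
/databricks/python/bin/pip install pyrasterframes==0.10.0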

I hope this info can be of help.

Regards
