Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Using Spark JARs with databricks-connect>=13.0

Lazloo
New Contributor III

With the newest version of databricks-connect, I cannot configure the extra JARs I want to use. In the older version, I did that via:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('DataFrame') \
    .config('spark.jars.packages', 'org.apache.spark:spark-avro_2.12:3.3.0') \
    .getOrCreate()

How can I configure this with databricks-connect>=13.0?

1 REPLY

Kaniz
Community Manager

Hi @Lazloo, in the newer versions of Databricks Connect, making additional JARs available to your Spark session is still possible, but the configuration is now done at the cluster level rather than in the session builder.

 

Let's adapt your previous approach to the latest version.
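As a starting point, here is a minimal sketch of how a session is typically created with databricks-connect>=13.0. It assumes the DatabricksSession builder from the databricks-connect package and an already-configured connection (for example via a Databricks config profile or the DATABRICKS_HOST, DATABRICKS_TOKEN, and DATABRICKS_CLUSTER_ID environment variables); because the session runs against a remote cluster, extra JARs are supplied through the cluster itself rather than through spark.jars.packages.

    # Minimal sketch, assuming databricks-connect>=13.0 and connection details
    # provided via a config profile or environment variables.
    from databricks.connect import DatabricksSession

    # The session targets a remote cluster, so extra JARs must be installed
    # on that cluster (see the options below) instead of being passed via
    # spark.jars.packages at session-build time.
    spark = DatabricksSession.builder.getOrCreate()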

 

Adding JARs to a Databricks cluster:

  • If you want to add JAR files to your Databricks cluster, you can copy them directly to the /databricks/jars folder. You can achieve this as part of your cluster-scoped init script. For example:

    #!/bin/bash
    # Copy a single JAR file into the cluster's jars folder
    cp /dbfs/FileStore/jars/<file-name.jar> /databricks/jars/
    # Or copy all JAR files
    cp /dbfs/FileStore/jars/*.jar /databricks/jars/
  • This ensures that the JAR files are available on all cluster nodes.

Installing Python Packages:

  • Your existing init script installs Python packages using pip. You can install multiple packages with a single pip command for efficiency:

    #!/bin/bash
    /databricks/python/bin/pip install pandas azure-cosmos python-magic
  • Note that using 2>/dev/null suppresses error messages, which might make debugging harder. Consider omitting it to aid in troubleshooting.

Adding JARs as Cluster Libraries:

  • To add an existing JAR file as a cluster library, you can use the ADD JAR command in Databricks Notebooks or SQL cells:

    -- Example: Adding a JAR file
    ADD JAR /tmp/test.jar;
  • Alternatively, you can use the Databricks REST API to programmatically add JARs to a cluster; see the sketch after this list.
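
For the REST API route, below is a minimal Python sketch against the Libraries API endpoint /api/2.0/libraries/install. The workspace URL, token, cluster ID, and JAR location are placeholders to replace with your own values; the same endpoint also accepts Maven coordinates, which covers packages such as org.apache.spark:spark-avro_2.12:3.3.0 from the original question.

    import requests

    # Placeholders -- replace with your workspace URL, a personal access token,
    # your cluster ID, and the location of the uploaded JAR.
    DATABRICKS_HOST = "https://<your-workspace>.cloud.databricks.com"
    TOKEN = "<personal-access-token>"
    CLUSTER_ID = "<cluster-id>"

    # Install a JAR and a Maven package as cluster libraries.
    resp = requests.post(
        f"{DATABRICKS_HOST}/api/2.0/libraries/install",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={
            "cluster_id": CLUSTER_ID,
            "libraries": [
                {"jar": "dbfs:/FileStore/jars/test.jar"},
                {"maven": {"coordinates": "org.apache.spark:spark-avro_2.12:3.3.0"}},
            ],
        },
    )
    resp.raise_for_status()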

Remember that any changes to cluster libraries typically require the cluster to be in a RUNNING state. If you need to execute actions during cluster startup or restart, consider using an init script like the one you've described. However, keep in mind that adding JARs as cluster libraries may not be instantaneous; it might take some time for the changes to propagate across the cluster configuration.
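
Once the library is installed and the cluster is back in a RUNNING state, the databricks-connect session from the sketch above can use it directly. For example, with the spark-avro package from the original question installed on the cluster (the file path below is a hypothetical placeholder):

    # Assumes spark-avro is installed on the cluster as a library;
    # the path is a hypothetical placeholder.
    df = spark.read.format("avro").load("dbfs:/tmp/example.avro")
    df.show()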
