Hi @Lazloo, in newer versions of Databricks Connect, configuring additional JARs for your Spark session is still possible.
Let's adapt your previous approach to the latest version.
Adding JARs to a Databricks cluster:
- If you want to add JAR files to your Databricks cluster, you can copy them directly into the /databricks/jars folder as part of a cluster-scoped init script. For example:

```bash
#!/bin/bash
# Copy a specific JAR file into the cluster's jars folder
cp /dbfs/FileStore/jars/<file-name.jar> /databricks/jars/
# Or copy all JAR files
cp /dbfs/FileStore/jars/*.jar /databricks/jars/
```
- This ensures that the JAR files are available on all cluster nodes.
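For this to take effect, the script has to be registered on the cluster before it starts. A rough sketch, assuming the Databricks CLI is configured and the script is saved locally as install-jars.sh (both the script name and the DBFS path are placeholders; note that workspace files are the preferred location for init scripts on newer runtimes):

```bash
# Upload the init script to DBFS so the cluster can reference it
databricks fs cp ./install-jars.sh dbfs:/databricks/scripts/install-jars.sh

# Then attach it under the cluster's Advanced Options > Init Scripts,
# pointing at dbfs:/databricks/scripts/install-jars.sh, and restart the cluster.
```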
Installing Python Packages:
- Your existing init script installs Python packages using pip. You can install multiple packages with a single pip command for efficiency:

```bash
#!/bin/bash
# Install all required packages in one pip invocation
/databricks/python/bin/pip install pandas azure-cosmos python-magic
```
- Note that using 2>/dev/null suppresses error messages, which makes failures hard to diagnose. Consider omitting it, or redirecting output to a log file instead, as sketched below.
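For example, instead of discarding stderr you could capture it and fail fast when an install breaks. A minimal sketch (the log path is arbitrary):

```bash
#!/bin/bash
LOG=/tmp/init-pip.log
# Keep stderr in the log so failed installs are visible,
# and abort cluster startup if pip fails
/databricks/python/bin/pip install pandas azure-cosmos python-magic >>"$LOG" 2>&1 || {
  echo "pip install failed; see $LOG" >&2
  exit 1
}
```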
Adding JARs as Cluster Libraries:
- To add an existing JAR file as a cluster library, you can use the ADD JAR command in Databricks notebooks or SQL cells:

```sql
-- Example: adding a JAR file
ADD JAR /tmp/test.jar;
```
- Alternatively, you can use the Databricks REST API to programmatically install JARs on a cluster; a sketch follows below.
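For the REST route, the Libraries API exposes an install endpoint. A minimal sketch with curl, assuming a personal access token in $DATABRICKS_TOKEN and placeholder values for the workspace URL and cluster ID:

```bash
curl -X POST "https://<workspace-url>/api/2.0/libraries/install" \
  -H "Authorization: Bearer $DATABRICKS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
        "cluster_id": "<cluster-id>",
        "libraries": [
          { "jar": "dbfs:/FileStore/jars/test.jar" }
        ]
      }'
```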
Remember that changes to cluster libraries typically require the cluster to be in a RUNNING state. If you need to execute actions during cluster startup or restart, use an init script like the one you've described. Keep in mind that adding JARs as cluster libraries is not instantaneous; it can take some time for the changes to propagate across the cluster.
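You can poll the Libraries API to confirm when the JAR has actually been installed, under the same assumptions as the install sketch above:

```bash
# Returns the status of each library on the cluster
# (e.g. PENDING, INSTALLING, INSTALLED, FAILED, ...)
curl -s "https://<workspace-url>/api/2.0/libraries/cluster-status?cluster_id=<cluster-id>" \
  -H "Authorization: Bearer $DATABRICKS_TOKEN"
```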