Using JARs from Google Cloud Artifact Registry in Databricks for Job Execution
06-06-2024 12:24 AM
We have our CI/CD pipelines set up in Google Cloud using Cloud Build, and we publish our artifacts to a private repository in Google Cloud's Artifact Registry. I want to use these JAR files to create and run jobs in Databricks.
However, when I enter the repository as a library source, I get the following error: "Repository must be a valid URL". The repository URL is of the format "artifactregistry://". I am looking for guidance on the best way to achieve this.
Also, if this is not possible, can we integrate our CI/CD pipeline to upload the JARs directly to DBFS and reference them from there? Can you please point to an example where the databricks-cli is used in this fashion?
Thanks
06-06-2024 01:10 AM
To integrate your CI/CD pipeline with Databricks, you have a couple of options:
1. Using cloud object storage as an intermediary: Databricks does not currently support Google Artifact Registry URLs (the artifactregistry:// scheme) as a library source. However, you can use an intermediate storage mechanism, such as Google Cloud Storage (GCS) or AWS S3, to hold your JAR files, and Databricks jobs can reference JAR libraries directly from these storage solutions. You would modify your CI/CD pipeline to publish the artifacts to GCS or S3 after they are built and then reference them in your Databricks jobs (see the first sketch below).
2. Directly uploading JAR files to DBFS: Another approach is to upload your JAR files directly to the Databricks File System (DBFS) from your CI/CD pipeline. You can achieve this using the Databricks CLI or the REST API. Here's a high-level overview of how you can do this:
- Install the Databricks CLI in your CI/CD environment.
- Use the databricks fs cp command to upload your JAR files to DBFS.
- Once the files are uploaded, you can reference them in your Databricks jobs using the DBFS path (see the second sketch below).
Ensure that your Databricks cluster has the necessary permissions to access DBFS and execute the job with the uploaded JAR file.
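As a rough sketch of option 1: copy the built JAR to a GCS bucket from your build, then create the job with a library that points at the gs:// path. The bucket name, JAR name, main class, and cluster settings below are all placeholders, and the job-creation step assumes the newer unified Databricks CLI; the job cluster must run as a service account with read access to the bucket.

```bash
# Copy the built JAR from the CI workspace to GCS
# (e.g. as a Cloud Build step using the gcloud builder):
gcloud storage cp target/my-app-1.0.jar gs://my-artifacts-bucket/jars/my-app-1.0.jar

# Create a job whose library is loaded straight from GCS.
# All names and cluster settings here are illustrative.
databricks jobs create --json '{
  "name": "my-app-job",
  "tasks": [{
    "task_key": "main",
    "spark_jar_task": { "main_class_name": "com.example.Main" },
    "libraries": [{ "jar": "gs://my-artifacts-bucket/jars/my-app-1.0.jar" }],
    "new_cluster": {
      "spark_version": "14.3.x-scala2.12",
      "node_type_id": "n2-standard-4",
      "num_workers": 1
    }
  }]
}'
```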
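And a minimal sketch of option 2, assuming the CLI is authenticated (for example via the DATABRICKS_HOST and DATABRICKS_TOKEN environment variables) and that your build produces target/my-app-1.0.jar; adjust the names and paths to your pipeline:

```bash
# Install the CLI in the CI environment (legacy CLI shown;
# the newer unified CLI works the same way for these commands):
pip install databricks-cli

# Upload the built JAR to DBFS, replacing any previous version:
databricks fs cp target/my-app-1.0.jar dbfs:/FileStore/jars/my-app-1.0.jar --overwrite
```

The job definition can then reference the library as "jar": "dbfs:/FileStore/jars/my-app-1.0.jar", in the same way as the gs:// example above.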
I hope this helps!

