Databricks Community

aghiya · ‎06-06-2024

We have our CI/CD pipelines set up in Google Cloud using Cloud Build, and we are publishing our artifacts to a private repository in Google Cloud's Artifact Registry. I want to use these JAR files to create and run jobs in Databricks.

However, when I enter the repository, I get the following error: "Repository must be a valid URL". The repository URL is of the format "artifactregistry://". I am looking for guidance on the best way to achieve this.

Also, if this is not possible, can we integrate our CI/CD pipeline to upload the JARs directly to DBFS and reference them from there? Could you please point me to an example where databricks-cli is used in this way?

Thanks

brijeshgoud1 · ‎06-06-2024

To integrate your CI/CD pipeline with Databricks, you have a couple of options:

1. Using intermediate storage instead of Artifact Registry: Currently, Databricks does not directly support Google Artifact Registry URLs. However, you can stage your JAR files in intermediate storage, such as Google Cloud Storage (GCS) or AWS S3, and use Databricks' ability to reference JAR files from these storage solutions. To do this, modify your CI/CD pipeline to publish the artifacts to GCS or S3 after they are built, then reference those paths in your Databricks jobs.
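
As a rough illustration of option 1, here is a minimal Python sketch of a pipeline step that copies a built JAR to GCS with `gsutil cp` and produces the library entry a job could use. The bucket name, prefix, and JAR name are hypothetical, and whether your workspace accepts a `gs://` URI as a JAR library is an assumption you should verify against your own setup.

```python
import subprocess


def gcs_uri(bucket, prefix, jar_name):
    """Build the gs:// destination for a JAR (all names are hypothetical)."""
    return f"gs://{bucket}/{prefix.strip('/')}/{jar_name}"


def upload_command(local_jar, dest_uri):
    """Return the gsutil command that copies the local JAR to GCS."""
    return ["gsutil", "cp", local_jar, dest_uri]


def publish(local_jar, dest_uri):
    """Run the upload; needs gsutil on PATH and credentials in the build step."""
    subprocess.run(upload_command(local_jar, dest_uri), check=True)


def jar_library(uri):
    """Library entry in the shape used by Databricks job and library specs.
    Assumption: the workspace can read this gs:// location."""
    return {"jar": uri}


# Example values only; nothing is uploaded until publish() is called.
dest = gcs_uri("my-artifacts-bucket", "jars", "my-app-1.0.0.jar")
print(" ".join(upload_command("target/my-app-1.0.0.jar", dest)))
print(jar_library(dest))
```

In a Cloud Build step you would call `publish(...)` after the build finishes, then use the printed library entry when defining the job.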

2. Directly uploading JAR files to DBFS: Another approach is to upload your JAR files directly to Databricks File System (DBFS) from your CI/CD pipeline. You can achieve this using the Databricks CLI or API. Here's a high-level overview of how you can do this:

   - Install the Databricks CLI in your CI/CD environment.
   - Use the databricks fs cp command to upload your JAR files to DBFS.
   - Once the files are uploaded, reference them in your Databricks jobs using the DBFS path.
   - Ensure that your Databricks cluster has the necessary permissions to access DBFS and execute the job with the uploaded JAR file.
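
The steps above can be sketched in Python roughly as follows. This is only an illustration: the DBFS path, job name, main class, and cluster ID are hypothetical placeholders, the CLI is assumed to be installed and already authenticated in the pipeline, and the payload follows the general shape of a Jobs API 2.1 create request with a `spark_jar_task`.

```python
import json
import subprocess


def dbfs_upload_command(local_jar, dbfs_path):
    """Return the Databricks CLI command that copies a local JAR to DBFS."""
    return ["databricks", "fs", "cp", "--overwrite", local_jar, dbfs_path]


def upload(local_jar, dbfs_path):
    """Run the upload; assumes the CLI is configured with host and token."""
    subprocess.run(dbfs_upload_command(local_jar, dbfs_path), check=True)


def jar_job_payload(job_name, dbfs_path, main_class, cluster_id):
    """Build a job definition that runs the uploaded JAR on an existing cluster.
    All argument values are placeholders supplied by the caller."""
    return {
        "name": job_name,
        "tasks": [
            {
                "task_key": "run_jar",
                "existing_cluster_id": cluster_id,
                "spark_jar_task": {"main_class_name": main_class},
                "libraries": [{"jar": dbfs_path}],
            }
        ],
    }


# Example values only; nothing is uploaded until upload() is called.
jar_path = "dbfs:/FileStore/jars/my-app-1.0.0.jar"
print(" ".join(dbfs_upload_command("target/my-app-1.0.0.jar", jar_path)))
print(json.dumps(jar_job_payload("my-jar-job", jar_path, "com.example.Main", "<cluster-id>"), indent=2))
```

The printed JSON could be saved to a file and submitted with the CLI or the Jobs API to create the job; the exact command depends on the CLI version you have installed.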
I hope this helps.

Using JARs from Google Cloud Artifact Registry in Databricks for Job Execution