
Using JARs from Google Cloud Artifact Registry in Databricks for Job Execution

aghiya
New Contributor

We have our CI/CD pipelines set up in Google Cloud using Cloud Build, and we are publishing our artifacts to a private repository in Google Cloud's Artifact Registry. I want to use these JAR files to create and run jobs in Databricks.

However, when I try to enter the repository, I get the following error: "Repository must be a valid URL". The repository URL has the format "artifactregistry://". I am looking for guidance on the best way to achieve this.

Also, if this is not possible, can we integrate our CI/CD pipeline to upload JARs directly to DBFS and reference them from there? Could you please point to an example where databricks-cli is used in this way?

Thanks

1 REPLY

brijeshgoud1
New Contributor II

To integrate your CI/CD pipeline with Databricks, you have a couple of options:

1. Using cloud storage as an intermediary: Currently, Databricks does not directly support Google Artifact Registry URLs. However, you can store your JAR files in intermediate storage, such as Google Cloud Storage (GCS) or AWS S3, and reference them from there in your Databricks jobs. This means modifying your CI/CD pipeline to publish the artifacts to GCS or S3 after they are built, then pointing your Databricks jobs at those locations.
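As a rough illustration of the storage-based option, here is a minimal sketch of a job definition that points at a JAR in GCS. It assumes the Jobs API 2.1 request shape (`spark_jar_task` plus a `libraries` entry) and assumes your workspace accepts `gs://` URIs for JAR libraries, which is worth verifying against the documentation for your cloud. The bucket path, class name, and cluster ID are hypothetical placeholders.

```python
import json


def build_jar_job_payload(job_name, jar_uri, main_class, cluster_id):
    """Build a request body for a job that runs a JAR task.

    Assumes the Jobs API 2.1 layout; confirm the fields against your
    workspace's API documentation before relying on it.
    """
    return {
        "name": job_name,
        "tasks": [
            {
                "task_key": "run_jar",
                "existing_cluster_id": cluster_id,
                # Entry point inside the JAR (hypothetical class name below).
                "spark_jar_task": {"main_class_name": main_class},
                # The JAR published to cloud storage by the CI/CD pipeline.
                "libraries": [{"jar": jar_uri}],
            }
        ],
    }


# All values below are placeholders, not real resources.
payload = build_jar_job_payload(
    job_name="nightly-etl",
    jar_uri="gs://my-artifacts-bucket/jars/app-1.0.0.jar",
    main_class="com.example.Main",
    cluster_id="1234-567890-abcde123",
)
print(json.dumps(payload, indent=2))
```

The printed JSON could then be sent to the job-creation endpoint by whichever client your pipeline already uses.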

2. Directly uploading JAR files to DBFS: Another approach is to upload your JAR files directly to Databricks File System (DBFS) from your CI/CD pipeline. You can achieve this using the Databricks CLI or API. Here's a high-level overview of how you can do this:

  • Install the Databricks CLI in your CI/CD environment.
  • Use the databricks fs cp command to upload your JAR files to DBFS.
  • Once the files are uploaded, you can reference them in your Databricks jobs using the DBFS path.
  • Ensure that your Databricks cluster has the necessary permissions to access DBFS and execute the job with the uploaded JAR file.
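The steps above could be scripted from a CI/CD step roughly as follows. This is only a sketch: it assumes the Databricks CLI is already installed and authenticated (for example through the `DATABRICKS_HOST` and `DATABRICKS_TOKEN` environment variables), and the local JAR path and DBFS directory are hypothetical. The example only prints the command; calling `upload_jar` would actually run it.

```python
import posixpath
import shlex
import subprocess
from pathlib import Path


def build_upload_command(local_jar, dbfs_dir):
    """Return the `databricks fs cp` command that uploads one JAR to DBFS."""
    # Keep the JAR's file name and place it under the target DBFS directory.
    target = posixpath.join(dbfs_dir.rstrip("/"), Path(local_jar).name)
    return ["databricks", "fs", "cp", "--overwrite", str(local_jar), target]


def upload_jar(local_jar, dbfs_dir):
    """Run the upload; raises CalledProcessError if the CLI exits non-zero."""
    subprocess.run(build_upload_command(local_jar, dbfs_dir), check=True)


# Hypothetical paths; printing the command makes for an easy dry run in CI logs.
command = build_upload_command("target/app-1.0.0.jar", "dbfs:/FileStore/jars/")
print(shlex.join(command))
```

The uploaded file can then be referenced from a job by its DBFS path, in this sketch `dbfs:/FileStore/jars/app-1.0.0.jar`.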
I hope this helps.
