
How to get a minor DBR image?

SepidehEb
Databricks Employee

In short, we aim to add a step to a CI job that runs tests in a container that mimics the DBR of our clusters (currently 7.3). We are considering one of the databricksruntime images (possibly standard:7.x for now, https://hub.docker.com/r/databricksruntime/standard/tags) and plan to customize it with pytest and some other libraries, as described here: https://docs.databricks.com/clusters/custom-containers.html#option-1-use-a-base-built-by-databricks.

The problem we ran into is that there are no images for specific minor DBR versions, and the versions of the Python libraries installed on the image do not match those installed on the clusters (https://docs.databricks.com/release-notes/runtime/7.3.html#installed-python-libraries). Is there a good way to get images for specific DBR versions, with library versions matching those in the actual DBR?
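To make the mismatch concrete, a small check like the following could run inside the container and report which packages differ from a set of expected pins. This is only a sketch: `installed_version`, `find_mismatches`, and the placeholder pin are hypothetical names and values, and the real pins would have to be copied by hand from the release notes page linked above.

```python
try:
    from importlib import metadata  # standard library on Python 3.8+
except ImportError:
    import importlib_metadata as metadata  # PyPI backport for Python 3.7

from typing import Callable, Dict, Optional, Tuple


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(
    expected: Dict[str, str],
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each package whose installed version differs from its expected pin
    to an (expected, installed) pair. Missing packages show up as None."""
    mismatches = {}
    for package, wanted in expected.items():
        found = lookup(package)
        if found != wanted:
            mismatches[package] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    # Placeholder pin (assumption): replace with versions from the release notes.
    expected_pins = {"example-package": "0.0.0"}
    for name, (wanted, found) in find_mismatches(expected_pins).items():
        print(f"{name}: expected {wanted}, found {found}")
```

Running this as a CI step would at least show how far a given image drifts from the cluster, even if it cannot close the gap.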

1 ACCEPTED SOLUTION

Accepted Solutions

Atanu
Databricks Employee

Hi @Sepideh Ebrahimi, since the cluster runtime is proprietary to Databricks, you cannot run it locally. As @Werner Stinckens said, you can build your own image, but it has to run on a cluster. There is, however, Databricks Connect (https://docs.databricks.com/dev-tools/databricks-connect.html), which lets you write your code in your IDE and execute it on a Databricks cluster (which runs in your workspace).


6 REPLIES

Hubert-Dudek
Esteemed Contributor III

In my opinion, since Databricks is a cloud system built on top of Spark, the best option is to keep the environments in the cloud and run the tests there. So basically all tests would run via the Jobs API, and you use Repos. Of course, you can still use Docker in all 3 environments:

  • developer environment,
  • build pipeline environment (where we deploy and run tests after a commit/pull request),
  • release pipeline (production)

Tests can be something like running SQL queries and comparing the results (returned via Jobs). I also thought about using SQL alarms for testing, but they are still in preview.
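As a rough sketch of the Jobs API approach, a CI step could build a one-time run request like the one below and POST it to the `runs/submit` endpoint. The helper name `build_test_run_payload`, the notebook path, and the node type placeholder are hypothetical, and the `7.3.x-scala2.12` version key is an assumption that should be checked against the runtime versions your workspace actually lists. The sketch only builds the request body; sending it and authenticating are left out.

```python
import json
from typing import Any, Dict


def build_test_run_payload(
    notebook_path: str,
    spark_version: str,
    node_type_id: str,
    num_workers: int = 1,
) -> Dict[str, Any]:
    """Build a request body for a one-time run on a fresh job cluster.

    The field names follow the Jobs API 2.0 runs/submit request format.
    """
    return {
        "run_name": "ci-tests",
        "new_cluster": {
            # Pinning the runtime here is what keeps CI and clusters aligned.
            "spark_version": spark_version,
            "node_type_id": node_type_id,
            "num_workers": num_workers,
        },
        "notebook_task": {"notebook_path": notebook_path},
    }


if __name__ == "__main__":
    payload = build_test_run_payload(
        notebook_path="/Repos/ci/project/tests/run_tests",  # hypothetical path
        spark_version="7.3.x-scala2.12",  # assumed key for DBR 7.3
        node_type_id="<node-type-id>",  # placeholder, depends on the cloud
    )
    print(json.dumps(payload, indent=2))
```

Because the run uses the real runtime, the library versions match the clusters by construction, which is the part a local container cannot guarantee.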

I also think that, as Repos is quite new, we can expect more improvements to CI/CD pipelines, since the current approach is far from perfect 🙂

-werners-
Esteemed Contributor III

Hmm, this question pops up regularly.

A Databricks Docker image (single node) for the different runtimes that can be deployed for test purposes/dev work.

I would like that... I agree Databricks is cloud-based and tests should be run on the cloud platform. But there is also a case for running unit tests etc. in a local container.

I think you are stuck running tests on databricks itself for the moment.

I am sure there will be a lot of improvements on the whole dev workflow.

Atanu
Databricks Employee

@Sepideh Ebrahimi please let us know whether that answers your query. Thanks.

SepidehEb
Databricks Employee

@Atanu Sarkar it does not. The question is: can we get a DBR image and deploy it outside Databricks?

-werners-
Esteemed Contributor III

You will have to build it yourself:

https://docs.microsoft.com/en-us/azure/databricks/clusters/custom-containers

But I doubt you can run that locally, as the docs specifically say to launch a cluster.

Also, they mention only the main version and not the minors, so it seems that it is not possible.

