
How to get a minor DBR image?

SepidehEb
New Contributor III

In short, we aim to add a step to a CI job that runs tests in a container that mimics the DBR of our clusters; we currently use 7.3. We are considering one of the databricksruntime images (possibly standard:7.x for now, https://hub.docker.com/r/databricksruntime/standard/tags) and plan to customize it with pytest and some other libraries, as described here: https://docs.databricks.com/clusters/custom-containers.html#option-1-use-a-base-built-by-databricks.

The problem we ran into is that there are no images for specific minor DBR versions, and the versions of the Python libraries installed on the image do not match those installed on the clusters (https://docs.databricks.com/release-notes/runtime/7.3.html#installed-python-libraries). Is there a good option for getting images of specific DBR versions, with library versions matching those in the actual DBR?
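
One way to make the mismatch visible inside the container is to compare the installed package versions against the pins listed in the release notes. The following is only a minimal sketch: the `EXPECTED_VERSIONS` dictionary and the `find_mismatches` helper are hypothetical names, and the package names and versions in it are placeholders, not the real DBR 7.3 values.

```python
from importlib import metadata

# Placeholder pins for illustration only (assumption). Replace them with the
# versions from the "Installed Python libraries" table in the release notes
# of the DBR version your clusters run.
EXPECTED_VERSIONS = {
    "example-package-a": "1.2.3",
    "example-package-b": "4.5.6",
}


def find_mismatches(expected, get_version=metadata.version):
    """Return {package: (expected, installed_or_None)} for each package that differs."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, installed) in sorted(find_mismatches(EXPECTED_VERSIONS).items()):
        print(f"{name}: expected {wanted}, found {installed or 'not installed'}")
```

Running this as an early CI step would at least list which libraries in the image drift from the cluster, so they can be pinned explicitly when customizing the image.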

1 ACCEPTED SOLUTION

Accepted Solutions

Atanu
Esteemed Contributor

Hi @Sepideh Ebrahimi, since the cluster is Databricks proprietary, you cannot run it locally. As @Werner Stinckens said, you can build your own image, but it has to be run on a cluster. There is, however, Databricks Connect (https://docs.databricks.com/dev-tools/databricks-connect.html), which lets you write your code in your IDE and execute it on a Databricks cluster (which runs in your workspace).

6 REPLIES

Hubert-Dudek
Esteemed Contributor III

In my opinion, since Databricks is a cloud system built on top of Spark, the best option is to have the environments in the cloud and run the tests there. So basically all tests would be run via the Jobs API, and you would use Repos. Of course, you can still use Docker in all 3 environments:

  • developer environment,
  • build pipeline environment (where we deploy and run test after commit/pull request)
  • release pipeline (production)

Tests can be something like running SQL queries and comparing the results (returned via Jobs). I also thought about implementing SQL alarms for testing, but they are still in preview.

I also think that, as Repos are quite new, we can expect more improvements to CI/CD pipelines, as the current way is far from perfect 🙂
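
As a rough illustration of running tests via the Jobs API from a build pipeline, a CI step could trigger an existing test job and then interpret the run state. This is only a sketch under assumptions: the endpoint paths and state names follow the Jobs API 2.1 documentation and should be checked against your workspace, and the helper names `build_run_now_request` and `run_succeeded`, as well as the host, token, and job id, are hypothetical.

```python
import json
import urllib.request


def build_run_now_request(host, token, job_id):
    """Build (but do not send) the POST request that triggers an existing job."""
    body = json.dumps({"job_id": job_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.1/jobs/run-now",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def run_succeeded(run_state):
    """Interpret the `state` object of a run returned by /api/2.1/jobs/runs/get.

    Returns None while the run is still in progress, True if it finished
    with result_state SUCCESS, and False for any other finished run.
    """
    finished = ("TERMINATED", "SKIPPED", "INTERNAL_ERROR")
    if run_state.get("life_cycle_state") not in finished:
        return None
    return run_state.get("result_state") == "SUCCESS"
```

The CI job would send the request with `urllib.request.urlopen`, poll the run until `run_succeeded` stops returning None, and fail the pipeline on False.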

-werners-
Esteemed Contributor III

Hmm, this question pops up regularly.

A Databricks Docker image (single node) for the different runtimes, which can be deployed for test purposes/dev work.

I would like that... I agree Databricks is cloud based and tests should be run on the cloud platform. But there is also a case for running unit tests etc. in a local container.

I think you are stuck running tests on databricks itself for the moment.

I am sure there will be a lot of improvements on the whole dev workflow.

Atanu
Esteemed Contributor

@Sepideh Ebrahimi please let us know whether that answers your query. Thanks.

SepidehEb
New Contributor III

@Atanu Sarkar it does not. The question is: can we get a DBR image and deploy it outside Databricks?

-werners-
Esteemed Contributor III

You will have to build it yourself:

https://docs.microsoft.com/en-us/azure/databricks/clusters/custom-containers

But I doubt you can run that locally, as the docs specifically mention launching a cluster.

Also, they mention only the main version and not the minor ones, so it seems that it is not possible.
