11-29-2021 06:58 AM
In short, we aim to add a step to a CI job that runs tests in a container which mimics the DBR of our clusters (currently 7.3). We are considering one of the databricksruntime images (possibly standard:7.x for now, https://hub.docker.com/r/databricksruntime/standard/tags) and plan to customize it with pytest and some other libraries as described here: https://docs.databricks.com/clusters/custom-containers.html#option-1-use-a-base-built-by-databricks (a sketch of the kind of test step we have in mind is included below).
The problem we ran into is that there are no images for specific minor DBR versions, and there is some mismatch between the versions of the Python libraries installed on the image and those installed on the clusters, https://docs.databricks.com/release-notes/runtime/7.3.html#installed-python-libraries. Could you suggest a good option for getting images of specific DBR versions, with library versions matching those in the actual DBR?
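A minimal sketch of that test step, assuming pytest and pyspark get pip-installed into the customized image (the base databricksruntime image does not ship Spark itself), with a hypothetical conftest.py fixture providing a local SparkSession:

```python
# conftest.py -- hypothetical pytest fixture for running Spark unit tests
# inside the customized container. Assumes the Dockerfile adds
# `pip install pytest pyspark`; the databricksruntime base image alone
# does not include Spark.
import pytest
from pyspark.sql import SparkSession


@pytest.fixture(scope="session")
def spark():
    session = (
        SparkSession.builder
        .master("local[2]")          # single-node, in-container Spark
        .appName("ci-unit-tests")
        .getOrCreate()
    )
    yield session
    session.stop()


# test_example.py -- a trivial test using the fixture
def test_row_count(spark):
    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
    assert df.count() == 2
```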
11-29-2021 07:13 AM
In my opinion, since Databricks is a cloud system built on top of Spark, the best option is to have environments in the cloud and run tests there. So basically all tests are run via the Jobs API, and you use Repos. Of course you can still use Docker in all three environments.
Tests can be something like running SQL queries and comparing the results, etc. (returned via jobs); a rough sketch of triggering such a job from CI is included below. I also thought about implementing SQL alerts for testing, but they are still in preview.
I also think that, as Repos are quite new, we can expect more improvements to the CI/CD pipelines, as the current workflow is far from perfect.
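For illustration, here is a rough sketch of triggering such a test job from CI via the Jobs API. The host, token and job ID below are placeholders, and it assumes a job already exists in the workspace that runs the checks and fails on a mismatch:

```python
# run_ci_job.py -- hypothetical CI step that triggers an existing Databricks
# job through the Jobs API (2.1) and fails if the run does not succeed.
import os
import time
import requests

HOST = os.environ["DATABRICKS_HOST"]      # e.g. https://<workspace>.cloud.databricks.com
TOKEN = os.environ["DATABRICKS_TOKEN"]    # PAT stored as a CI secret
JOB_ID = int(os.environ["TEST_JOB_ID"])   # placeholder: the job that runs the tests

headers = {"Authorization": f"Bearer {TOKEN}"}

# Trigger the job
resp = requests.post(f"{HOST}/api/2.1/jobs/run-now",
                     headers=headers, json={"job_id": JOB_ID})
resp.raise_for_status()
run_id = resp.json()["run_id"]

# Poll until the run finishes, then fail the CI step on anything but SUCCESS
while True:
    run = requests.get(f"{HOST}/api/2.1/jobs/runs/get",
                       headers=headers, params={"run_id": run_id}).json()
    state = run["state"]
    if state["life_cycle_state"] in ("TERMINATED", "SKIPPED", "INTERNAL_ERROR"):
        break
    time.sleep(30)

assert state.get("result_state") == "SUCCESS", f"Test job failed: {state}"
```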
11-29-2021 09:56 AM
Hmm, this question pops up regularly.
A databricks docker image (single node) for the different runtimes which can be deployed for test purposes/dev work.
I would like that... I agree Databricks is cloud based and tests should be run on the cloud platform. But there is also a case for running unit tests etc. on a local container.
I think you are stuck running tests on databricks itself for the moment.
I am sure there will be a lot of improvements on the whole dev workflow.
11-29-2021 10:56 PM
@Sepideh Ebrahimi please let us know if that answers your query. Thanks.
11-30-2021 12:30 AM
@Atanu Sarkar it does not. The question is: can we get a DBR image and deploy it outside Databricks?
11-30-2021 01:17 AM
You will have to build it yourself:
https://docs.microsoft.com/en-us/azure/databricks/clusters/custom-containers
But I doubt you can run that locally, as the docs specifically mention launching a cluster.
Also, they mention only the major version and not the minors, so it seems that it is not possible.
11-30-2021 07:38 AM
Hi @Sepideh Ebrahimi, since the cluster runtime is Databricks proprietary, you cannot run it locally. As @Werner Stinckens said, you can build your own image, but it has to run on a cluster. There is, however, Databricks Connect (https://docs.databricks.com/dev-tools/databricks-connect.html), which basically lets you build your code in your IDE and execute it against a Databricks cluster (running in your workspace).
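For example, here is a minimal sketch of that workflow, assuming databricks-connect matching the cluster's DBR (7.3 here) has been installed and configured locally via `databricks-connect configure`; the test itself is hypothetical:

```python
# test_remote.py -- hypothetical local test executed through databricks-connect.
# Assumes `pip install databricks-connect==7.3.*` and that
# `databricks-connect configure` was run with the workspace URL, token and cluster ID.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F


def test_aggregation_on_remote_cluster():
    # With databricks-connect configured, this SparkSession is backed by
    # the remote Databricks cluster rather than a local Spark instance.
    spark = SparkSession.builder.getOrCreate()

    df = spark.createDataFrame(
        [("a", 1), ("a", 2), ("b", 3)], ["key", "value"]
    )
    result = (df.groupBy("key")
                .agg(F.sum("value").alias("total"))
                .orderBy("key")
                .collect())

    assert [(r["key"], r["total"]) for r in result] == [("a", 3), ("b", 6)]
```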