Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Running unit tests and hyperopt causes a broadcast variable exception

Boyan
New Contributor II

Hello,

We are using hyperopt to train a model with a relatively large training dataset.

We experienced some performance issues and, following the suggestions in this notebook, we broadcast the dataset.

To verify that broadcasting the dataset resolved the performance issue, we did an experiment using Databricks Runtime for Machine Learning and a Notebook. We did see a significant performance boost.

To deploy our code, we package it as a .whl file and use Python jobs to deploy it to an Azure Databricks Service. As long as we run the job on Databricks Runtime for Machine Learning, we do not have any issues.

However, when we run the unit tests for our jobs locally or in our CI/CD pipelines, they fail with the error "Broadcast variable '5' not loaded!".

This appears to be a known bug in the hyperopt library. A fix has been merged to master, but it has not been released yet.

Databricks Runtime for Machine Learning ships with a Databricks fork of hyperopt, version 0.2.7+db1, which also includes a fix.
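For illustration, here is a minimal sketch of how a test suite could tell which hyperopt build it is running against. It assumes the Databricks fork can be recognised by a local version label starting with `db` (as in 0.2.7+db1); that convention is inferred from the single version string above and is not documented behaviour. The helper names are hypothetical.

```python
from importlib import metadata
from typing import Optional


def has_databricks_local_label(version: str) -> bool:
    """Return True if the version carries a local label starting with 'db'.

    Assumption: the Databricks fork is always tagged like '0.2.7+db1'.
    """
    _, sep, local = version.partition("+")
    return bool(sep) and local.startswith("db")


def installed_hyperopt_version() -> Optional[str]:
    """Return the installed hyperopt version string, or None if it is not installed."""
    try:
        return metadata.version("hyperopt")
    except metadata.PackageNotFoundError:
        return None


def running_databricks_hyperopt() -> bool:
    """Return True only when hyperopt is installed and looks like the Databricks fork."""
    version = installed_hyperopt_version()
    return version is not None and has_databricks_local_label(version)
```

A check like `running_databricks_hyperopt()` could serve as the condition for `pytest.mark.skipif`, so that tests depending on the broadcast fix are skipped locally. This only works around the problem; it does not make the fix available outside the ML runtime.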

Given that this fork is only available on Databricks Runtimes for Machine Learning, what is the recommended approach to run unit tests on CI/CD infrastructure or local development machines?

1 REPLY

Boyan
New Contributor II

Hello @Retired_mod,

thank you for the links provided.

To run our production workloads we use Python jobs, not notebooks. Most of the time we develop a new job locally against mocked/test datasets, and then we run into the issue I described. We run integration tests against an Azure Databricks cluster much less often.

So my question really is: is there a way to install the Databricks fork of hyperopt (version 0.2.7+db1) locally?

 
