Hello,
We are using hyperopt to train a model with a relatively large training dataset.
We experienced some performance issues and, following the suggestions in this notebook, we broadcast the dataset.
To verify that broadcasting the dataset resolved the performance issue, we did an experiment using Databricks Runtime for Machine Learning and a Notebook. We did see a significant performance boost.
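For context, the broadcast pattern looks roughly like the sketch below. This is a simplified illustration, not our production code: `make_objective`, `run_search`, the toy linear model, the search space and `parallelism=4` are all made up for the example.

```python
def make_objective(bc_data):
    """Build a hyperopt-style objective that reads the dataset from a broadcast handle."""
    def objective(params):
        # Read the dataset through the handle instead of capturing it in the closure
        xs, ys = bc_data.value
        slope = params["slope"]
        # Mean squared error of the toy model y = slope * x
        return sum((slope * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
    return objective


def run_search(spark, xs, ys, max_evals=20):
    """Run the search with SparkTrials; requires pyspark and hyperopt to be installed."""
    # Imported lazily so the objective above can be imported without hyperopt
    from hyperopt import SparkTrials, fmin, hp, tpe

    # Ship the dataset to the executors once rather than with every trial
    bc_data = spark.sparkContext.broadcast((xs, ys))
    return fmin(
        fn=make_objective(bc_data),
        space={"slope": hp.uniform("slope", 0.0, 5.0)},
        algo=tpe.suggest,
        max_evals=max_evals,
        trials=SparkTrials(parallelism=4),
    )
```

Because the objective only touches `.value`, it can be called with any stand-in object to check the objective logic in isolation. That does not exercise `SparkTrials` itself, which is where the error described below shows up.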
To deploy our code, we package it as a .whl file and use Python jobs to deploy it to an Azure Databricks Service. Provided we run the job on Databricks Runtime for Machine Learning, we do not have any issues.
However, when we run the unit tests for our jobs locally or in our CI/CD pipelines, we get the following error: "Broadcast variable '5' not loaded!"
This appears to be a known bug in the hyperopt library. A fix has been merged to master, but it has not been released yet.
Databricks Runtime for Machine Learning ships with a Databricks fork of hyperopt (version 0.2.7+db1), which also includes a fix.
Given that this fork is only available on Databricks Runtimes for Machine Learning, what is the recommended approach to run unit tests on CI/CD infrastructure or local development machines?