Hi @yopbibo,
Yes, it is possible to use the Python `mlflow` package in Databricks without running into the `InvalidConfigurationError` you've encountered. The error message suggests that the MLflow CLI needs to be configured before the `mlflow` package can be used.
To avoid this error, skip the `mlflow` CLI and use MLflow tracking through its Python API instead. Here is an example of how to log an experiment run without any CLI configuration:

    import mlflow

    # Set the experiment by its workspace path (created if it does not exist)
    mlflow.set_experiment('/Shared/xx')

    # Start a new run; it is ended automatically when the block exits
    with mlflow.start_run() as run:
        # Log parameters
        mlflow.log_param('learning_rate', 0.01)
        mlflow.log_param('max_depth', 5)

        # Log metrics
        mlflow.log_metric('train_loss', 0.5)
        mlflow.log_metric('valid_loss', 1.0)

        # Log artifacts
        filename = 'model.pkl'
        # ... train model and create model artifact ...
        # log_artifact takes the local file path, not an open file object
        mlflow.log_artifact(filename)

The `mlflow` package provides several API functions to log data such as metrics, parameters, and artifacts to an experiment run.
Also make sure that the `mlflow` package is available in your Databricks environment. If it is not already installed on your cluster, install it with `%pip install mlflow`.
Once you have set the experiment and started a run, you can use the `log_param`, `log_metric`, and `log_artifact` functions to log data to the run. Note that `log_param` and `log_metric` take a name and a value, while `log_artifact` takes the path to a local file and, optionally, an `artifact_path` naming the directory inside the run's artifact store where the file should be placed.
By using the `mlflow` API in your code, you can interact with your experiment runs in MLflow directly, without having to configure the CLI.