Hi! Any help will be greatly appreciated!!
So I'm following this tutorial: https://docs.databricks.com/applications/mlflow/projects.html.
I decided to use a folder in DBFS to hold my MLflow Project files. So, in my project I have:
MLproject:
conda_env: /dbfs/FileStore/shared_uploads/mcai2@optumcloud.com/wineTest/conda.yaml

entry_points:
  main:
    parameters:
      n_estimators: {type: int, default: 0.5}
    command: "python3 /dbfs/FileStore/shared_uploads/mcai2@optumcloud.com/wineTest/train.py {n_estimators}"
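For reference, this is roughly how I exercise that entry point from Python before involving a cluster (a minimal sketch using mlflow.projects.run with the local backend; the n_estimators value is just an arbitrary test value):

import mlflow

# Run the project's "main" entry point locally (sketch; project path as above).
submitted = mlflow.projects.run(
    uri="/dbfs/FileStore/shared_uploads/mcai2@optumcloud.com/wineTest",
    entry_point="main",
    parameters={"n_estimators": 10},  # arbitrary test value
    backend="local",
)
print(submitted.run_id)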
conda.yaml:
name: mlflow-env
channels:
  - conda-forge
dependencies:
  - python=3.8.10
  - pip
  - pip:
    - mlflow
    - pandas==1.2.4
    - psutil==5.8.0
    - scikit-learn==0.24.1
    - typing-extensions==3.7.4.3
    - xgboost==1.5.2
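To rule out simple formatting problems in that file, one quick check is whether it parses as YAML at all (a minimal sketch using PyYAML; the path is the same one referenced in MLproject):

import yaml  # PyYAML

# Sanity check: conda.yaml should load cleanly and contain the expected keys.
with open("/dbfs/FileStore/shared_uploads/mcai2@optumcloud.com/wineTest/conda.yaml") as f:
    env = yaml.safe_load(f)
print(env["name"], env["channels"], len(env["dependencies"]))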
train.py: the code is taken from this notebook: https://docs.databricks.com/_static/notebooks/mlflow/mlflow-end-to-end-example.html (I basically copied the cells into a .py file; sorry, it's too long to include here.)
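To give a sense of what train.py does, here is a heavily condensed sketch (the dataset path and column handling are simplified guesses at the notebook's flow, not my actual file, which is the notebook cells copied verbatim):

import sys

import mlflow
import mlflow.sklearn
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# n_estimators is substituted into the command line by MLflow (see MLproject).
n_estimators = int(sys.argv[1])

# Hypothetical data path; the notebook loads the wine-quality CSVs.
data = pd.read_csv("/dbfs/FileStore/shared_uploads/mcai2@optumcloud.com/wineTest/wine-quality.csv")
X = data.drop(columns=["quality"])
y = data["quality"] >= 7  # binarize quality into high/low, as in the notebook
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=123)

with mlflow.start_run():
    model = RandomForestClassifier(n_estimators=n_estimators, random_state=123)
    model.fit(X_train, y_train)
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    mlflow.log_param("n_estimators", n_estimators)
    mlflow.log_metric("auc", auc)
    mlflow.sklearn.log_model(model, "model")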
Then for my cluster specification, I have this code:
{
  "new_cluster": {
    "spark_version": "9.1.x-cpu-ml-scala2.12",
    "num_workers": 2,
    "node_type_id": "Standard_DS3_v2"
  },
  "libraries": [
    {
      "pypi": {
        "package": "/dbfs/FileStore/shared_uploads/mcai2@optumcloud.com/requests-2.28.1-py3-none-any.whl"
      }
    },
    ... (47 other packages in the same format)
  ]
}
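And here is roughly how I kick off the run in step 3 (a sketch using the Python API, which should be equivalent to the mlflow run CLI command in the tutorial; the experiment name is a placeholder, and cluster_spec.json is the JSON above saved to a file):

import mlflow

# Step 3: submit the project to a new Databricks cluster defined by the spec above.
mlflow.projects.run(
    uri="/dbfs/FileStore/shared_uploads/mcai2@optumcloud.com/wineTest",
    backend="databricks",
    backend_config="cluster_spec.json",  # the JSON above, saved locally
    parameters={"n_estimators": 10},  # arbitrary test value
    experiment_name="/Users/mcai2@optumcloud.com/wine-experiment",  # placeholder name
)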
And so I get to step 3 in the tutorial, and I end up getting this:

[output screenshot]

And when I go to the experiment I made in step 1, there is nothing like what is outlined in step 4.
I'm sure my mistake is somewhere in the folder containing the pieces of the project, but I can't figure out what I'm doing wrong. I'm a newcomer to Databricks and to writing code like this, so any help would be greatly appreciated. Thank you so much for your time!