Databricks

bluetail · 01-16-2022

I am running a notebook on the Coursera platform.

My configuration file, Classroom-Setup, looks like this:

%python
 
spark.conf.set("com.databricks.training.module-name", "deep-learning")
spark.conf.set("com.databricks.training.expected-dbr", "6.4")
 
spark.conf.set("com.databricks.training.suppress.untilStreamIsReady", "true")
spark.conf.set("com.databricks.training.suppress.stopAllStreams", "true")
spark.conf.set("com.databricks.training.suppress.moduleName", "true")
spark.conf.set("com.databricks.training.suppress.lessonName", "true")
# spark.conf.set("com.databricks.training.suppress.username", "true")
spark.conf.set("com.databricks.training.suppress.userhome", "true")
# spark.conf.set("com.databricks.training.suppress.workingDir", "true")
spark.conf.set("com.databricks.training.suppress.databaseName", "true")
 
import warnings
warnings.filterwarnings("ignore")
 
#import tensorflow
 
def display_run_uri(experiment_id, run_id):
    host_name = dbutils.notebook.entry_point.getDbutils().notebook().getContext().tags().get("browserHostName").get()
    uri = "https://{}/#mlflow/experiments/{}/runs/{}".format(host_name,experiment_id,run_id)
    displayHTML("""<b>Run URI:</b> <a href="{}">{}</a>""".format(uri,uri))
 
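def waitForMLflow():
    # Block until the mlflow package is importable, then report its version status.
    try:
        import mlflow
        # Compare (major, minor) as a tuple so that e.g. 2.0 is not mistaken for < 1.2
        major, minor = (int(part) for part in mlflow.__version__.split(".")[:2])
        if (major, minor) >= (1, 2):
            print("""The module "mlflow" is attached and ready to go.""")
        else:
            print("""You need MLflow version 1.2.0+ installed.""")
    except ModuleNotFoundError:
        print("""The module "mlflow" is not yet attached to the cluster, waiting...""")
        # Poll once per second until the library finishes attaching
        while True:
            try:
                import mlflow
                print("""The module "mlflow" is attached and ready to go.""")
                break
            except ModuleNotFoundError:
                import time
                time.sleep(1)
                print(".", end="")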
 
 
from sklearn.metrics import confusion_matrix,f1_score,accuracy_score,fbeta_score,precision_score,recall_score
import matplotlib.pyplot as plt
import numpy as np
from sklearn.utils.multiclass import unique_labels
 
def plot_confusion_matrix(y_true, y_pred, classes,
                          title=None,
                          cmap=plt.cm.Blues):
    # Compute confusion matrix
    cm = confusion_matrix(y_true, y_pred)
    fig, ax = plt.subplots()
    im = ax.imshow(cm, interpolation='nearest', cmap=cmap)
    ax.figure.colorbar(im, ax=ax)
    ax.set(xticks=np.arange(cm.shape[1]),
           yticks=np.arange(cm.shape[0]),
           xticklabels=classes, yticklabels=classes,
           title=title,
           ylabel='True label',
           xlabel='Predicted label')
 
    plt.setp(ax.get_xticklabels(), rotation=45, ha="right",
             rotation_mode="anchor")
 
    fmt = 'd'
    thresh = cm.max() / 2.
    for i in range(cm.shape[0]):
        for j in range(cm.shape[1]):
            ax.text(j, i, format(cm[i, j], fmt),
                    ha="center", va="center",
                    color="white" if cm[i, j] > thresh else "black")
    fig.tight_layout()
    return fig
 
np.set_printoptions(precision=2)
 
displayHTML("Preparing the learning environment...")

I have no issues running this command:

%run "./Includes/Classroom-Setup"

It reports that all the functions have been defined.

Then, when I run the following in the next cell:

%python

import mlflow

import mlflow.spark

I get a ModuleNotFoundError:

ModuleNotFoundError                       Traceback (most recent call last)
<command-1419217929106651> in <module>
----> 1 import mlflow
      2 import mlflow.spark
 
/databricks/python_shell/dbruntime/PythonPackageImportsInstrumentation/__init__.py in import_patch(name, globals, locals, fromlist, level)
    156             # Import the desired module. If you’re seeing this while debugging a failed import,
    157             # look at preceding stack frames for relevant error information.
--> 158             original_result = python_builtin_import(name, globals, locals, fromlist, level)
    159 
    160             is_root_import = thread_local._nest_level == 1
 
ModuleNotFoundError: No module named 'mlflow'

What is the cause of this and how can I fix it? Unfortunately, Coursera is not helpful with this particular course.

Thank you, I am new to Databricks.

User16753724663 · 01-17-2022

Hi @Maria Bruevich,

From the error description, it looks like the mlflow library is not installed on the cluster. You can use an ML cluster (one running Databricks Runtime for Machine Learning), since these clusters already have the mlflow library installed. Please check the document below:

https://docs.databricks.com/release-notes/runtime/7.3ml.html

Otherwise, you will need to install the required library on the existing cluster.

The document below explains how to install the library:

https://docs.databricks.com/libraries/cluster-libraries.html
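Before or after installing, a quick way to check whether the package is visible to the notebook's Python interpreter is to ask the import system for it without actually importing it. This is a minimal sketch using only the standard library; the helper name `mlflow_importable` is hypothetical and not part of Databricks or MLflow.

```python
import importlib.util


def mlflow_importable(package: str = "mlflow") -> bool:
    """Return True if `package` can be imported in the current Python environment.

    Hypothetical helper: find_spec looks up the top-level package on sys.path
    without executing it, so it returns None instead of raising ModuleNotFoundError.
    """
    return importlib.util.find_spec(package) is not None


if mlflow_importable():
    print("mlflow is importable on this cluster")
else:
    print("mlflow is missing: install it as a cluster library or use an ML runtime")
```

If this prints the "missing" message, the `import mlflow` cell will fail with the same ModuleNotFoundError shown in the question.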

Please let us know if this helps.

bluetail · 01-18-2022

Darshan, I am using a 9.1 cluster; is that not a higher version?

I have tried both 9.1 and 7.3 clusters and am still getting the same error.

bluetail · 01-18-2022

I manually installed mlflow==1.20.2 on the 9.1 cluster and it worked 🙂 Thank you.
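Since the fix here was installing a specific version, it can be worth confirming from a notebook cell that the cluster actually picked up that version. A sketch assuming Python 3.8+ (for `importlib.metadata`); `PINNED_MLFLOW`, `installed_version`, and `check_pin` are illustrative names, and 1.20.2 is simply the version reported as working above.

```python
from importlib import metadata
from typing import Optional

# Version reported as working on the 9.1 cluster in this thread (illustrative pin)
PINNED_MLFLOW = "1.20.2"


def installed_version(package: str) -> Optional[str]:
    """Return the installed distribution version of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_pin(package: str = "mlflow", pinned: str = PINNED_MLFLOW) -> bool:
    """Hypothetical helper: print and return whether `package` matches `pinned` exactly."""
    found = installed_version(package)
    if found is None:
        print(f"{package} is not installed on this cluster")
        return False
    if found != pinned:
        print(f"{package} {found} is installed, but {pinned} was expected")
        return False
    print(f"{package} {found} matches the pinned version")
    return True


check_pin()
```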

Anonymous · 01-19-2022

It should be easier to just use the ML runtimes: https://docs.databricks.com/runtime/mlruntime.html

bluetail · 01-21-2022

The standard runtimes did not work for me. I am not sure why; I am on a 14-day trial at the moment.

By the way, do the 7.3 and 9.1 runtimes cost the same to run?

Anonymous · 01-21-2022

There is no cost associated with particular runtimes. All the costs are associated with the cluster VM size and how long the cluster runs.
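To make the point concrete: under a deliberately simplified model where cost is node count × per-node hourly rate × hours running, the runtime version never enters the calculation. The function name and the rates below are made-up placeholders for illustration, not Databricks prices or billing rules.

```python
def cluster_cost(nodes: int, hourly_rate_per_node: float, hours: float) -> float:
    """Simplified, hypothetical cost model: only VM size (via its rate),
    node count, and uptime matter; the runtime version is not an input."""
    return nodes * hourly_rate_per_node * hours


# Same hypothetical VM size and uptime: the result is identical whether the
# cluster runs 7.3 or 9.1, because the runtime is not a parameter.
print(cluster_cost(nodes=2, hourly_rate_per_node=0.50, hours=3))  # 3.0
# Doubling the uptime doubles the cost.
print(cluster_cost(nodes=2, hourly_rate_per_node=0.50, hours=6))  # 6.0
```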

ModuleNotFoundError: No module named 'mlflow' when running a notebook
