<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic How to set sparkTrials? I am receiving this TypeError: cannot pickle '_thread.lock' object in Machine Learning</title>
    <link>https://community.databricks.com/t5/machine-learning/how-to-set-sparktrials-i-am-receiving-this-typeerror-cannot/m-p/33777#M1783</link>
    <pubDate>Tue, 23 Aug 2022 17:45:05 GMT</pubDate>
    <dc:creator>Somi</dc:creator>
    <dc:date>2022-08-23T17:45:05Z</dc:date>
    <item>
      <title>How to set sparkTrials? I am receiving this TypeError: cannot pickle '_thread.lock' object</title>
      <link>https://community.databricks.com/t5/machine-learning/how-to-set-sparktrials-i-am-receiving-this-typeerror-cannot/m-p/33777#M1783</link>
      <description>&lt;P&gt;I am trying to distribute hyperparameter tuning using hyperopt on a tensorflow.keras model. I am passing SparkTrials to my fmin call:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;spark_trials = SparkTrials(parallelism=4)
...
best_hyperparam = fmin(fn=CNN_HOF,
                       space=space,
                       algo=tpe.suggest,
                       max_evals=tuner_max_evals,
                       trials=spark_trials)&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;but I am receiving this error:&lt;/P&gt;&lt;P&gt;TypeError: cannot pickle '_thread.lock' object&lt;/P&gt;&lt;P&gt;The code only works when I skip passing the trials by commenting out the line trials=spark_trials, which means there would be no distributed tuning.&lt;/P&gt;&lt;P&gt;Any idea how I can fix this?&lt;/P&gt;&lt;P&gt;@Tian Tan&lt;/P&gt;&lt;P&gt;@Sara Dooley&lt;/P&gt;</description>
      <pubDate>Tue, 23 Aug 2022 17:45:05 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/how-to-set-sparktrials-i-am-receiving-this-typeerror-cannot/m-p/33777#M1783</guid>
      <dc:creator>Somi</dc:creator>
      <dc:date>2022-08-23T17:45:05Z</dc:date>
    </item>
    <item>
      <title>Re: How to set sparkTrials? I am receiving this TypeError: cannot pickle '_thread.lock' object</title>
      <link>https://community.databricks.com/t5/machine-learning/how-to-set-sparktrials-i-am-receiving-this-typeerror-cannot/m-p/33778#M1784</link>
      <description>&lt;P&gt;This can happen when you try to serialize a Keras model with an unserializable layer. What does your model look like? Also, what is in the search space variable? What are you trying to optimize?&lt;/P&gt;</description>
      <pubDate>Fri, 26 Aug 2022 21:48:11 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/how-to-set-sparktrials-i-am-receiving-this-typeerror-cannot/m-p/33778#M1784</guid>
      <dc:creator>Dooley</dc:creator>
      <dc:date>2022-08-26T21:48:11Z</dc:date>
    </item>
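A minimal sketch of how one might narrow down which object triggers the pickling error, assuming the objective function's dependencies can be listed by hand. The names in `candidates` are hypothetical stand-ins, and the standard-library `pickle` is used here in place of the cloudpickle call that appears in the traceback, so this only approximates what hyperopt does.

```python
import pickle
import threading


def find_unpicklable(named_objects):
    """Return {name: error message} for every object that pickle rejects."""
    failures = {}
    for name, obj in named_objects.items():
        try:
            pickle.dumps(obj)
        except Exception as exc:  # pickle may raise TypeError, PicklingError, ...
            failures[name] = str(exc)
    return failures


# Hypothetical stand-ins for what an objective function might reference.
candidates = {
    "batch_size": 32,
    "class_names": ["normal", "abnormal"],
    "generator_state": {"lock": threading.Lock()},  # holds a thread lock
}

failures = find_unpicklable(candidates)
for name, message in failures.items():
    print(name, "cannot be pickled:", message)
```

Running the same check over the real objects that the objective function touches (generators, models, Spark handles) may point at the one that carries a lock.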
    <item>
      <title>Re: How to set sparkTrials? I am receiving this TypeError: cannot pickle '_thread.lock' object</title>
      <link>https://community.databricks.com/t5/machine-learning/how-to-set-sparktrials-i-am-receiving-this-typeerror-cannot/m-p/33779#M1785</link>
      <description>&lt;P&gt;Here is more code and detail:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;space = {
    "pool_1": hp.choice("pool_1", np.arange(2, 5, 1, dtype=int)),
    "conv_1": hp.choice("conv_1", np.arange(16, 128, 16, dtype=int)),
    "conv_1b": hp.choice("conv_1b", np.arange(2, 5, 1, dtype=int)),

    "pool_2": hp.choice("pool_2", np.arange(2, 5, 1, dtype=int)),
    "reg_2": hp.choice("reg_2", np.arange(0.00005, 0.01, 0.00001, dtype=float)),
    "conv_2": hp.choice("conv_2", np.arange(16, 128, 16, dtype=int)),
    "conv_2b": hp.choice("conv_2b", np.arange(2, 5, 1, dtype=int)),

    "pool_3": hp.choice("pool_3", np.arange(2, 5, 1, dtype=int)),
    "reg_3": hp.choice("reg_3", np.arange(0.00005, 0.01, 0.00001, dtype=float)),
    "conv_3": hp.choice("conv_3", np.arange(16, 128, 16, dtype=int)),
    "conv_3b": hp.choice("conv_3b", np.arange(2, 5, 1, dtype=int)),

    "pool_4": hp.choice("pool_4", np.arange(2, 5, 1, dtype=int)),
    "reg_4": hp.choice("reg_4", np.arange(0.00005, 0.01, 0.00001, dtype=float)),
    "conv_4": hp.choice("conv_4", np.arange(16, 128, 16, dtype=int)),
    "conv_4b": hp.choice("conv_4b", np.arange(2, 5, 1, dtype=int)),
    "drop_4": hp.choice("drop_4", np.arange(0.00005, 0.01, 0.00001, dtype=float)),
}&lt;/CODE&gt;&lt;/PRE&gt;&lt;PRE&gt;&lt;CODE&gt;def model_builder(params, dense_size):  # CNN builder function
    model = Sequential()
    model.add(Conv2D(int(params['conv_1']), (int(params['conv_1b']), int(params['conv_1b'])), activation='relu', input_shape=(150, 150, 3)))
    model.add(MaxPooling2D(int(params['pool_1']), int(params['pool_1'])))
    
    model.add(Conv2D(int(params['conv_2']), (int(params['conv_2b']), int(params['conv_2b'])), activation='relu', kernel_regularizer=L1L2(float(params['reg_2']), float(params['reg_2']))))
    model.add(MaxPooling2D(int(params['pool_2']), int(params['pool_2'])))
    
    model.add(Conv2D(int(params['conv_3']), (int(params['conv_3b']), int(params['conv_3b'])), activation='relu', kernel_regularizer=L1L2(float(params['reg_3']), float(params['reg_3']))))
    model.add(MaxPooling2D(int(params['pool_3']), int(params['pool_3'])))
    
    model.add(Conv2D(int(params['conv_4']), (int(params['conv_4b']), int(params['conv_4b'])), activation='relu', kernel_regularizer=L1L2(float(params['reg_4']), float(params['reg_4']))))
    model.add(MaxPooling2D(int(params['pool_4']), int(params['pool_4'])))
    
    model.add(Dropout(float(params['drop_4'])))
    
    model.add(Flatten())
    model.add(Dense(512, activation='relu'))
    model.add(Dense(dense_size, activation='softmax'))

    return model&lt;/CODE&gt;&lt;/PRE&gt;&lt;PRE&gt;&lt;CODE&gt;def CNN_HOF(params):  # Hyperopt objective function
    mlflow.tensorflow.autolog()
    model = model_builder(params,dense_size)
    model.compile(loss="categorical_crossentropy",
                  optimizer=Adam(),
                  metrics=["accuracy"])

    history = model.fit(train_generator,
                        steps_per_epoch=train_step,
                        epochs=tuner_epochs,
                        validation_data=valid_generator,
                        validation_steps=valid_step,
                        verbose=2)
    # Evaluate the model
    score = model.evaluate(test_generator, steps=1, verbose=0)
    obj_metric = score[0]
    return {"loss": obj_metric, "status": STATUS_OK}


spark_trials = SparkTrials(parallelism=4)
...
with mlflow.start_run(run_name=model_name+"_Tuning"):
    best_hyperparam = fmin(fn=CNN_HOF,
                           space=space,
                           algo=tpe.suggest,
                           max_evals=tuner_max_evals,
                           early_stop_fn=no_progress_loss(10),
                           trials=spark_trials)&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;B&gt;This is the complete error message:&lt;/B&gt;&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;TypeError                                 Traceback (most recent call last)
&amp;lt;command-2155252138731800&amp;gt; in &amp;lt;module&amp;gt;
      1 if tuning:
----&amp;gt; 2     Hyperparameter_tuning(model_name)
&amp;nbsp;
&amp;lt;command-3238776031025884&amp;gt; in Hyperparameter_tuning(model_name)
      2     with mlflow.start_run(run_name=model_name+"_Tuning"):
      3 #         mlflow.tensorflow.autolog()
----&amp;gt; 4         best_hyperparam = fmin(fn=CNN_HOF,
      5                                  space=space,
      6                                  algo=tpe.suggest,
&amp;nbsp;
/databricks/.python_edge_libs/hyperopt/fmin.py in fmin(fn, space, algo, max_evals, timeout, loss_threshold, trials, rstate, allow_trials_fmin, pass_expr_memo_ctrl, catch_eval_exceptions, verbose, return_argmin, points_to_evaluate, max_queue_len, show_progressbar, early_stop_fn, trials_save_file)
    563
    564     if allow_trials_fmin and hasattr(trials, "fmin"):
--&amp;gt; 565         return trials.fmin(
    566             fn,
    567             space,
&amp;nbsp;
/databricks/.python_edge_libs/hyperopt/instrumentation.py in instrumented(func, self, args, kwargs)
     25     )
     26     try:
---&amp;gt; 27         return_val = func(*args, **kwargs)
     28     except Exception as exc:
     29         error_string = "{} with message: {}".format(type(exc).__name__, str(exc))
&amp;nbsp;
/databricks/.python_edge_libs/hyperopt/spark.py in fmin(self, fn, space, algo, max_evals, timeout, loss_threshold, max_queue_len, rstate, verbose, pass_expr_memo_ctrl, catch_eval_exceptions, return_argmin, show_progressbar, early_stop_fn, trials_save_file)
    311         except BaseException as e:
    312             logger.debug("fmin thread exits with an exception raised.")
--&amp;gt; 313             raise e
    314         else:
    315             logger.debug("fmin thread exits normally.")
&amp;nbsp;
/databricks/.python_edge_libs/hyperopt/spark.py in fmin(self, fn, space, algo, max_evals, timeout, loss_threshold, max_queue_len, rstate, verbose, pass_expr_memo_ctrl, catch_eval_exceptions, return_argmin, show_progressbar, early_stop_fn, trials_save_file)
    283             )
    284
--&amp;gt; 285             res = fmin(
    286                 fn,
    287                 space,
&amp;nbsp;
/databricks/.python_edge_libs/hyperopt/fmin.py in fmin(fn, space, algo, max_evals, timeout, loss_threshold, trials, rstate, allow_trials_fmin, pass_expr_memo_ctrl, catch_eval_exceptions, verbose, return_argmin, points_to_evaluate, max_queue_len, show_progressbar, early_stop_fn, trials_save_file)
    592     domain = base.Domain(fn, space, pass_expr_memo_ctrl=pass_expr_memo_ctrl)
    593
--&amp;gt; 594     rval = FMinIter(
    595         algo,
    596         domain,
&amp;nbsp;
/databricks/.python_edge_libs/hyperopt/fmin.py in __init__(self, algo, domain, trials, rstate, asynchronous, max_queue_len, poll_interval_secs, max_evals, timeout, loss_threshold, verbose, show_progressbar, early_stop_fn, trials_save_file)
    180                     )
    181                 else:
--&amp;gt; 182                     raise e
    183             trials.attachments["FMinIter_Domain"] = msg
    184
&amp;nbsp;
/databricks/.python_edge_libs/hyperopt/fmin.py in __init__(self, algo, domain, trials, rstate, asynchronous, max_queue_len, poll_interval_secs, max_evals, timeout, loss_threshold, verbose, show_progressbar, early_stop_fn, trials_save_file)
    163                 logger.warning("over-writing old domain trials attachment")
    164             try:
--&amp;gt; 165                 msg = pickler.dumps(domain)
    166             except TypeError as e:
    167                 if "cannot pickle '_thread.RLock' object" in str(e):
&amp;nbsp;
/databricks/python/lib/python3.8/site-packages/cloudpickle/cloudpickle_fast.py in dumps(obj, protocol, buffer_callback)
     71                 file, protocol=protocol, buffer_callback=buffer_callback
     72             )
---&amp;gt; 73             cp.dump(obj)
     74             return file.getvalue()
     75
&amp;nbsp;
/databricks/python/lib/python3.8/site-packages/cloudpickle/cloudpickle_fast.py in dump(self, obj)
    561     def dump(self, obj):
    562         try:
--&amp;gt; 563             return Pickler.dump(self, obj)
    564         except RuntimeError as e:
    565             if "recursion" in e.args[0]:
&amp;nbsp;
TypeError: cannot pickle '_thread.lock' object&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 26 Aug 2022 21:59:43 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/how-to-set-sparktrials-i-am-receiving-this-typeerror-cannot/m-p/33779#M1785</guid>
      <dc:creator>Somi</dc:creator>
      <dc:date>2022-08-26T21:59:43Z</dc:date>
    </item>
    <item>
      <title>Re: How to set sparkTrials? I am receiving this TypeError: cannot pickle '_thread.lock' object</title>
      <link>https://community.databricks.com/t5/machine-learning/how-to-set-sparktrials-i-am-receiving-this-typeerror-cannot/m-p/33780#M1786</link>
      <description>&lt;P&gt;This is a summary of the cluster:&lt;/P&gt;&lt;P&gt;2-8 Workers: 32-128&amp;nbsp;GB Memory, 8-32&amp;nbsp;Cores&lt;/P&gt;&lt;P&gt;1 Driver: 16&amp;nbsp;GB Memory, 4&amp;nbsp;Cores&lt;/P&gt;&lt;P&gt;Runtime: 10.5.x-gpu-ml-scala2.12&lt;/P&gt;&lt;P&gt;(3) What is the value set for tuner_max_evals? It is currently set to 36, but we receive the error with every value.&lt;/P&gt;</description>
      <pubDate>Tue, 30 Aug 2022 17:23:39 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/how-to-set-sparktrials-i-am-receiving-this-typeerror-cannot/m-p/33780#M1786</guid>
      <dc:creator>Somi</dc:creator>
      <dc:date>2022-08-30T17:23:39Z</dc:date>
    </item>
    <item>
      <title>Re: How to set sparkTrials? I am receiving this TypeError: cannot pickle '_thread.lock' object</title>
      <link>https://community.databricks.com/t5/machine-learning/how-to-set-sparktrials-i-am-receiving-this-typeerror-cannot/m-p/33781#M1787</link>
      <description>&lt;P&gt;Looking through your full traceback, the failure first surfaces in FMinIter, which then attempts a Pickler dump, and that dump is what throws this specific error. I think the real error we are chasing here is the FMinIter one.&lt;/P&gt;&lt;P&gt;First, try testing the search space on something simpler (maybe tuning only the pool size for one layer at first). Additionally, I would use hp.quniform with min &amp;gt;= 1, such as this:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;search_space = {
    "pool_1": hp.quniform("pool_1", 2, 5, 1),
}&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Another thought: you may need to create your train_generator &amp;amp; test_generator inside the objective function, which I'm assuming could look like this:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;train_generator = datagen.flow_from_directory(directory="/image/path",
                                              class_mode="binary", 
                                              classes=["normal", "abnormal"],
                                              batch_size=batch_size,
                                              target_size=(img_height, img_width))&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 30 Aug 2022 17:59:54 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/how-to-set-sparktrials-i-am-receiving-this-typeerror-cannot/m-p/33781#M1787</guid>
      <dc:creator>Dooley</dc:creator>
      <dc:date>2022-08-30T17:59:54Z</dc:date>
    </item>
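A rough sketch of the idea in the reply above, under stated assumptions: `make_generator` is a hypothetical stand-in for a Keras data generator (modelled here as a dict that owns a thread lock), and the standard-library `pickle` stands in for the serialization step that hyperopt performs. It suggests that state built up front cannot be shipped, whereas plain settings can, with the generator then created inside the objective.

```python
import pickle
import threading


def make_generator(batch_size):
    """Hypothetical stand-in for a data generator that owns a thread lock."""
    return {"batch_size": batch_size, "lock": threading.Lock()}


def objective(params, settings):
    # The lock-holding generator is created here, at call time, so it never
    # has to be serialized together with the objective function.
    generator = make_generator(settings["batch_size"])
    loss = params["reg"] * generator["batch_size"]  # placeholder arithmetic
    return {"loss": float(loss), "status": "ok"}


def can_pickle(obj):
    """Return True if pickle accepts the object, False on a TypeError."""
    try:
        pickle.dumps(obj)
        return True
    except TypeError:
        return False


prebuilt_state = {"generator": make_generator(32)}  # built up front
plain_settings = {"batch_size": 32}                 # plain data only

print("prebuilt state picklable:", can_pickle(prebuilt_state))
print("plain settings picklable:", can_pickle(plain_settings))

# Round-trip the settings as a worker would receive them, then evaluate.
received = pickle.loads(pickle.dumps(plain_settings))
result = objective({"reg": 0.5}, received)
print(result)
```

In the thread's setting, the plain settings would correspond to things like the pandas DataFrames, the image directory and the batch sizes, but whether that resolves the error on a given cluster is not something this sketch can confirm.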
    <item>
      <title>Re: How to set sparkTrials? I am receiving this TypeError: cannot pickle '_thread.lock' object</title>
      <link>https://community.databricks.com/t5/machine-learning/how-to-set-sparktrials-i-am-receiving-this-typeerror-cannot/m-p/33782#M1788</link>
      <description>&lt;P&gt;Lastly, the output of the objective function might not be serializable by pickle. Double-check that you are returning a plain double (a Python float) and not something more complex.&lt;/P&gt;</description>
      <pubDate>Tue, 30 Aug 2022 18:02:50 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/how-to-set-sparktrials-i-am-receiving-this-typeerror-cannot/m-p/33782#M1788</guid>
      <dc:creator>Dooley</dc:creator>
      <dc:date>2022-08-30T18:02:50Z</dc:date>
    </item>
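One way to follow the advice above, sketched with made-up numbers: convert the score to a builtin float before returning it and confirm that the result survives a pickle round trip. The helper name `to_hyperopt_result` is hypothetical, and the literal "ok" is used in place of importing `STATUS_OK` from hyperopt so that the snippet needs only the standard library.

```python
import pickle


def to_hyperopt_result(score):
    """Build a result dict from an evaluate()-style [loss, metric] sequence."""
    loss = float(score[0])  # force a builtin float, whatever numeric type came in
    return {"loss": loss, "status": "ok"}  # "ok" is the value of hyperopt.STATUS_OK


score = [0.25, 0.91]  # made-up numbers standing in for model.evaluate output
result = to_hyperopt_result(score)
restored = pickle.loads(pickle.dumps(result))
print(type(result["loss"]).__name__, restored == result)
```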
    <item>
      <title>Re: How to set sparkTrials? I am receiving this TypeError: cannot pickle '_thread.lock' object</title>
      <link>https://community.databricks.com/t5/machine-learning/how-to-set-sparktrials-i-am-receiving-this-typeerror-cannot/m-p/33783#M1789</link>
      <description>&lt;P&gt;I changed the code the way you suggested. It did not work with SparkTrials(). It only worked with Trials(), which I believe means there is no distributed tuning, since we only call single-machine algorithms rather than distributed training algorithms such as MLlib or Horovod.&lt;/P&gt;&lt;P&gt;How can we check whether using Trials means distributed tuning or not?&lt;/P&gt;&lt;P&gt;You can take a look at the notebook at this address:&lt;/P&gt;&lt;P&gt;&lt;A href="https://dbc-1dfc249d-eec7.cloud.databricks.com/?o=3298945606027707#notebook/1496814655941658/command/1496814655941666" target="test_blank"&gt;https://dbc-1dfc249d-eec7.cloud.databricks.com/?o=3298945606027707#notebook/1496814655941658/command/1496814655941666&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 30 Aug 2022 22:27:36 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/how-to-set-sparktrials-i-am-receiving-this-typeerror-cannot/m-p/33783#M1789</guid>
      <dc:creator>Somi</dc:creator>
      <dc:date>2022-08-30T22:27:36Z</dc:date>
    </item>
    <item>
      <title>Re: How to set sparkTrials? I am receiving this TypeError: cannot pickle '_thread.lock' object</title>
      <link>https://community.databricks.com/t5/machine-learning/how-to-set-sparktrials-i-am-receiving-this-typeerror-cannot/m-p/33784#M1790</link>
      <description>&lt;P&gt;Sorry, I do not have access to your workspace.&lt;/P&gt;&lt;P&gt;So the objective function returned just a simple double (the accuracy) and it threw the exact same error? Then my question is: what does the generator look like?&lt;/P&gt;</description>
      <pubDate>Wed, 31 Aug 2022 16:01:05 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/how-to-set-sparktrials-i-am-receiving-this-typeerror-cannot/m-p/33784#M1790</guid>
      <dc:creator>Dooley</dc:creator>
      <dc:date>2022-08-31T16:01:05Z</dc:date>
    </item>
    <item>
      <title>Re: How to set sparkTrials? I am receiving this TypeError: cannot pickle '_thread.lock' object</title>
      <link>https://community.databricks.com/t5/machine-learning/how-to-set-sparktrials-i-am-receiving-this-typeerror-cannot/m-p/33785#M1791</link>
      <description>&lt;P&gt;Using SparkTrials, I am now receiving this error, which is different from the one I was receiving before:&lt;/P&gt;&lt;P&gt;BadObjectiveFunction: When using `fmin` asynchronously, distributed algorithms or distributed objects may not be used within the objective function. This includes algorithms from Apache Spark ML and data objects like Spark DataFrames. In order to use Apache Spark in the objective function, use `Trials` instead of `SparkTrials`. To instead use `fmin` for single-machine ML like scikit-learn, make sure the objective function does not reference a Spark DataFrame or a distributed algorithm. See the following docs for more details on using Spark with Hyperopt: &lt;A href="https://hyperopt.github.io/hyperopt/scaleout/spark" target="test_blank"&gt;https://hyperopt.github.io/hyperopt/scaleout/spark&lt;/A&gt;&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;&amp;nbsp;
TypeError                                 Traceback (most recent call last)
/databricks/.python_edge_libs/hyperopt/fmin.py in __init__(self, algo, domain, trials, rstate, asynchronous, max_queue_len, poll_interval_secs, max_evals, timeout, loss_threshold, verbose, show_progressbar, early_stop_fn, trials_save_file)
    164             try:
--&amp;gt; 165                 msg = pickler.dumps(domain)
    166             except TypeError as e:
&amp;nbsp;
/databricks/python/lib/python3.8/site-packages/cloudpickle/cloudpickle_fast.py in dumps(obj, protocol, buffer_callback)
     72             )
---&amp;gt; 73             cp.dump(obj)
     74             return file.getvalue()
&amp;nbsp;
/databricks/python/lib/python3.8/site-packages/cloudpickle/cloudpickle_fast.py in dump(self, obj)
    562         try:
--&amp;gt; 563             return Pickler.dump(self, obj)
    564         except RuntimeError as e:
&amp;nbsp;
TypeError: cannot pickle '_thread.RLock' object
&amp;nbsp;
During handling of the above exception, another exception occurred:
&amp;nbsp;
BadObjectiveFunction                      Traceback (most recent call last)
&amp;lt;command-1496814655941666&amp;gt; in &amp;lt;module&amp;gt;
----&amp;gt; 1 Hyperparameter_tuning(model_name)
&amp;nbsp;
&amp;lt;command-1496814655941665&amp;gt; in Hyperparameter_tuning(model_name)
      2     with mlflow.start_run(run_name=model_name+"_Tuning"):
      3 #         mlflow.tensorflow.autolog()
----&amp;gt; 4         best_hyperparam = fmin(fn=CNN_HOF, 
      5                                  space=space,
      6                                  algo=tpe.suggest,
&amp;nbsp;
/databricks/.python_edge_libs/hyperopt/fmin.py in fmin(fn, space, algo, max_evals, timeout, loss_threshold, trials, rstate, allow_trials_fmin, pass_expr_memo_ctrl, catch_eval_exceptions, verbose, return_argmin, points_to_evaluate, max_queue_len, show_progressbar, early_stop_fn, trials_save_file)
    563 
    564     if allow_trials_fmin and hasattr(trials, "fmin"):
--&amp;gt; 565         return trials.fmin(
    566             fn,
    567             space,
&amp;nbsp;
/databricks/.python_edge_libs/hyperopt/instrumentation.py in instrumented(func, self, args, kwargs)
     25     )
     26     try:
---&amp;gt; 27         return_val = func(*args, **kwargs)
     28     except Exception as exc:
     29         error_string = "{} with message: {}".format(type(exc).__name__, str(exc))
&amp;nbsp;
/databricks/.python_edge_libs/hyperopt/spark.py in fmin(self, fn, space, algo, max_evals, timeout, loss_threshold, max_queue_len, rstate, verbose, pass_expr_memo_ctrl, catch_eval_exceptions, return_argmin, show_progressbar, early_stop_fn, trials_save_file)
    311         except BaseException as e:
    312             logger.debug("fmin thread exits with an exception raised.")
--&amp;gt; 313             raise e
    314         else:
    315             logger.debug("fmin thread exits normally.")
&amp;nbsp;
/databricks/.python_edge_libs/hyperopt/spark.py in fmin(self, fn, space, algo, max_evals, timeout, loss_threshold, max_queue_len, rstate, verbose, pass_expr_memo_ctrl, catch_eval_exceptions, return_argmin, show_progressbar, early_stop_fn, trials_save_file)
    283             )
    284 
--&amp;gt; 285             res = fmin(
    286                 fn,
    287                 space,
&amp;nbsp;
/databricks/.python_edge_libs/hyperopt/fmin.py in fmin(fn, space, algo, max_evals, timeout, loss_threshold, trials, rstate, allow_trials_fmin, pass_expr_memo_ctrl, catch_eval_exceptions, verbose, return_argmin, points_to_evaluate, max_queue_len, show_progressbar, early_stop_fn, trials_save_file)
    592     domain = base.Domain(fn, space, pass_expr_memo_ctrl=pass_expr_memo_ctrl)
    593 
--&amp;gt; 594     rval = FMinIter(
    595         algo,
    596         domain,
&amp;nbsp;
/databricks/.python_edge_libs/hyperopt/fmin.py in __init__(self, algo, domain, trials, rstate, asynchronous, max_queue_len, poll_interval_secs, max_evals, timeout, loss_threshold, verbose, show_progressbar, early_stop_fn, trials_save_file)
    166             except TypeError as e:
    167                 if "cannot pickle '_thread.RLock' object" in str(e):
--&amp;gt; 168                     raise BadObjectiveFunction(
    169                         "When using `fmin` asynchronously, distributed algorithms or "
    170                         "distributed objects may not be used within the objective function. "&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;When I switch to `Trials`, it works, but I doubt that it is distributed.&lt;/P&gt;&lt;P&gt;The image generator looks like this:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;def img_generator(train_df,valid_df,test_df):
    train_df_count = train_df.count()
    result= json.loads(dbutils.notebook.run("Batch_step_size", 3600,{"dataframe_count":train_df_count }))
    train_batch=result['batch']
    train_step=result['step']
        
    img_prep_function = None

    if image_augmentation:
        train_data_gen = ImageDataGenerator(rescale=1.0/255,
                                            rotation_range=40,
                                            width_shift_range=0.2,
                                            height_shift_range=0.2,
                                            shear_range=2.0,
                                            zoom_range=0.2,
                                            horizontal_flip=True,
                                            fill_mode='nearest',
                                            preprocessing_function=img_prep_function)
    else:
        train_data_gen = ImageDataGenerator(rescale=1.0/255, preprocessing_function=img_prep_function)

    train_generator = train_data_gen.flow_from_dataframe(dataframe=train_df.toPandas(),
                                                         directory=images_dir,
                                                         x_col='filename',
                                                         y_col=target,
                                                         target_size=(150, 150),
                                                         class_mode='categorical',
                                                         batch_size=train_batch)
    valid_df_count = valid_df.count()
    result= json.loads(dbutils.notebook.run("Batch_step_size", 3600,{"dataframe_count":valid_df_count }))
    valid_batch=result['batch']
    valid_step = result['step']

    valid_data_gen = ImageDataGenerator(rescale=1.0/255, preprocessing_function=img_prep_function)
    valid_generator = valid_data_gen.flow_from_dataframe(dataframe=valid_df.toPandas(),
                                                         directory=images_dir,
                                                         x_col='filename',
                                                         y_col=target,
                                                         target_size=(150, 150),
                                                         class_mode='categorical',
                                                         batch_size=valid_batch,
                                                         shuffle=False,
                                                         seed=42)

    test_df_count = test_df.count()
    result= json.loads(dbutils.notebook.run("Batch_step_size", 3600,{"dataframe_count":test_df_count }))
    test_batch=result['batch']
    test_step = result['step']

    test_data_gen = ImageDataGenerator(rescale=1.0/255, preprocessing_function=img_prep_function)
    test_generator = test_data_gen.flow_from_dataframe(dataframe=test_df.toPandas(),
                                                           directory=images_dir,
                                                           x_col='filename',
                                                           y_col=target,
                                                           target_size=(150, 150),
                                                           class_mode='categorical',
                                                           batch_size=test_batch,
                                                           shuffle=False,
                                                           seed=42)
    return train_generator,train_step,train_batch,valid_generator,valid_step,test_generator,test_step&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 31 Aug 2022 17:21:49 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/how-to-set-sparktrials-i-am-receiving-this-typeerror-cannot/m-p/33785#M1791</guid>
      <dc:creator>Somi</dc:creator>
      <dc:date>2022-08-31T17:21:49Z</dc:date>
    </item>
    <item>
      <title>Re: How to set sparkTrials? I am receiving this TypeError: cannot pickle '_thread.lock' object</title>
      <link>https://community.databricks.com/t5/machine-learning/how-to-set-sparktrials-i-am-receiving-this-typeerror-cannot/m-p/33786#M1792</link>
      <description>&lt;P&gt;Try the following:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;def CNN_HOF(train_df_pd, valid_df_pd, test_df_pd, params):  # Hyperopt objective function
    train_generator = train_data_gen.flow_from_dataframe(dataframe=train_df_pd,
                                                         directory=images_dir,
                                                         x_col='filename',
                                                         y_col=target,
                                                         target_size=(150, 150),
                                                         class_mode='categorical',
                                                         batch_size=train_batch)
    valid_generator = valid_data_gen.flow_from_dataframe(dataframe=valid_df_pd,
                                                         directory=images_dir,
                                                         x_col='filename',
                                                         y_col=target,
                                                         target_size=(150, 150),
                                                         class_mode='categorical',
                                                         batch_size=valid_batch,
                                                         shuffle=False,
                                                         seed=42)
    test_generator = test_data_gen.flow_from_dataframe(dataframe=test_df_pd,
                                                       directory=images_dir,
                                                       x_col='filename',
                                                       y_col=target,
                                                       target_size=(150, 150),
                                                       class_mode='categorical',
                                                       batch_size=test_batch,
                                                       shuffle=False,
                                                       seed=42)
    mlflow.tensorflow.autolog()
    model = model_builder(params,dense_size)
    model.compile(loss="categorical_crossentropy",
                optimizer=Adam(),
                metrics=["accuracy"])
 
    history = model.fit(train_generator,
                        steps_per_epoch=train_step,
                        epochs=tuner_epochs,
                        validation_data=valid_generator,
                        validation_steps=valid_step,
                        verbose=2)
    # Evaluate the model
    score = model.evaluate(test_generator, steps=1, verbose=0)
    obj_metric = score[0]
    return float(obj_metric)&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;I'm assuming that dense_size, train_batch, valid_batch, test_batch, images_dir, and target are global variables. Note that the conversions train_df_pd = train_df.toPandas(), valid_df_pd = valid_df.toPandas(), and test_df_pd = test_df.toPandas() happen outside the objective function, and the resulting pandas DataFrames are passed in as arguments. The generators are then created inside the objective function. If this works, the generator was the issue, and we can look at ways to speed up this process.&lt;/P&gt;&lt;P&gt;I also want to note that I am returning the score as a float. I wondered whether score[0] might be a dictionary rather than a float, so I cast it; you can print its type to check.&lt;/P&gt;</description>
      <pubDate>Thu, 01 Sep 2022 16:53:43 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/how-to-set-sparktrials-i-am-receiving-this-typeerror-cannot/m-p/33786#M1792</guid>
      <dc:creator>Dooley</dc:creator>
      <dc:date>2022-09-01T16:53:43Z</dc:date>
    </item>
    <item>
      <title>Re: How to set sparkTrials? I am receiving this TypeError: cannot pickle '_thread.lock' object</title>
      <link>https://community.databricks.com/t5/machine-learning/how-to-set-sparktrials-i-am-receiving-this-typeerror-cannot/m-p/33787#M1793</link>
      <description>&lt;P&gt;Yes, all of those are global variables.&lt;/P&gt;&lt;P&gt;This code did not work for me, but it gave me the clue. I converted the Spark DataFrames to pandas outside of the image generators and then moved the image generators inside the objective function. This way there was no need to pass the DataFrames as arguments to the objective function. It is now working with SparkTrials parallelism &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;&lt;P&gt;As for the score, it is a list of scalars, and score[0] is our test_loss, which is a float. What did you receive when you cast it?&lt;/P&gt;</description>
      <pubDate>Fri, 02 Sep 2022 20:11:43 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/how-to-set-sparktrials-i-am-receiving-this-typeerror-cannot/m-p/33787#M1793</guid>
      <dc:creator>Somi</dc:creator>
      <dc:date>2022-09-02T20:11:43Z</dc:date>
    </item>
  </channel>
</rss>

