<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: I have a job with multiple tasks running asynchronously and I don't think it's leveraging all the nodes on the cluster based on runtime. in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/i-have-a-job-with-multiple-tasks-running-asynchronously-and-i/m-p/2694#M22</link>
    <description>&lt;P&gt;Hi Debayan, I did notice that the number of tasks completed per worker node differed when I looked at the Spark UI -&amp;gt; Executors page. So it does appear the whole cluster was used, but what I couldn't tell is whether the driver node was sending the tasks out to the workers in parallel or assigning them sequentially. My workflow looks like this:&lt;span class="lia-inline-image-display-wrapper" image-alt="Screenshot 2023-06-23 072337"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/58i01BFF5582FA59E62/image-size/large?v=v2&amp;amp;px=999" role="button" title="Screenshot 2023-06-23 072337" alt="Screenshot 2023-06-23 072337" /&gt;&lt;/span&gt;Previously I ran a single notebook that executed the model notebooks in a loop like this:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;for model_number in MODEL_NUMBERS:

    global_parameters['MODEL_NUMBER'] = model_number
    print(f"Building {model_number} train/test data...")
    train_test_data = dbutils.notebook.run(
        'build_train_test',
        60 * 60,  # timeout in seconds
        global_parameters)

    train_test_data = json.loads(train_test_data)
    if file_exists(train_test_data['TRAIN_DATA']) and file_exists(train_test_data['TEST_DATA']):
        print(f"{model_number} train/test data complete.")

        print(f"Training model {model_number}...")
        trained_model = dbutils.notebook.run(
            'training',
            0,  # 0 = no timeout
            global_parameters
        )

        if file_exists(trained_model):
            evaluation_metrics = dbutils.notebook.run(
                'evaluation',
                60 * 60,
                global_parameters
            )

            for metric in evaluation_metrics.split(','):
                if not file_exists(metric):
                    raise FileNotFoundError("Evaluation metric not found: " + metric)

        else:
            raise FileNotFoundError("Trained model not found: " + trained_model)

    else:
        raise FileNotFoundError("Training and test data not found: " + str(train_test_data))

    print(f"Building final_model {model_number}...")
    if ENV in ['test', 'prod']:
        dbutils.notebook.run(
            'final_models',
            0,
            global_parameters
        )&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;The sequential loop job runs in 28 min, while the async/parallel job runs in 50 min.&lt;/P&gt;</description>
    <pubDate>Fri, 23 Jun 2023 11:26:29 GMT</pubDate>
    <dc:creator>dave_hiltbrand</dc:creator>
    <dc:date>2023-06-23T11:26:29Z</dc:date>
    <item>
      <title>I have a job with multiple tasks running asynchronously and I don't think it's leveraging all the nodes on the cluster based on runtime.</title>
      <link>https://community.databricks.com/t5/data-engineering/i-have-a-job-with-multiple-tasks-running-asynchronously-and-i/m-p/2691#M19</link>
      <description>&lt;P&gt;I have a job with multiple tasks running asynchronously, and based on the runtime I don't think it's leveraging all the nodes on the cluster. I opened the Spark UI for the cluster, checked the Executors page, and don't see any tasks for my worker nodes. How can I monitor the cluster to ensure my tasks are running in parallel and taking advantage of my multi-node cluster?&lt;/P&gt;</description>
      <pubDate>Fri, 23 Jun 2023 02:47:26 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/i-have-a-job-with-multiple-tasks-running-asynchronously-and-i/m-p/2691#M19</guid>
      <dc:creator>dave_hiltbrand</dc:creator>
      <dc:date>2023-06-23T02:47:26Z</dc:date>
    </item>
    <item>
      <title>Re: I have a job with multiple tasks running asynchronously and I don't think it's leveraging all the nodes on the cluster based on runtime.</title>
      <link>https://community.databricks.com/t5/data-engineering/i-have-a-job-with-multiple-tasks-running-asynchronously-and-i/m-p/2693#M21</link>
      <description>&lt;P&gt;Hi @Dave Hiltbrand,&lt;/P&gt;&lt;P&gt;Great to meet you, and thanks for your question!&lt;/P&gt;&lt;P&gt;Let's see if your peers in the community have an answer to your question. Thanks.&lt;/P&gt;</description>
      <pubDate>Fri, 23 Jun 2023 07:18:56 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/i-have-a-job-with-multiple-tasks-running-asynchronously-and-i/m-p/2693#M21</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2023-06-23T07:18:56Z</dc:date>
    </item>
    <item>
      <title>Re: I have a job with multiple tasks running asynchronously and I don't think it's leveraging all the nodes on the cluster based on runtime.</title>
      <link>https://community.databricks.com/t5/data-engineering/i-have-a-job-with-multiple-tasks-running-asynchronously-and-i/m-p/2694#M22</link>
      <description>&lt;P&gt;Hi Debayan, I did notice that the number of tasks completed per worker node differed when I looked at the Spark UI -&amp;gt; Executors page. So it does appear the whole cluster was used, but what I couldn't tell is whether the driver node was sending the tasks out to the workers in parallel or assigning them sequentially. My workflow looks like this:&lt;span class="lia-inline-image-display-wrapper" image-alt="Screenshot 2023-06-23 072337"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/58i01BFF5582FA59E62/image-size/large?v=v2&amp;amp;px=999" role="button" title="Screenshot 2023-06-23 072337" alt="Screenshot 2023-06-23 072337" /&gt;&lt;/span&gt;Previously I ran a single notebook that executed the model notebooks in a loop like this:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;for model_number in MODEL_NUMBERS:

    global_parameters['MODEL_NUMBER'] = model_number
    print(f"Building {model_number} train/test data...")
    train_test_data = dbutils.notebook.run(
        'build_train_test',
        60 * 60,  # timeout in seconds
        global_parameters)

    train_test_data = json.loads(train_test_data)
    if file_exists(train_test_data['TRAIN_DATA']) and file_exists(train_test_data['TEST_DATA']):
        print(f"{model_number} train/test data complete.")

        print(f"Training model {model_number}...")
        trained_model = dbutils.notebook.run(
            'training',
            0,  # 0 = no timeout
            global_parameters
        )

        if file_exists(trained_model):
            evaluation_metrics = dbutils.notebook.run(
                'evaluation',
                60 * 60,
                global_parameters
            )

            for metric in evaluation_metrics.split(','):
                if not file_exists(metric):
                    raise FileNotFoundError("Evaluation metric not found: " + metric)

        else:
            raise FileNotFoundError("Trained model not found: " + trained_model)

    else:
        raise FileNotFoundError("Training and test data not found: " + str(train_test_data))

    print(f"Building final_model {model_number}...")
    if ENV in ['test', 'prod']:
        dbutils.notebook.run(
            'final_models',
            0,
            global_parameters
        )&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;The sequential loop job runs in 28 min, while the async/parallel job runs in 50 min.&lt;/P&gt;</description>
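      The reply above contrasts a sequential driver loop with a multi-task job. A common alternative is to keep a single notebook but fan the per-model chains out from the driver with a thread pool, since dbutils.notebook.run blocks the calling thread. The sketch below is a minimal, hypothetical illustration of that pattern: run_model_pipeline is a stand-in for the build/train/evaluate chain (in a real workspace it would wrap the dbutils.notebook.run calls shown above), and MODEL_NUMBERS / global_parameters are placeholder values.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_model_pipeline(model_number, params):
    # Hypothetical stand-in for the per-model notebook chain; in a real
    # Databricks job this would call dbutils.notebook.run('build_train_test',
    # 60 * 60, params), then the training and evaluation notebooks.
    # Stubbed out here so the orchestration pattern itself is runnable.
    return f"model {model_number} done"

MODEL_NUMBERS = [1, 2, 3, 4]   # placeholder model ids
global_parameters = {}         # placeholder shared job parameters

results = {}
# Each thread drives one model's pipeline end to end; the driver stays busy
# orchestrating while Spark schedules the actual cluster work.
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = {
        pool.submit(run_model_pipeline, m, dict(global_parameters, MODEL_NUMBER=m)): m
        for m in MODEL_NUMBERS
    }
    for future in as_completed(futures):
        results[futures[future]] = future.result()

print(results)
```

      Whether this beats the multi-task job depends on the cluster: if the parallel chains contend for the same executors, wall-clock time can get worse, which may be one explanation for the 50-minute parallel run versus the 28-minute sequential one.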
      <pubDate>Fri, 23 Jun 2023 11:26:29 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/i-have-a-job-with-multiple-tasks-running-asynchronously-and-i/m-p/2694#M22</guid>
      <dc:creator>dave_hiltbrand</dc:creator>
      <dc:date>2023-06-23T11:26:29Z</dc:date>
    </item>
    <item>
      <title>Re: I have a job with multiple tasks running asynchronously and I don't think it's leveraging all the nodes on the cluster based on runtime.</title>
      <link>https://community.databricks.com/t5/data-engineering/i-have-a-job-with-multiple-tasks-running-asynchronously-and-i/m-p/2692#M20</link>
      <description>&lt;P&gt;Hi, could you please try to view metrics at the node level and see if that's what you are expecting?&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.databricks.com/compute/cluster-metrics.html#view-metrics-at-the-node-level" alt="https://docs.databricks.com/compute/cluster-metrics.html#view-metrics-at-the-node-level" target="_blank"&gt;https://docs.databricks.com/compute/cluster-metrics.html#view-metrics-at-the-node-level&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Please tag @Debayan Mukherjee with your next update so that I will get notified.&lt;/P&gt;</description>
      <pubDate>Fri, 23 Jun 2023 07:18:12 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/i-have-a-job-with-multiple-tasks-running-asynchronously-and-i/m-p/2692#M20</guid>
      <dc:creator>Debayan</dc:creator>
      <dc:date>2023-06-23T07:18:12Z</dc:date>
    </item>
  </channel>
</rss>

