<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Sharing Output between different tasks for MLOps pipeline as a Databricks Jobs in Machine Learning</title>
    <link>https://community.databricks.com/t5/machine-learning/sharing-output-between-different-tasks-for-mlops-pipeline-as-a/m-p/71134#M3305</link>
    <description>&lt;P&gt;Hi,&lt;BR /&gt;There is a way to share a value from one task to another, but this only works when the pipeline is executed from a workflow.&lt;/P&gt;&lt;LI-CODE lang="python"&gt;# In the task that produces the value:
dbutils.jobs.taskValues.set(key='first_notebook_list', value=&amp;lt;value or variable you want to pass&amp;gt;)

# In the downstream task that reads the value set by the previous task:
list_object = dbutils.jobs.taskValues.get(taskKey="&amp;lt;task_name_from_which_value_is_fetched&amp;gt;", key="first_notebook_list", default=0, debugValue=0)&lt;/LI-CODE&gt;</description>
    <pubDate>Fri, 31 May 2024 06:38:14 GMT</pubDate>
    <dc:creator>Hkesharwani</dc:creator>
    <dc:date>2024-05-31T06:38:14Z</dc:date>
    <item>
      <title>Sharing Output between different tasks for MLOps pipeline as a Databricks Jobs</title>
      <link>https://community.databricks.com/t5/machine-learning/sharing-output-between-different-tasks-for-mlops-pipeline-as-a/m-p/71104#M3302</link>
      <description>&lt;P&gt;Hello Everyone&lt;/P&gt;&lt;P&gt;We are trying to create an ML pipeline on Databricks using Databricks Workflows. Our pipeline currently has three major components: Data Ingestion, Model Training and Model Testing. My question is whether it is possible to share the output of one task with another (i.e. to pass the data generated by the ingestion task to the model training task). Currently we save the data in DBFS volumes and read it back from there, but I believe this approach would fail if the dataset is too big. Is there a more elegant way to pass output from one task to another, perhaps similar to what we can do when creating an Azure ML pipeline?&lt;/P&gt;&lt;P&gt;#MachineLearning #DataScience #MLOps&lt;/P&gt;</description>
      <pubDate>Thu, 30 May 2024 14:10:59 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/sharing-output-between-different-tasks-for-mlops-pipeline-as-a/m-p/71104#M3302</guid>
      <dc:creator>rahuja</dc:creator>
      <dc:date>2024-05-30T14:10:59Z</dc:date>
    </item>
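The storage-based handoff the question describes (ingestion writes, training reads) generalizes: write the dataset to an agreed location or table in one task and pass only its name or path downstream. On Databricks this would be `df.write.saveAsTable(...)` in the ingest task and `spark.read.table(...)` in the training task; the file-based sketch below is a local stand-in for that pattern, and its paths and function names are illustrative, not Databricks APIs.

```python
import json
import tempfile
from pathlib import Path

def ingest_task(shared_dir: Path) -> str:
    """Simulates the ingestion task: write the dataset to shared storage
    and return only a small handle (the path) for downstream tasks.
    On Databricks this would be df.write.saveAsTable(...)."""
    records = [{"id": 1, "feature": 0.5}, {"id": 2, "feature": 0.9}]
    out = shared_dir / "ingested.json"
    out.write_text(json.dumps(records))
    return str(out)  # the handle passed between tasks, not the data itself

def train_task(data_path: str) -> int:
    """Simulates the training task: read the dataset back via the handle.
    On Databricks this would be spark.read.table(...)."""
    records = json.loads(Path(data_path).read_text())
    return len(records)

with tempfile.TemporaryDirectory() as d:
    handle = ingest_task(Path(d))
    n_rows = train_task(handle)  # → 2
```

The point of the pattern is that only the handle crosses the task boundary, so dataset size never becomes a constraint on the task-to-task channel itself.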
    <item>
      <title>Re: Sharing Output between different tasks for MLOps pipeline as a Databricks Jobs</title>
      <link>https://community.databricks.com/t5/machine-learning/sharing-output-between-different-tasks-for-mlops-pipeline-as-a/m-p/71134#M3305</link>
      <description>&lt;P&gt;Hi,&lt;BR /&gt;There is a way to share a value from one task to another, but this only works when the pipeline is executed from a workflow.&lt;/P&gt;&lt;LI-CODE lang="python"&gt;# In the task that produces the value:
dbutils.jobs.taskValues.set(key='first_notebook_list', value=&amp;lt;value or variable you want to pass&amp;gt;)

# In the downstream task that reads the value set by the previous task:
list_object = dbutils.jobs.taskValues.get(taskKey="&amp;lt;task_name_from_which_value_is_fetched&amp;gt;", key="first_notebook_list", default=0, debugValue=0)&lt;/LI-CODE&gt;</description>
      <pubDate>Fri, 31 May 2024 06:38:14 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/sharing-output-between-different-tasks-for-mlops-pipeline-as-a/m-p/71134#M3305</guid>
      <dc:creator>Hkesharwani</dc:creator>
      <dc:date>2024-05-31T06:38:14Z</dc:date>
    </item>
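The `taskValues` calls in the reply only behave this way inside a job run; in an interactive notebook, `get()` returns `debugValue` instead. A minimal local stand-in illustrating the set/get semantics (the class name `JobTaskValues` is hypothetical, not a Databricks API):

```python
class JobTaskValues:
    """Hypothetical local stand-in for dbutils.jobs.taskValues,
    illustrating its semantics: values are namespaced by the task
    that set them, and missing keys fall back to a default."""
    def __init__(self):
        self._store = {}  # (task_key, key) -> value

    def set(self, task_key, key, value):
        self._store[(task_key, key)] = value

    def get(self, task_key, key, default=None):
        return self._store.get((task_key, key), default)

tv = JobTaskValues()
# The first task publishes a small, JSON-serializable value...
tv.set("ingest", "first_notebook_list", [1, 2, 3])
# ...and a downstream task fetches it by the upstream task's name.
fetched = tv.get("ingest", "first_notebook_list", default=0)  # → [1, 2, 3]
missing = tv.get("ingest", "no_such_key", default=0)          # → 0
```

Note that in the real API `set()` takes only `key` and `value` (the task name is implicit in the running task); `taskKey` appears only on `get()`.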
    <item>
      <title>Re: Sharing Output between different tasks for MLOps pipeline as a Databricks Jobs</title>
      <link>https://community.databricks.com/t5/machine-learning/sharing-output-between-different-tasks-for-mlops-pipeline-as-a/m-p/71191#M3313</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9"&gt;@Retired_mod&lt;/a&gt;, thanks for your quick reply. I will test it in our scenario and let you know. Just to confirm: if I have two scripts (e.g. ingest.py and train.py) and inside ingest.py, in my task named "ingest", I run:&lt;BR /&gt;dbutils.jobs.taskValues.set(taskKey = "ingest", key = "processed_data", value=data)&lt;/P&gt;&lt;P&gt;then should I pass {{tasks.ingest.values.processed_data}} to train.py inside the pipeline?&lt;/P&gt;</description>
      <pubDate>Fri, 31 May 2024 12:50:43 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/sharing-output-between-different-tasks-for-mlops-pipeline-as-a/m-p/71191#M3313</guid>
      <dc:creator>rahuja</dc:creator>
      <dc:date>2024-05-31T12:50:43Z</dc:date>
    </item>
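Yes, that is the template form: in the job definition, a downstream task can reference an upstream task value as `{{tasks.<task_key>.values.<key>}}`. A sketch of what that might look like as a Jobs API-style task definition follows; the field names are from my recollection of the Jobs 2.1 JSON format, so verify them against the current API reference:

```json
{
  "task_key": "train",
  "depends_on": [{"task_key": "ingest"}],
  "spark_python_task": {
    "python_file": "train.py",
    "parameters": ["{{tasks.ingest.values.processed_data}}"]
  }
}
```

The template is resolved at run time, after the `ingest` task has completed, so `train.py` receives the already-substituted value as an ordinary argument.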
    <item>
      <title>Re: Sharing Output between different tasks for MLOps pipeline as a Databricks Jobs</title>
      <link>https://community.databricks.com/t5/machine-learning/sharing-output-between-different-tasks-for-mlops-pipeline-as-a/m-p/71278#M3316</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9"&gt;@Retired_mod&lt;/a&gt;&amp;nbsp;I looked into your solution and it seems that the value you set or get needs to be JSON-serialisable, which means I cannot pass e.g. a Spark or pandas DataFrame from one step to another directly; I would have to serialise and de-serialise it. Is there any way of passing big data between the various steps of a job?&lt;/P&gt;</description>
      <pubDate>Fri, 31 May 2024 18:44:42 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/sharing-output-between-different-tasks-for-mlops-pipeline-as-a/m-p/71278#M3316</guid>
      <dc:creator>rahuja</dc:creator>
      <dc:date>2024-05-31T18:44:42Z</dc:date>
    </item>
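As the post observes, task values must be JSON-serializable, so a DataFrame cannot cross the boundary directly. Serializing is also the wrong tool for anything large: task values are intended for small control values (I believe Databricks documents a size cap per value on the order of tens of KB; check the docs). For big data, write a Delta table in the upstream task and pass only the table name. For genuinely small results, the round-trip looks like this sketch, where the table name in `handle` is illustrative:

```python
import json

# A small, JSON-serializable result (a list of records), NOT a full
# DataFrame. A pandas DataFrame could be converted with
# df.to_dict("records") before setting, and rebuilt downstream.
records = [{"id": 1, "score": 0.91}, {"id": 2, "score": 0.47}]

payload = json.dumps(records)   # what taskValues effectively stores
restored = json.loads(payload)  # what the downstream task gets back

# For anything large, pass a handle instead: the name of a Delta
# table (or a volume path) written by the upstream task.
handle = {"table": "ml.ingest.processed_data"}  # illustrative name
```

This split (small values through `taskValues`, bulk data through storage with only a handle in the task value) is the usual answer to the "big data between steps" question.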
    <item>
      <title>Re: Sharing Output between different tasks for MLOps pipeline as a Databricks Jobs</title>
      <link>https://community.databricks.com/t5/machine-learning/sharing-output-between-different-tasks-for-mlops-pipeline-as-a/m-p/73784#M3360</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9"&gt;@Retired_mod&lt;/a&gt;&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/99364"&gt;@Hkesharwani&lt;/a&gt;&amp;nbsp; any updates?&lt;/P&gt;</description>
      <pubDate>Thu, 13 Jun 2024 11:41:13 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/sharing-output-between-different-tasks-for-mlops-pipeline-as-a/m-p/73784#M3360</guid>
      <dc:creator>rahuja</dc:creator>
      <dc:date>2024-06-13T11:41:13Z</dc:date>
    </item>
  </channel>
</rss>

