<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic mlflow project train and validate - Control over the data used in the script? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/mlflow-project-train-and-validate-control-over-the-data-used-in/m-p/17489#M11510</link>
    <description>&lt;P&gt;Hi there,&lt;/P&gt;
&lt;P&gt;I'm deciding whether to get started with ML, and I've really enjoyed it so far.&lt;/P&gt;
&lt;P&gt;While going through the documentation I hit a blocker: the documentation says very little about the dataset used to train the model.&lt;/P&gt;
&lt;P&gt;Model = Data + (Algorithm &amp;amp; hyperparameters)&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="0693f000007OoS1AAK"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/2462iD0A1630608B47D3C/image-size/large?v=v2&amp;amp;px=999" role="button" title="0693f000007OoS1AAK" alt="0693f000007OoS1AAK" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;I can't find an example in the documentation where an MLflow Project is run on different data (CSV, SQL, code-based, etc.).&lt;/P&gt;
&lt;P&gt;The command shown in the screenshot, &lt;CODE&gt;mlflow run sklearn_elasticnet_wine -P alpha=0.5&lt;/CODE&gt;, would retrain the model with different hyperparameters, but on what data?&lt;/P&gt;
&lt;P&gt;Is the data already bundled with the project, and can I change it so the model trains on different data?&lt;/P&gt;
&lt;P&gt;How do you store and track the datasets being used?&lt;/P&gt;
&lt;P&gt;Can someone please explain?&lt;/P&gt;
&lt;P&gt;Thanks,&lt;/P&gt;</description>
    <pubDate>Wed, 21 Jul 2021 12:41:23 GMT</pubDate>
    <dc:creator>VirajV</dc:creator>
    <dc:date>2021-07-21T12:41:23Z</dc:date>
    <item>
      <title>mlflow project train and validate - Control over the data used in the script?</title>
      <link>https://community.databricks.com/t5/data-engineering/mlflow-project-train-and-validate-control-over-the-data-used-in/m-p/17489#M11510</link>
      <description>&lt;P&gt;Hi there,&lt;/P&gt;
&lt;P&gt;I'm deciding whether to get started with ML, and I've really enjoyed it so far.&lt;/P&gt;
&lt;P&gt;While going through the documentation I hit a blocker: the documentation says very little about the dataset used to train the model.&lt;/P&gt;
&lt;P&gt;Model = Data + (Algorithm &amp;amp; hyperparameters)&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="0693f000007OoS1AAK"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/2462iD0A1630608B47D3C/image-size/large?v=v2&amp;amp;px=999" role="button" title="0693f000007OoS1AAK" alt="0693f000007OoS1AAK" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;I can't find an example in the documentation where an MLflow Project is run on different data (CSV, SQL, code-based, etc.).&lt;/P&gt;
&lt;P&gt;The command shown in the screenshot, &lt;CODE&gt;mlflow run sklearn_elasticnet_wine -P alpha=0.5&lt;/CODE&gt;, would retrain the model with different hyperparameters, but on what data?&lt;/P&gt;
&lt;P&gt;Is the data already bundled with the project, and can I change it so the model trains on different data?&lt;/P&gt;
&lt;P&gt;How do you store and track the datasets being used?&lt;/P&gt;
&lt;P&gt;Can someone please explain?&lt;/P&gt;
&lt;P&gt;Thanks,&lt;/P&gt;</description>
      <pubDate>Wed, 21 Jul 2021 12:41:23 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/mlflow-project-train-and-validate-control-over-the-data-used-in/m-p/17489#M11510</guid>
      <dc:creator>VirajV</dc:creator>
      <dc:date>2021-07-21T12:41:23Z</dc:date>
    </item>
  </channel>
</rss>

