<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Infer_signature for a dictionary datasets during mlflow registration in Machine Learning</title>
    <link>https://community.databricks.com/t5/machine-learning/infer-signature-for-a-dictionary-datasets-during-mlflow/m-p/129174#M4244</link>
    <description></description>
    <pubDate>Thu, 21 Aug 2025 19:38:26 GMT</pubDate>
    <dc:creator>Kumaran</dc:creator>
    <dc:date>2025-08-21T19:38:26Z</dc:date>
    <item>
      <title>Infer_signature for a dictionary datasets during mlflow registration</title>
      <link>https://community.databricks.com/t5/machine-learning/infer-signature-for-a-dictionary-datasets-during-mlflow/m-p/119373#M4080</link>
      <description>&lt;P&gt;Hello community,&lt;/P&gt;
&lt;P&gt;Can you please guide me here? I am trying to build a custom ensemble model where I pass a dictionary of datasets to fit() and predict(), with the keys being the model names and the values being the respective datasets for each model. The idea is to register a single ensemble model rather than 5 different models.&lt;/P&gt;
&lt;P&gt;I will instantiate the models based on a config and pass each one its respective dataset. I am currently stuck at the infer_signature() step because I am unable to build the structure that it expects.&lt;/P&gt;
&lt;P&gt;Below is the code for my fit and predict. Can you please help me construct the model_input for infer_signature? I know I could do this if I registered the models separately, but I want to register only 1 model.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;class CustomEnsembleModel(mlflow.pyfunc.PythonModel):

    def __init__(self, model_config, dbx_params):
        if len(model_config) &amp;lt; 1:
            raise ValueError("The model_config must contain at least one model configuration.")

        self.model_config = model_config
        self.models = {}  # Dictionary to store model instances
        self.dbx_params = dbx_params

        for model in self.model_config:
            model_name = model['model_name']
            hyper_params = model['hyper_params']
            self.models[model_name] = self._get_model_instance(model_name, hyper_params)

    def _get_model_instance(self, model_name, hyper_params):
        if model_name == 'local_outlier_factor':
            return LocalOutlierFactorModel(hyper_params)
        elif model_name == 'isolation_forest':
            return IsolationForestModel(hyper_params)
        else:
            raise ValueError(f"Unsupported model name: {model_name}")

    def fit(self, input_data_dict):
        model_outputs = {}
        for model_name, model_instance in self.models.items():
            input_df = input_data_dict[model_name]
            model_outputs[model_name] = model_instance.fit(input_df)
        return model_outputs

    def predict(self, input_data_dict):
        predictions = {}
        for model_name, model_instance in self.models.items():
            logger.info(f"Loading input datasets for model: {model_name}")
            input_df = input_data_dict[model_name]
            logger.info(f"Loaded input datasets for model: {model_name} with shape {input_df.count()}")
            predictions[model_name] = model_instance.predict(input_df)
        return predictions&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Thu, 15 May 2025 17:38:49 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/infer-signature-for-a-dictionary-datasets-during-mlflow/m-p/119373#M4080</guid>
      <dc:creator>skosaraju</dc:creator>
      <dc:date>2025-05-15T17:38:49Z</dc:date>
    </item>
    <item>
      <title>Re: Infer_signature for a dictionary datasets during mlflow registration</title>
      <link>https://community.databricks.com/t5/machine-learning/infer-signature-for-a-dictionary-datasets-during-mlflow/m-p/129174#M4244</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/164083"&gt;@skosaraju&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;Thank you for contacting Databricks support!&lt;/P&gt;
&lt;P class="qt3gz91 paragraph"&gt;Because &lt;CODE class="qt3gz9f"&gt;infer_signature()&lt;/CODE&gt; cannot handle a dict of DataFrames directly, you will need to convert this structure into a dictionary of &lt;EM&gt;row dicts&lt;/EM&gt;, or manually build a ModelSignature.&lt;/P&gt;
&lt;H5 class="_7uu25p0 qt3gz9c _7pq7t612 heading5 _7uu25p1"&gt;Option A: Flatten Each DataFrame to Dict&lt;/H5&gt;
&lt;DIV class="go8b9g1 _7pq7t6c4"&gt;
&lt;PRE&gt;&lt;CODE class="markdown-code-python qt3gz9e hljs language-python _1ymogdh2"&gt;&lt;SPAN class="hljs-comment"&gt;# Create a dictionary of dicts (each containing a single record for each model)&lt;/SPAN&gt;
input_example_simplified = {
    k: v.iloc[&lt;SPAN class="hljs-number"&gt;0&lt;/SPAN&gt;].to_dict()
    &lt;SPAN class="hljs-keyword"&gt;for&lt;/SPAN&gt; k, v &lt;SPAN class="hljs-keyword"&gt;in&lt;/SPAN&gt; input_example.items()
}&lt;/CODE&gt;&lt;/PRE&gt;
&lt;/DIV&gt;
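&lt;P class="qt3gz91 paragraph"&gt;To try the flattening step in isolation, here is a minimal sketch. It assumes the values of &lt;CODE class="qt3gz9f"&gt;input_example&lt;/CODE&gt; are pandas DataFrames; the column names and numbers are hypothetical placeholders, not taken from your data. If your datasets are Spark DataFrames, they would first need converting, for example with &lt;CODE class="qt3gz9f"&gt;limit(1).toPandas()&lt;/CODE&gt;.&lt;/P&gt;

```python
import pandas as pd

# Hypothetical per-model datasets; column names and values are placeholders.
input_example = {
    "local_outlier_factor": pd.DataFrame({"x": [1.23, 2.34], "y": [4.56, 5.67]}),
    "isolation_forest": pd.DataFrame(
        {"a": [0.1, 0.4], "b": [0.2, 0.5], "c": [0.3, 0.6]}
    ),
}

# Keep only the first row of each DataFrame as a plain column-to-value dict.
input_example_simplified = {
    name: df.iloc[0].to_dict() for name, df in input_example.items()
}

print(input_example_simplified)
```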
&lt;P class="qt3gz91 paragraph"&gt;Now &lt;CODE class="qt3gz9f"&gt;input_example_simplified&lt;/CODE&gt; looks like:&lt;/P&gt;
&lt;DIV class="go8b9g1 _7pq7t6c4"&gt;
&lt;PRE&gt;&lt;CODE class="markdown-code-python qt3gz9e hljs language-python _1ymogdh2"&gt;{
    &lt;SPAN class="hljs-string"&gt;"local_outlier_factor"&lt;/SPAN&gt;: {&lt;SPAN class="hljs-string"&gt;'x'&lt;/SPAN&gt;: &lt;SPAN class="hljs-number"&gt;1.23&lt;/SPAN&gt;, &lt;SPAN class="hljs-string"&gt;'y'&lt;/SPAN&gt;: &lt;SPAN class="hljs-number"&gt;4.56&lt;/SPAN&gt;},
    &lt;SPAN class="hljs-string"&gt;"isolation_forest"&lt;/SPAN&gt;: {&lt;SPAN class="hljs-string"&gt;'a'&lt;/SPAN&gt;: &lt;SPAN class="hljs-number"&gt;0.1&lt;/SPAN&gt;, &lt;SPAN class="hljs-string"&gt;'b'&lt;/SPAN&gt;: &lt;SPAN class="hljs-number"&gt;0.2&lt;/SPAN&gt;, &lt;SPAN class="hljs-string"&gt;'c'&lt;/SPAN&gt;: &lt;SPAN class="hljs-number"&gt;0.3&lt;/SPAN&gt;}
}&lt;/CODE&gt;&lt;/PRE&gt;
&lt;DIV class="go8b9g2"&gt;
&lt;DIV class="go8b9g4 _7pq7t6c2"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;DIV class="_17yk06p0"&gt;&lt;SPAN&gt;You can use this with &lt;/SPAN&gt;&lt;CODE class="qt3gz9f"&gt;infer_signature()&lt;/CODE&gt;&lt;SPAN&gt;:&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV class="go8b9g1 _7pq7t6c4"&gt;
&lt;PRE&gt;&lt;CODE class="markdown-code-python qt3gz9e hljs language-python _1ymogdh2"&gt;&lt;SPAN class="hljs-keyword"&gt;from&lt;/SPAN&gt; mlflow.models.signature &lt;SPAN class="hljs-keyword"&gt;import&lt;/SPAN&gt; infer_signature

signature = infer_signature(input_example_simplified, output_example_simplified)&lt;/CODE&gt;&lt;/PRE&gt;
&lt;DIV class="go8b9g2"&gt;
&lt;DIV class="go8b9g4 _7pq7t6c2"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;</description>
      <pubDate>Thu, 21 Aug 2025 19:38:26 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/infer-signature-for-a-dictionary-datasets-during-mlflow/m-p/129174#M4244</guid>
      <dc:creator>Kumaran</dc:creator>
      <dc:date>2025-08-21T19:38:26Z</dc:date>
    </item>
  </channel>
</rss>

