<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How do I display output from applyinPandas function? in Get Started Discussions</title>
    <link>https://community.databricks.com/t5/get-started-discussions/how-do-i-display-output-from-applyinpandas-function/m-p/118686#M10001</link>
    <description>&lt;P&gt;Here are some ideas and approaches to consider:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;DIV class="paragraph"&gt;To inspect the attributes of a &lt;CODE&gt;df&lt;/CODE&gt; dataset within a function used in &lt;CODE&gt;applyInPandas&lt;/CODE&gt; on a Databricks Runtime 13.3 cluster, you can use debugging techniques that help you explore the structure and content of your DataFrame. Here are some suggested steps:&lt;/DIV&gt;
&lt;OL start="1"&gt;
&lt;LI&gt;
&lt;DIV class="paragraph"&gt;&lt;STRONG&gt;Check Input DataFrame Attributes&lt;/STRONG&gt;: Before performing transformations, you can use standard Pandas functionality to inspect the attributes of the DataFrame being passed to your function. Add debugging code inside your function to print out relevant details of the DataFrame, such as its columns, data types, and the first few rows. For example: ```python def train_model(df): # Print attributes for debugging print("Columns:", df.columns) print("Data types:\n", df.dtypes) print("First few rows:\n", df.head())&lt;/DIV&gt;
&lt;DIV class="paragraph"&gt;# Your existing transformations train = df.copy() train['age_group'] = train['age'].apply(lambda x: 'child' if x &amp;lt; 18 else 'adult' if x &amp;lt; 60 else 'senior') train = train.drop(columns=['age']) return train ```&lt;/DIV&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV class="paragraph"&gt;&lt;STRONG&gt;Inspect Attributes Using Pandas UDF Logs&lt;/STRONG&gt;: If you are running the UDF on a cluster, you can write logs from within the function that explain the DataFrame's attributes and collect these logs for further inspection. Use the &lt;CODE&gt;logging&lt;/CODE&gt; module or simply print statements (results will appear in the Databricks notebook logs).&lt;/DIV&gt;
&lt;DIV class="paragraph"&gt;Example: ```python import logging&lt;/DIV&gt;
&lt;DIV class="paragraph"&gt;logging.basicConfig(level=logging.INFO)&lt;/DIV&gt;
&lt;DIV class="paragraph"&gt;def train_model(df): logging.info(f"DataFrame attributes: {df.info()}") logging.info(f"First few rows:\n{df.head()}") ... ```&lt;/DIV&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV class="paragraph"&gt;&lt;STRONG&gt;Enable Debug Logging for Spark Execution&lt;/STRONG&gt;: Use the cluster's Spark logging features to track the execution of your &lt;CODE&gt;applyInPandas&lt;/CODE&gt; function. You might need to enable additional logging on your cluster or workspace. This can help debug issues related to the structure of the DataFrame.&lt;/DIV&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV class="paragraph"&gt;&lt;STRONG&gt;Adapt UDF for Databricks Runtime 13.3&lt;/STRONG&gt;: Ensure compatibility with Databricks version 13.3, given that Python scalar UDFs and Pandas UDFs are supported from this version onwards. For handling grouped data, make sure that group keys and input data are carefully structured to avoid runtime errors.&lt;/DIV&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV class="paragraph"&gt;&lt;STRONG&gt;Validate Column Attributes Before Grouping&lt;/STRONG&gt;: In the context of grouped execution, inaccuracies in column attributes can cause errors. As part of preprocessing, verify that the &lt;CODE&gt;group_key&lt;/CODE&gt; column exists, and confirm its data type matches what your grouped operation expects.&lt;/DIV&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV class="paragraph"&gt;&lt;STRONG&gt;Troubleshooting the Schema Mismatch&lt;/STRONG&gt;: Verify that the schema defined in your &lt;CODE&gt;output_schema&lt;/CODE&gt; exactly matches the structure of the DataFrame returned by your function. If additional columns are expected, update your schema definition accordingly.&lt;/DIV&gt;
&lt;DIV class="paragraph"&gt;For example: &lt;CODE&gt;python
output_schema = StructType([
    StructField("id", IntegerType()),
    StructField("age_group", StringType()),
    # Include other fields expected in the output DataFrame
])
&lt;/CODE&gt;&lt;/DIV&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV class="paragraph"&gt;&lt;STRONG&gt;Interactive Debugging&lt;/STRONG&gt;: For iterative development, manually apply transformations to a sample Pandas DataFrame to ensure correctness before deploying the UDF. Start by loading a small representative sample from your Spark DataFrame using &lt;CODE&gt;.toPandas()&lt;/CODE&gt; and test your transformations locally.&lt;/DIV&gt;
&lt;/LI&gt;
&lt;/OL&gt;
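&lt;DIV class="paragraph"&gt;Putting steps 1 and 7 together, here is a minimal local sketch you could run before deploying the UDF. It uses only pandas; the &lt;CODE&gt;id&lt;/CODE&gt;/&lt;CODE&gt;age&lt;/CODE&gt; sample rows are made up to mirror your sample code, and the age bucketing is written with &lt;CODE&gt;&gt;=&lt;/CODE&gt; comparisons but is equivalent to your lambda:&lt;/DIV&gt;
&lt;DIV class="paragraph"&gt;

```python
import pandas as pd

def train_model(df):
    # Step 1: inspect attributes before transforming
    print("Columns:", list(df.columns))
    print("Data types:\n", df.dtypes)

    train = df.copy()
    # Same buckets as the sample code: under 18 child, under 60 adult, else senior
    train['age_group'] = train['age'].apply(
        lambda x: 'senior' if x >= 60 else 'adult' if x >= 18 else 'child')
    return train.drop(columns=['age'])

# Step 7: smoke-test locally on a small sample (e.g. spark_df.limit(5).toPandas())
sample = pd.DataFrame({'id': [1, 2, 3], 'age': [10, 35, 70]})
result = train_model(sample)
print(result)
```

&lt;/DIV&gt;
&lt;DIV class="paragraph"&gt;Once this behaves as expected on the sample, the same function can be handed to &lt;CODE&gt;applyInPandas&lt;/CODE&gt; unchanged.&lt;/DIV&gt;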
&lt;DIV class="paragraph"&gt;Since you mentioned using a sample code, these debugging strategies should help you validate your data transformations and inspect the attributes of your DataFrame effectively.&lt;/DIV&gt;
&lt;DIV class="paragraph"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;DIV class="paragraph"&gt;Cheers, Lou.&lt;/DIV&gt;</description>
    <pubDate>Fri, 09 May 2025 13:54:31 GMT</pubDate>
    <dc:creator>Louis_Frolio</dc:creator>
    <dc:date>2025-05-09T13:54:31Z</dc:date>
    <item>
      <title>How do I display output from applyinPandas function?</title>
      <link>https://community.databricks.com/t5/get-started-discussions/how-do-i-display-output-from-applyinpandas-function/m-p/118669#M9989</link>
      <description>&lt;P&gt;&lt;SPAN&gt;I'm using databricks version 13.3. I have a function which I'm calling by using the&amp;nbsp;&lt;/SPAN&gt;applyInPandas&lt;SPAN&gt;&amp;nbsp;function. I need to see the attributes of my df dataset which I'm using inside my function. My sample code looks like&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;def train_model(df):&lt;BR /&gt;# Copy input DataFrame&lt;BR /&gt;train = df.copy()&lt;BR /&gt;&lt;BR /&gt;# Use 'age' to create a new column, for example: age groups&lt;BR /&gt;train['age_group'] = train['age'].apply(lambda x: 'child' if x &amp;lt; 18 else 'adult' if x &amp;lt; 60 else 'senior')&lt;BR /&gt;&lt;BR /&gt;# Drop the original 'age' column&lt;BR /&gt;train = train.drop(columns=['age'])&lt;BR /&gt;&lt;BR /&gt;return train&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;My applyInPandas call looks like&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;from pyspark.sql.functions import pandas_udf&lt;BR /&gt;from pyspark.sql.types import StructType, StructField, StringType, IntegerType&lt;/P&gt;&lt;P&gt;# Define the output schema after transformation&lt;BR /&gt;output_schema = StructType([&lt;BR /&gt;StructField("id", IntegerType()),&lt;BR /&gt;StructField("age_group", StringType()),&lt;BR /&gt;# Add other fields present in the input df if needed&lt;BR /&gt;])&lt;/P&gt;&lt;P&gt;@pandas_udf(output_schema)&lt;BR /&gt;def apply_train_model(df):&lt;BR /&gt;return train_model(df)&lt;/P&gt;&lt;P&gt;# Then apply it on a Spark DataFrame grouped or ungrouped&lt;BR /&gt;result = spark_df.groupby("some_grouping_column").applyInPandas(apply_train_model, schema=output_schema)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Kindly note that this is sample code taken from the internet, not the actual code, as I don't have databricks locally; I'm using databricks in my client system &amp;amp; I'm not able to share client
code&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 09 May 2025 10:37:30 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/how-do-i-display-output-from-applyinpandas-function/m-p/118669#M9989</guid>
      <dc:creator>DbricksLearner1</dc:creator>
      <dc:date>2025-05-09T10:37:30Z</dc:date>
    </item>
    <item>
      <title>Re: How do I display output from applyinPandas function?</title>
      <link>https://community.databricks.com/t5/get-started-discussions/how-do-i-display-output-from-applyinpandas-function/m-p/118686#M10001</link>
      <description>&lt;P&gt;Here are some ideas and approaches to consider:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;DIV class="paragraph"&gt;To inspect the attributes of a &lt;CODE&gt;df&lt;/CODE&gt; dataset within a function used in &lt;CODE&gt;applyInPandas&lt;/CODE&gt; on a Databricks Runtime 13.3 cluster, you can use debugging techniques that help you explore the structure and content of your DataFrame. Here are some suggested steps:&lt;/DIV&gt;
&lt;OL start="1"&gt;
&lt;LI&gt;
&lt;DIV class="paragraph"&gt;&lt;STRONG&gt;Check Input DataFrame Attributes&lt;/STRONG&gt;: Before performing transformations, you can use standard Pandas functionality to inspect the attributes of the DataFrame being passed to your function. Add debugging code inside your function to print out relevant details of the DataFrame, such as its columns, data types, and the first few rows. For example: ```python def train_model(df): # Print attributes for debugging print("Columns:", df.columns) print("Data types:\n", df.dtypes) print("First few rows:\n", df.head())&lt;/DIV&gt;
&lt;DIV class="paragraph"&gt;# Your existing transformations train = df.copy() train['age_group'] = train['age'].apply(lambda x: 'child' if x &amp;lt; 18 else 'adult' if x &amp;lt; 60 else 'senior') train = train.drop(columns=['age']) return train ```&lt;/DIV&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV class="paragraph"&gt;&lt;STRONG&gt;Inspect Attributes Using Pandas UDF Logs&lt;/STRONG&gt;: If you are running the UDF on a cluster, you can write logs from within the function that explain the DataFrame's attributes and collect these logs for further inspection. Use the &lt;CODE&gt;logging&lt;/CODE&gt; module or simply print statements (results will appear in the Databricks notebook logs).&lt;/DIV&gt;
&lt;DIV class="paragraph"&gt;Example: ```python import logging&lt;/DIV&gt;
&lt;DIV class="paragraph"&gt;logging.basicConfig(level=logging.INFO)&lt;/DIV&gt;
&lt;DIV class="paragraph"&gt;def train_model(df): logging.info(f"DataFrame attributes: {df.info()}") logging.info(f"First few rows:\n{df.head()}") ... ```&lt;/DIV&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV class="paragraph"&gt;&lt;STRONG&gt;Enable Debug Logging for Spark Execution&lt;/STRONG&gt;: Use the cluster's Spark logging features to track the execution of your &lt;CODE&gt;applyInPandas&lt;/CODE&gt; function. You might need to enable additional logging on your cluster or workspace. This can help debug issues related to the structure of the DataFrame.&lt;/DIV&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV class="paragraph"&gt;&lt;STRONG&gt;Adapt UDF for Databricks Runtime 13.3&lt;/STRONG&gt;: Ensure compatibility with Databricks version 13.3, given that Python scalar UDFs and Pandas UDFs are supported from this version onwards. For handling grouped data, make sure that group keys and input data are carefully structured to avoid runtime errors.&lt;/DIV&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV class="paragraph"&gt;&lt;STRONG&gt;Validate Column Attributes Before Grouping&lt;/STRONG&gt;: In the context of grouped execution, inaccuracies in column attributes can cause errors. As part of preprocessing, verify that the &lt;CODE&gt;group_key&lt;/CODE&gt; column exists, and confirm its data type matches what your grouped operation expects.&lt;/DIV&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV class="paragraph"&gt;&lt;STRONG&gt;Troubleshooting the Schema Mismatch&lt;/STRONG&gt;: Verify that the schema defined in your &lt;CODE&gt;output_schema&lt;/CODE&gt; exactly matches the structure of the DataFrame returned by your function. If additional columns are expected, update your schema definition accordingly.&lt;/DIV&gt;
&lt;DIV class="paragraph"&gt;For example: &lt;CODE&gt;python
output_schema = StructType([
    StructField("id", IntegerType()),
    StructField("age_group", StringType()),
    # Include other fields expected in the output DataFrame
])
&lt;/CODE&gt;&lt;/DIV&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV class="paragraph"&gt;&lt;STRONG&gt;Interactive Debugging&lt;/STRONG&gt;: For iterative development, manually apply transformations to a sample Pandas DataFrame to ensure correctness before deploying the UDF. Start by loading a small representative sample from your Spark DataFrame using &lt;CODE&gt;.toPandas()&lt;/CODE&gt; and test your transformations locally.&lt;/DIV&gt;
&lt;/LI&gt;
&lt;/OL&gt;
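&lt;DIV class="paragraph"&gt;Putting steps 1 and 7 together, here is a minimal local sketch you could run before deploying the UDF. It uses only pandas; the &lt;CODE&gt;id&lt;/CODE&gt;/&lt;CODE&gt;age&lt;/CODE&gt; sample rows are made up to mirror your sample code, and the age bucketing is written with &lt;CODE&gt;&gt;=&lt;/CODE&gt; comparisons but is equivalent to your lambda:&lt;/DIV&gt;
&lt;DIV class="paragraph"&gt;

```python
import pandas as pd

def train_model(df):
    # Step 1: inspect attributes before transforming
    print("Columns:", list(df.columns))
    print("Data types:\n", df.dtypes)

    train = df.copy()
    # Same buckets as the sample code: under 18 child, under 60 adult, else senior
    train['age_group'] = train['age'].apply(
        lambda x: 'senior' if x >= 60 else 'adult' if x >= 18 else 'child')
    return train.drop(columns=['age'])

# Step 7: smoke-test locally on a small sample (e.g. spark_df.limit(5).toPandas())
sample = pd.DataFrame({'id': [1, 2, 3], 'age': [10, 35, 70]})
result = train_model(sample)
print(result)
```

&lt;/DIV&gt;
&lt;DIV class="paragraph"&gt;Once this behaves as expected on the sample, the same function can be handed to &lt;CODE&gt;applyInPandas&lt;/CODE&gt; unchanged.&lt;/DIV&gt;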
&lt;DIV class="paragraph"&gt;Since you mentioned using a sample code, these debugging strategies should help you validate your data transformations and inspect the attributes of your DataFrame effectively.&lt;/DIV&gt;
&lt;DIV class="paragraph"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;DIV class="paragraph"&gt;Cheers, Lou.&lt;/DIV&gt;</description>
      <pubDate>Fri, 09 May 2025 13:54:31 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/how-do-i-display-output-from-applyinpandas-function/m-p/118686#M10001</guid>
      <dc:creator>Louis_Frolio</dc:creator>
      <dc:date>2025-05-09T13:54:31Z</dc:date>
    </item>
  </channel>
</rss>

