<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Different JSON Results when Running a Job vs Running a Notebook in Get Started Discussions</title>
    <link>https://community.databricks.com/t5/get-started-discussions/different-json-results-when-running-a-job-vs-running-a-notebook/m-p/108613#M9735</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/132320"&gt;@rgower&lt;/a&gt;,&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Are you using the same cluster compute for both scenarios, that is, running the notebook manually and running it via a workflow job?&lt;/P&gt;</description>
    <pubDate>Mon, 03 Feb 2025 16:17:14 GMT</pubDate>
    <dc:creator>Alberto_Umana</dc:creator>
    <dc:date>2025-02-03T16:17:14Z</dc:date>
    <item>
      <title>Different JSON Results when Running a Job vs Running a Notebook</title>
      <link>https://community.databricks.com/t5/get-started-discussions/different-json-results-when-running-a-job-vs-running-a-notebook/m-p/108611#M9734</link>
      <description>&lt;P&gt;I have a regularly scheduled job that runs a PySpark Notebook that GETs semi-structured JSON data from an external API, loads that data into dataframes, and saves those dataframes to delta tables in Databricks.&amp;nbsp;&lt;BR /&gt;I have the schema for the JSON defined in my Notebook, but because the API data is semi-structured, I have to convert some of the fields into strings as those fields could have multiple data types in the source data (struct, array, int, string, etc.). A truncated example of this schema definition is below, where the "data" field is the field to focus on:&lt;/P&gt;&lt;P&gt;...&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;StructField&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;"fields"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;ArrayType&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;StructType&lt;/SPAN&gt;&lt;SPAN&gt;([&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;/SPAN&gt;&lt;SPAN&gt;StructField&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;"id"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;StringType&lt;/SPAN&gt;&lt;SPAN&gt;(), &lt;/SPAN&gt;&lt;SPAN&gt;True&lt;/SPAN&gt;&lt;SPAN&gt;),&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;/SPAN&gt;&lt;SPAN&gt;StructField&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;"name"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;StringType&lt;/SPAN&gt;&lt;SPAN&gt;(), &lt;/SPAN&gt;&lt;SPAN&gt;True&lt;/SPAN&gt;&lt;SPAN&gt;),&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;/SPAN&gt;&lt;SPAN&gt;StructField&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;"type"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;StringType&lt;/SPAN&gt;&lt;SPAN&gt;(), &lt;/SPAN&gt;&lt;SPAN&gt;True&lt;/SPAN&gt;&lt;SPAN&gt;),&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 
&lt;/SPAN&gt;&lt;SPAN&gt;StructField&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;"value"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;StructType&lt;/SPAN&gt;&lt;SPAN&gt;([&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;/SPAN&gt;&lt;SPAN&gt;StructField&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;"data"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;StringType&lt;/SPAN&gt;&lt;SPAN&gt;(), &lt;/SPAN&gt;&lt;SPAN&gt;True&lt;/SPAN&gt;&lt;SPAN&gt;),&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;/SPAN&gt;&lt;SPAN&gt;StructField&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;"type"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;StringType&lt;/SPAN&gt;&lt;SPAN&gt;(), &lt;/SPAN&gt;&lt;SPAN&gt;True&lt;/SPAN&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;...&lt;/SPAN&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;P&gt;Whenever I run the notebook directly, Databricks saves the rows where those fields are JSON structures as properly formatted JSON strings like so (data redacted):&lt;/P&gt;&lt;P&gt;"{'id': *****, 'type': '*****', 'title': '********', 'allDay': ****, 'startTime': '*****', 'endTime': '*****', 'attendees': [{'emailAddress':'*****' ..."&lt;/P&gt;&lt;P&gt;However, when I run this Notebook as part of a job, any fields that are structs are converted into strings that can no longer be parsed the way JSON strings can, like so:&lt;/P&gt;&lt;P&gt;"{id=*****, type=*****, title=********, allDay=****, startTime=*****, endTime=*****, attendees=[{emailAddress=****** ..."&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Is this expected behavior, and if not, is there anything I can configure in the Job that would force the same behavior I see when I run the Notebook directly?&lt;BR /&gt;I tried using a table with a Variant datatype column to load the data into instead, but it seems like the conversion 
that flattens this JSON into an unusable string happens when I load the data into a Spark dataframe and not the target table, so that solution doesn't seem like it will work until Variants are supported in dataframes in Spark 4.0.&lt;/P&gt;</description>
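A minimal sketch of one way to make the stored string independent of the compute type, assuming the DataFrame is built from Python dicts parsed out of the API response (the thread does not show that code, and the cause of the difference is not confirmed in the thread). The idea is to serialize any nested "data" value to a JSON string explicitly before handing the rows to Spark, so the StringType field always receives a real string. The helper name stringify_nested and the sample record are hypothetical.

```python
import json

def stringify_nested(value):
    """Return a JSON string for dicts and lists; other values become plain strings."""
    if value is None:
        return None
    if isinstance(value, (dict, list)):
        # json.dumps always emits valid, double-quoted JSON
        return json.dumps(value)
    return str(value)

# Hypothetical record shaped like the truncated schema in the post.
record = {
    "fields": [
        {"id": "1", "name": "meeting", "type": "event",
         "value": {"data": {"id": 7, "allDay": False}, "type": "struct"}},
        {"id": "2", "name": "count", "type": "number",
         "value": {"data": 3, "type": "int"}},
    ]
}

# Replace each "data" value with its explicit string form before creating the DataFrame.
for field in record["fields"]:
    field["value"]["data"] = stringify_nested(field["value"]["data"])
```

With the value already a string, the column can later be parsed with a JSON function instead of depending on how a given runtime renders a dict.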
      <pubDate>Mon, 03 Feb 2025 16:12:32 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/different-json-results-when-running-a-job-vs-running-a-notebook/m-p/108611#M9734</guid>
      <dc:creator>rgower</dc:creator>
      <dc:date>2025-02-03T16:12:32Z</dc:date>
    </item>
    <item>
      <title>Re: Different JSON Results when Running a Job vs Running a Notebook</title>
      <link>https://community.databricks.com/t5/get-started-discussions/different-json-results-when-running-a-job-vs-running-a-notebook/m-p/108613#M9735</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/132320"&gt;@rgower&lt;/a&gt;,&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Are you using the same cluster compute for both scenarios, that is, running the notebook manually and running it via a workflow job?&lt;/P&gt;</description>
      <pubDate>Mon, 03 Feb 2025 16:17:14 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/different-json-results-when-running-a-job-vs-running-a-notebook/m-p/108613#M9735</guid>
      <dc:creator>Alberto_Umana</dc:creator>
      <dc:date>2025-02-03T16:17:14Z</dc:date>
    </item>
    <item>
      <title>Re: Different JSON Results when Running a Job vs Running a Notebook</title>
      <link>https://community.databricks.com/t5/get-started-discussions/different-json-results-when-running-a-job-vs-running-a-notebook/m-p/108623#M9736</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/106294"&gt;@Alberto_Umana&lt;/a&gt;&amp;nbsp;That may be the root cause. I did some additional testing in a development environment and found that the difference seems to be Serverless compute vs All-Purpose/Job Compute clusters.&lt;/P&gt;&lt;P&gt;When I run the Notebook and Job via serverless compute, the JSON is formatted correctly.&lt;BR /&gt;When I run the Notebook and Job via All-Purpose Compute (tested on DBR 15.4, 16.0, 16.1, and 16.2 beta), or the Job via Job Compute, the JSON is formatted incorrectly.&lt;/P&gt;&lt;P&gt;I've been running the job via Job Compute because I was getting OOM errors and driver crashes when running the job via Serverless compute, but I might try to break up the job into multiple jobs to see if I can get Serverless compute to work for me.&lt;/P&gt;&lt;P&gt;All this raises the question: is this new result expected? Should I expect different JSON formatting results on different compute types, or is this something that should be investigated further?&lt;/P&gt;</description>
      <pubDate>Mon, 03 Feb 2025 17:03:43 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/different-json-results-when-running-a-job-vs-running-a-notebook/m-p/108623#M9736</guid>
      <dc:creator>rgower</dc:creator>
      <dc:date>2025-02-03T17:03:43Z</dc:date>
    </item>
    <item>
      <title>Re: Different JSON Results when Running a Job vs Running a Notebook</title>
      <link>https://community.databricks.com/t5/get-started-discussions/different-json-results-when-running-a-job-vs-running-a-notebook/m-p/108625#M9737</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/132320"&gt;@rgower&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;Thanks for your comments. I think it should be investigated further, since serverless should be using DBR 15.4. I'll see if I can find something internally or replicate it; otherwise, it would require opening a case with us to dig deeper.&lt;/P&gt;</description>
      <pubDate>Mon, 03 Feb 2025 17:08:56 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/different-json-results-when-running-a-job-vs-running-a-notebook/m-p/108625#M9737</guid>
      <dc:creator>Alberto_Umana</dc:creator>
      <dc:date>2025-02-03T17:08:56Z</dc:date>
    </item>
    <item>
      <title>Re: Different JSON Results when Running a Job vs Running a Notebook</title>
      <link>https://community.databricks.com/t5/get-started-discussions/different-json-results-when-running-a-job-vs-running-a-notebook/m-p/108626#M9738</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/106294"&gt;@Alberto_Umana&lt;/a&gt;&amp;nbsp;Sounds good, thank you for looking into it and let me know if there's any additional information I can provide in the meantime!&lt;/P&gt;</description>
      <pubDate>Mon, 03 Feb 2025 17:10:48 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/different-json-results-when-running-a-job-vs-running-a-notebook/m-p/108626#M9738</guid>
      <dc:creator>rgower</dc:creator>
      <dc:date>2025-02-03T17:10:48Z</dc:date>
    </item>
  </channel>
</rss>

