<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Explode is giving unexpected results. in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/explode-is-giving-unexpected-results/m-p/54315#M30047</link>
    <description>&lt;P&gt;To be clear: I have about 19,000 ids, but when I explode I get only 4 rows, all corresponding to the first id.&lt;/P&gt;</description>
    <pubDate>Thu, 30 Nov 2023 12:01:05 GMT</pubDate>
    <dc:creator>BenLambert</dc:creator>
    <dc:date>2023-11-30T12:01:05Z</dc:date>
    <item>
      <title>Explode is giving unexpected results.</title>
      <link>https://community.databricks.com/t5/data-engineering/explode-is-giving-unexpected-results/m-p/54308#M30042</link>
      <description>&lt;P&gt;I have a dataframe with a schema similar to the following:&lt;/P&gt;&lt;P&gt;id: string&lt;/P&gt;&lt;P&gt;array_field: array&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp;element: struct&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; field1: string&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; field2: string&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; array_field2: array&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;element: struct&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;nested_field: string&lt;/P&gt;&lt;P&gt;I am trying to flatten this into rows.&lt;/P&gt;&lt;P&gt;The issue is that when I do something like e_df = df.select("id", F.explode("array_field")), it returns the exploded values only for the first id. It may be something simple, but I have wasted a lot of time trying to track down the cause. As far as I can see, every id has an associated array field, so I would expect the result to look like:&lt;/P&gt;&lt;P&gt;id, col&lt;/P&gt;&lt;P&gt;1, first element struct&lt;/P&gt;&lt;P&gt;1, second element struct&lt;/P&gt;&lt;P&gt;2, first element struct&lt;/P&gt;&lt;P&gt;2, second element struct&lt;/P&gt;&lt;P&gt;and so on. Any insight here would be very helpful.&lt;/P&gt;</description>
      <pubDate>Thu, 30 Nov 2023 11:39:53 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/explode-is-giving-unexpected-results/m-p/54308#M30042</guid>
      <dc:creator>BenLambert</dc:creator>
      <dc:date>2023-11-30T11:39:53Z</dc:date>
    </item>
    <item>
      <title>Re: Explode is giving unexpected results.</title>
      <link>https://community.databricks.com/t5/data-engineering/explode-is-giving-unexpected-results/m-p/54313#M30045</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9"&gt;@Retired_mod&lt;/a&gt; thanks for the tip. This is the approach I am trying to take, but exploding the outer array does not behave as expected: it returns only the 4 rows associated with the first id, not all possible rows.&lt;/P&gt;</description>
      <pubDate>Thu, 30 Nov 2023 11:58:53 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/explode-is-giving-unexpected-results/m-p/54313#M30045</guid>
      <dc:creator>BenLambert</dc:creator>
      <dc:date>2023-11-30T11:58:53Z</dc:date>
    </item>
    <item>
      <title>Re: Explode is giving unexpected results.</title>
      <link>https://community.databricks.com/t5/data-engineering/explode-is-giving-unexpected-results/m-p/54315#M30047</link>
      <description>&lt;P&gt;To be clear: I have about 19,000 ids, but when I explode I get only 4 rows, all corresponding to the first id.&lt;/P&gt;</description>
      <pubDate>Thu, 30 Nov 2023 12:01:05 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/explode-is-giving-unexpected-results/m-p/54315#M30047</guid>
      <dc:creator>BenLambert</dc:creator>
      <dc:date>2023-11-30T12:01:05Z</dc:date>
    </item>
    <item>
      <title>Re: Explode is giving unexpected results.</title>
      <link>https://community.databricks.com/t5/data-engineering/explode-is-giving-unexpected-results/m-p/54318#M30049</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9"&gt;@Retired_mod&lt;/a&gt;&amp;nbsp;Additionally, I am getting a message to the effect of: "This query is on a non-Delta table with many small files. To improve the performance of queries, convert to Delta and run the OPTIMIZE command on the table dbfs:/mnt/bucket/id_1.json." This makes it seem that, for some reason, only the file containing the first id is being read.&lt;/P&gt;</description>
      <pubDate>Thu, 30 Nov 2023 12:08:37 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/explode-is-giving-unexpected-results/m-p/54318#M30049</guid>
      <dc:creator>BenLambert</dc:creator>
      <dc:date>2023-11-30T12:08:37Z</dc:date>
    </item>
    <item>
      <title>Re: Explode is giving unexpected results.</title>
      <link>https://community.databricks.com/t5/data-engineering/explode-is-giving-unexpected-results/m-p/54594#M30125</link>
      <description>&lt;P&gt;It turns out that if the data doesn't match the schema that was defined when reading the JSON in the first place, everything that doesn't match is silently dropped rather than raising an error. This is not a very friendly default behaviour.&lt;/P&gt;</description>
      <pubDate>Mon, 04 Dec 2023 10:18:27 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/explode-is-giving-unexpected-results/m-p/54594#M30125</guid>
      <dc:creator>BenLambert</dc:creator>
      <dc:date>2023-12-04T10:18:27Z</dc:date>
    </item>
  </channel>
</rss>

