<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: when and otherwise issue in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/when-and-otherwise-issue/m-p/27156#M19036</link>
    <description>&lt;P&gt;Can you provide the structure that you're using?&lt;/P&gt;&lt;P&gt;Also, a more elaborate sample input and output.&lt;/P&gt;</description>
    <pubDate>Thu, 24 Feb 2022 03:53:02 GMT</pubDate>
    <dc:creator>AmanSehgal</dc:creator>
    <dc:date>2022-02-24T03:53:02Z</dc:date>
    <item>
      <title>when and otherwise issue</title>
      <link>https://community.databricks.com/t5/data-engineering/when-and-otherwise-issue/m-p/27154#M19034</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;In our scenario we read JSON files as input, and they contain a nested structure. A few of the attributes are arrays of structs whose nested fields we need to rename, so we defined a new structure and cast the column to it.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;We are facing the problem below while doing the cast.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;For example, test is an array of structs:&lt;/P&gt;&lt;P&gt;{"test":[{"nestedattr1":"df","columnfield":"er"}]}&lt;/P&gt;&lt;P&gt;We need it as:&lt;/P&gt;&lt;P&gt;{"test":[{"nestedAttr1":"df","columnField":"er"}]}&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The cast works for input like this, but when test arrives as an empty array, {"test":[]}, the cast fails. We tried the code below to handle that case, but it does not work:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;df = df.withColumn("test", when(size(df.test) &amp;gt; 0, col("test").cast(newteststruct)).otherwise(df.test))&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Error: cannot resolve '`test`' due to data type mismatch: cannot cast array&amp;lt;string&amp;gt; to array&amp;lt;struct&amp;gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Please advise how to avoid this issue.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 23 Feb 2022 16:49:14 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/when-and-otherwise-issue/m-p/27154#M19034</guid>
      <dc:creator>SailajaB</dc:creator>
      <dc:date>2022-02-23T16:49:14Z</dc:date>
    </item>
    <item>
      <title>Re: when and otherwise issue</title>
      <link>https://community.databricks.com/t5/data-engineering/when-and-otherwise-issue/m-p/27156#M19036</link>
      <description>&lt;P&gt;Can you provide the structure that you're using?&lt;/P&gt;&lt;P&gt;Also, a more elaborate sample input and output.&lt;/P&gt;</description>
      <pubDate>Thu, 24 Feb 2022 03:53:02 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/when-and-otherwise-issue/m-p/27156#M19036</guid>
      <dc:creator>AmanSehgal</dc:creator>
      <dc:date>2022-02-24T03:53:02Z</dc:date>
    </item>
    <item>
      <title>Re: when and otherwise issue</title>
      <link>https://community.databricks.com/t5/data-engineering/when-and-otherwise-issue/m-p/27157#M19037</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thank you for the reply.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;We are using the structure below to cast the array of structs to the new nested field names:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;newteststruct = ArrayType(StructType([&lt;/P&gt;&lt;P&gt;StructField("nestedAttr1", StringType()),&lt;/P&gt;&lt;P&gt;StructField("columnField", StringType())]))&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The input comes from another source in JSON format, and we read it into Databricks as df.&lt;/P&gt;&lt;P&gt;We then apply schema-level transformations, as the business requires, to produce output in the target schema.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The cast fails when the input extract contains an empty array.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thank you&lt;/P&gt;</description>
      <pubDate>Thu, 24 Feb 2022 08:58:49 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/when-and-otherwise-issue/m-p/27157#M19037</guid>
      <dc:creator>SailajaB</dc:creator>
      <dc:date>2022-02-24T08:58:49Z</dc:date>
    </item>
    <item>
      <title>Re: when and otherwise issue</title>
      <link>https://community.databricks.com/t5/data-engineering/when-and-otherwise-issue/m-p/27158#M19038</link>
      <description>&lt;P&gt;Is it possible for you to replace {"test":[]} with {"test":[{"nestedattr1":"","columnfield":""}]}?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I ask because I think the THEN and ELSE expressions should have the same type.&lt;/P&gt;</description>
      <pubDate>Thu, 24 Feb 2022 11:36:05 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/when-and-otherwise-issue/m-p/27158#M19038</guid>
      <dc:creator>AmanSehgal</dc:creator>
      <dc:date>2022-02-24T11:36:05Z</dc:date>
    </item>
    <item>
      <title>Re: when and otherwise issue</title>
      <link>https://community.databricks.com/t5/data-engineering/when-and-otherwise-issue/m-p/27159#M19039</link>
      <description>&lt;P&gt;Yes, I think the THEN and ELSE expressions should have the same type.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;But we don't need to convert {"test":[]} to {"test":[{"nestedattr1":"","columnfield":""}]}.&lt;/P&gt;&lt;P&gt;If we get test as an empty array, we should skip the conversion.&lt;/P&gt;&lt;P&gt;If we get test as {"test":[{"nestedattr1":"df","columnfield":"er"}]}, we have to proceed with the conversion.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Is there any way to achieve this, preferably at the schema level rather than for each column?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 24 Feb 2022 11:51:39 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/when-and-otherwise-issue/m-p/27159#M19039</guid>
      <dc:creator>SailajaB</dc:creator>
      <dc:date>2022-02-24T11:51:39Z</dc:date>
    </item>
    <item>
      <title>Re: when and otherwise issue</title>
      <link>https://community.databricks.com/t5/data-engineering/when-and-otherwise-issue/m-p/27160#M19040</link>
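      <description>&lt;P&gt;We used the condition below to resolve the issue. It checks the column's type in the DataFrame schema and skips the cast when test was read as array&amp;lt;string&amp;gt;, which is the empty-array case:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;if dict(df.dtypes)['test'] != 'array&amp;lt;string&amp;gt;':&lt;/P&gt;&lt;P&gt;         df = df.withColumn("test", col("test").cast(newteststruct))&lt;/P&gt;&lt;P&gt;Thank you&lt;/P&gt;</description>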
      <pubDate>Fri, 25 Feb 2022 14:10:08 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/when-and-otherwise-issue/m-p/27160#M19040</guid>
      <dc:creator>SailajaB</dc:creator>
      <dc:date>2022-02-25T14:10:08Z</dc:date>
    </item>
  </channel>
</rss>

