<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Converting an Entity Attribute Value Data Model to Struct datatype in Warehousing &amp; Analytics</title>
    <link>https://community.databricks.com/t5/warehousing-analytics/converting-an-entity-attribute-value-data-model-to-struct/m-p/79811#M1445</link>
    <description>&lt;P&gt;I'm new to databricks and I have a source data model that stores the data as Name-Value pairs (i.e. normalised) in two columns in the table.&lt;/P&gt;&lt;TABLE border="1" width="98.20788530465948%"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD width="33.333333333333336%" height="30px"&gt;&lt;STRONG&gt;EntityID&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD width="33.333333333333336%" height="30px"&gt;&lt;STRONG&gt;Name&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD width="33.333333333333336%" height="30px"&gt;&lt;STRONG&gt;Value&lt;/STRONG&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="33.333333333333336%" height="30px"&gt;1&lt;/TD&gt;&lt;TD width="33.333333333333336%" height="30px"&gt;Field1&lt;/TD&gt;&lt;TD width="33.333333333333336%" height="30px"&gt;SomeValue1&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="33.333333333333336%" height="30px"&gt;1&lt;/TD&gt;&lt;TD width="33.333333333333336%" height="30px"&gt;Field2&lt;/TD&gt;&lt;TD width="33.333333333333336%" height="30px"&gt;SomeValue2&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD height="30px"&gt;1&lt;/TD&gt;&lt;TD height="30px"&gt;Field3&lt;/TD&gt;&lt;TD height="30px"&gt;SomeValue3&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD height="30px"&gt;2&lt;/TD&gt;&lt;TD height="30px"&gt;Field1&lt;/TD&gt;&lt;TD height="30px"&gt;SomeValue1&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD height="30px"&gt;2&lt;/TD&gt;&lt;TD height="30px"&gt;Field3&lt;/TD&gt;&lt;TD height="30px"&gt;SomeValue3&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;The definition of the NAME fields are stored in another table (in this case holedetails) which describes the data type.&lt;/P&gt;&lt;P&gt;Previously, I would use the pivot command to flatten data from this type of data model, but now with the &lt;STRONG&gt;struct&lt;/STRONG&gt; datatype in databricks, I would like to take advantage of this as it seems more efficient and a much easier format to query.&lt;/P&gt;&lt;P&gt;I would like transform the two columns into a struct column using a statement like below. I can prepare the data to be passed to the&amp;nbsp;&lt;SPAN&gt;named_struct as a string, but it fails to parse as the named_struct expects 2 or more parameters and I'm building this dynamically.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;        SELECT a.HOLEID
              ,a.PROJECTCODE
              ,named_struct(concat('(', concat_ws(',', collect_list(concat('"',NAME,'","',VALUE_MOD,'"'))),')'))
            from
      (
        SELECT a.HOLEID
              ,a.PROJECTCODE
              ,a.NAME 
              ,a.VALUE
              ,case b.DATATYPE
                  when 'Text' then VALUE
                  when 'Numeric' then concat("cast('",VALUE,"' as Float)")
              end as VALUE_MOD
        FROM rtx.raw_acq_aar.holedetails a
        join rtx.raw_acq_aar.holecode b
        on a.NAME = b.NAME
        where a.PROJECTCODE = 'SANDSTONE'
      ) a
      group by a.HOLEID
            ,a.PROJECTCODE&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I get the following error when I run the above statement as the&amp;nbsp;&lt;SPAN&gt;named_struct function is expected 2 or more arguments.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;[WRONG_NUM_ARGS.WITHOUT_SUGGESTION] The `named_struct` requires 2n (n &amp;gt; 0) parameters but the actual number is 1. Please, refer to '&lt;A href="https://spark.apache.org/docs/latest/sql-ref-functions.html" target="_blank" rel="noopener"&gt;https://spark.apache.org/docs/latest/sql-ref-functions.html&lt;/A&gt;' for a fix. SQLSTATE: 42605&lt;/SPAN&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;When I run something like below it works, suggesting I've prepared the dynamic string correctly, but I want to run this statement across millions of rows.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;select named_struct("HoleStatus","CompletedReclaimed","FieldDataCheckedBy","Justin Ward","WaterTabIntersected","Wet","D_SurveyedBy","Joe Bloggs","CasingCemented","No","WatertableDepth_m","cast('71' as Float)","ApprovalType","DMIRS PoW","D_SurveyMethod","GPS","D_StateName","WA","Geologist","Justin Ward","D_Commodity","Lithium","TargetDepth_m","cast('150' as Float)","ApprovalID","115501","PlugDepth_m","cast('0.4' as Float)","D_Company","Australia","DrillCompany","Ranger","CasingLeft","Yes","Cemented","No","D_OrigHoleID","SAND_2023_RC_07","CasingDepth_m","cast('6' as Float)","CasingSize","150 mm","PrecollarDepth_m","cast('10' as Float)","HolePurpose","Geological","DepthBedrock_m","cast('0' as Float)","Organisation","RTX","D_Country","AU")&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Any help would be greatly appreciated.&lt;/P&gt;</description>
    <pubDate>Mon, 22 Jul 2024 08:28:02 GMT</pubDate>
    <dc:creator>Spoon_Man</dc:creator>
    <dc:date>2024-07-22T08:28:02Z</dc:date>
    <item>
      <title>Converting an Entity Attribute Value Data Model to Struct datatype</title>
      <link>https://community.databricks.com/t5/warehousing-analytics/converting-an-entity-attribute-value-data-model-to-struct/m-p/79811#M1445</link>
      <description>&lt;P&gt;I'm new to databricks and I have a source data model that stores the data as Name-Value pairs (i.e. normalised) in two columns in the table.&lt;/P&gt;&lt;TABLE border="1" width="98.20788530465948%"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD width="33.333333333333336%" height="30px"&gt;&lt;STRONG&gt;EntityID&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD width="33.333333333333336%" height="30px"&gt;&lt;STRONG&gt;Name&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD width="33.333333333333336%" height="30px"&gt;&lt;STRONG&gt;Value&lt;/STRONG&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="33.333333333333336%" height="30px"&gt;1&lt;/TD&gt;&lt;TD width="33.333333333333336%" height="30px"&gt;Field1&lt;/TD&gt;&lt;TD width="33.333333333333336%" height="30px"&gt;SomeValue1&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="33.333333333333336%" height="30px"&gt;1&lt;/TD&gt;&lt;TD width="33.333333333333336%" height="30px"&gt;Field2&lt;/TD&gt;&lt;TD width="33.333333333333336%" height="30px"&gt;SomeValue2&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD height="30px"&gt;1&lt;/TD&gt;&lt;TD height="30px"&gt;Field3&lt;/TD&gt;&lt;TD height="30px"&gt;SomeValue3&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD height="30px"&gt;2&lt;/TD&gt;&lt;TD height="30px"&gt;Field1&lt;/TD&gt;&lt;TD height="30px"&gt;SomeValue1&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD height="30px"&gt;2&lt;/TD&gt;&lt;TD height="30px"&gt;Field3&lt;/TD&gt;&lt;TD height="30px"&gt;SomeValue3&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;The definition of the NAME fields are stored in another table (in this case holedetails) which describes the data type.&lt;/P&gt;&lt;P&gt;Previously, I would use the pivot command to flatten data from this type of data model, but now with the &lt;STRONG&gt;struct&lt;/STRONG&gt; datatype in databricks, I would like to take advantage of this as it seems more efficient and a much easier format to query.&lt;/P&gt;&lt;P&gt;I would like transform the two columns into a struct column using a statement like below. I can prepare the data to be passed to the&amp;nbsp;&lt;SPAN&gt;named_struct as a string, but it fails to parse as the named_struct expects 2 or more parameters and I'm building this dynamically.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;        SELECT a.HOLEID
              ,a.PROJECTCODE
              ,named_struct(concat('(', concat_ws(',', collect_list(concat('"',NAME,'","',VALUE_MOD,'"'))),')'))
            from
      (
        SELECT a.HOLEID
              ,a.PROJECTCODE
              ,a.NAME 
              ,a.VALUE
              ,case b.DATATYPE
                  when 'Text' then VALUE
                  when 'Numeric' then concat("cast('",VALUE,"' as Float)")
              end as VALUE_MOD
        FROM rtx.raw_acq_aar.holedetails a
        join rtx.raw_acq_aar.holecode b
        on a.NAME = b.NAME
        where a.PROJECTCODE = 'SANDSTONE'
      ) a
      group by a.HOLEID
            ,a.PROJECTCODE&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I get the following error when I run the above statement as the&amp;nbsp;&lt;SPAN&gt;named_struct function is expected 2 or more arguments.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;[WRONG_NUM_ARGS.WITHOUT_SUGGESTION] The `named_struct` requires 2n (n &amp;gt; 0) parameters but the actual number is 1. Please, refer to '&lt;A href="https://spark.apache.org/docs/latest/sql-ref-functions.html" target="_blank" rel="noopener"&gt;https://spark.apache.org/docs/latest/sql-ref-functions.html&lt;/A&gt;' for a fix. SQLSTATE: 42605&lt;/SPAN&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;When I run something like below it works, suggesting I've prepared the dynamic string correctly, but I want to run this statement across millions of rows.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;select named_struct("HoleStatus","CompletedReclaimed","FieldDataCheckedBy","Justin Ward","WaterTabIntersected","Wet","D_SurveyedBy","Joe Bloggs","CasingCemented","No","WatertableDepth_m","cast('71' as Float)","ApprovalType","DMIRS PoW","D_SurveyMethod","GPS","D_StateName","WA","Geologist","Justin Ward","D_Commodity","Lithium","TargetDepth_m","cast('150' as Float)","ApprovalID","115501","PlugDepth_m","cast('0.4' as Float)","D_Company","Australia","DrillCompany","Ranger","CasingLeft","Yes","Cemented","No","D_OrigHoleID","SAND_2023_RC_07","CasingDepth_m","cast('6' as Float)","CasingSize","150 mm","PrecollarDepth_m","cast('10' as Float)","HolePurpose","Geological","DepthBedrock_m","cast('0' as Float)","Organisation","RTX","D_Country","AU")&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Any help would be greatly appreciated.&lt;/P&gt;</description>
      <pubDate>Mon, 22 Jul 2024 08:28:02 GMT</pubDate>
      <guid>https://community.databricks.com/t5/warehousing-analytics/converting-an-entity-attribute-value-data-model-to-struct/m-p/79811#M1445</guid>
      <dc:creator>Spoon_Man</dc:creator>
      <dc:date>2024-07-22T08:28:02Z</dc:date>
    </item>
    <item>
      <title>Re: Converting an Entity Attribute Value Data Model to Struct datatype</title>
      <link>https://community.databricks.com/t5/warehousing-analytics/converting-an-entity-attribute-value-data-model-to-struct/m-p/79864#M1446</link>
      <description>&lt;P&gt;Having thought this through some more, I'll probably just create a struct field for each data type. I can use from_json to create the struct field.&lt;/P&gt;&lt;LI-CODE lang="python"&gt;  select HOLEID,
        PROJECTCODE,
        from_json(vf_string, 'MAP&amp;lt;STRING, STRING&amp;gt;') as vf_string
  from
    (
        SELECT HOLEID,
              PROJECTCODE,
              concat('{', concat_ws(',', collect_list(concat('"',a.NAME,'":"',a.VALUE,'"'))),'}') as vf_string
        FROM rtx.raw_acq_aar.holedetails a 
        join rtx.raw_acq_aar.holecode b
        on a.NAME = b.NAME
        where b.DATATYPE = 'Text'
        GROUP BY HOLEID, PROJECTCODE
    ) a &lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 22 Jul 2024 12:28:48 GMT</pubDate>
      <guid>https://community.databricks.com/t5/warehousing-analytics/converting-an-entity-attribute-value-data-model-to-struct/m-p/79864#M1446</guid>
      <dc:creator>Spoon_Man</dc:creator>
      <dc:date>2024-07-22T12:28:48Z</dc:date>
    </item>
    <item>
      <title>Re: Converting an Entity Attribute Value Data Model to Struct datatype</title>
      <link>https://community.databricks.com/t5/warehousing-analytics/converting-an-entity-attribute-value-data-model-to-struct/m-p/79872#M1447</link>
      <description>&lt;P&gt;Your first approach didn't work, because named_struct needs it's arguments on odd postition to be foldable.&lt;BR /&gt;So you can think of it in following way, at compile time compiler needs to "see" this value.&amp;nbsp;&lt;BR /&gt;That's why even if you prepared proper expression it didnt work&lt;/P&gt;</description>
      <pubDate>Mon, 22 Jul 2024 13:03:47 GMT</pubDate>
      <guid>https://community.databricks.com/t5/warehousing-analytics/converting-an-entity-attribute-value-data-model-to-struct/m-p/79872#M1447</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2024-07-22T13:03:47Z</dc:date>
    </item>
  </channel>
</rss>

