Converting an Entity Attribute Value Data Model to Struct datatype

Spoon_Man — Mon, 22 Jul 2024 08:28:02 GMT

I'm new to databricks and I have a source data model that stores the data as Name-Value pairs (i.e. normalised) in two columns in the table.

EntityID	Name	Value
1	Field1	SomeValue1
1	Field2	SomeValue2
1	Field3	SomeValue3
2	Field1	SomeValue1
2	Field3	SomeValue3

The definition of the NAME fields are stored in another table (in this case holedetails) which describes the data type.

Previously, I would use the pivot command to flatten data from this type of data model, but now with the struct datatype in databricks, I would like to take advantage of this as it seems more efficient and a much easier format to query.

I would like transform the two columns into a struct column using a statement like below. I can prepare the data to be passed to the named_struct as a string, but it fails to parse as the named_struct expects 2 or more parameters and I'm building this dynamically.

SELECT a.HOLEID ,a.PROJECTCODE ,named_struct(concat('(', concat_ws(',', collect_list(concat('"',NAME,'","',VALUE_MOD,'"'))),')')) from ( SELECT a.HOLEID ,a.PROJECTCODE ,a.NAME ,a.VALUE ,case b.DATATYPE when 'Text' then VALUE when 'Numeric' then concat("cast('",VALUE,"' as Float)") end as VALUE_MOD FROM rtx.raw_acq_aar.holedetails a join rtx.raw_acq_aar.holecode b on a.NAME = b.NAME where a.PROJECTCODE = 'SANDSTONE' ) a group by a.HOLEID ,a.PROJECTCODE

I get the following error when I run the above statement as the named_struct function is expected 2 or more arguments.

[WRONG_NUM_ARGS.WITHOUT_SUGGESTION] The `named_struct` requires 2n (n > 0) parameters but the actual number is 1. Please, refer to 'https://spark.apache.org/docs/latest/sql-ref-functions.html' for a fix. SQLSTATE: 42605

When I run something like below it works, suggesting I've prepared the dynamic string correctly, but I want to run this statement across millions of rows.

select named_struct("HoleStatus","CompletedReclaimed","FieldDataCheckedBy","Justin Ward","WaterTabIntersected","Wet","D_SurveyedBy","Joe Bloggs","CasingCemented","No","WatertableDepth_m","cast('71' as Float)","ApprovalType","DMIRS PoW","D_SurveyMethod","GPS","D_StateName","WA","Geologist","Justin Ward","D_Commodity","Lithium","TargetDepth_m","cast('150' as Float)","ApprovalID","115501","PlugDepth_m","cast('0.4' as Float)","D_Company","Australia","DrillCompany","Ranger","CasingLeft","Yes","Cemented","No","D_OrigHoleID","SAND_2023_RC_07","CasingDepth_m","cast('6' as Float)","CasingSize","150 mm","PrecollarDepth_m","cast('10' as Float)","HolePurpose","Geological","DepthBedrock_m","cast('0' as Float)","Organisation","RTX","D_Country","AU")

Any help would be greatly appreciated.

Re: Converting an Entity Attribute Value Data Model to Struct datatype

Spoon_Man — Mon, 22 Jul 2024 12:28:48 GMT

Having thought this through some more, I'll probably just create a struct field for each data type. I can use from_json to create the struct field.

select HOLEID, PROJECTCODE, from_json(vf_string, 'MAP<STRING, STRING>') as vf_string from ( SELECT HOLEID, PROJECTCODE, concat('{', concat_ws(',', collect_list(concat('"',a.NAME,'":"',a.VALUE,'"'))),'}') as vf_string FROM rtx.raw_acq_aar.holedetails a join rtx.raw_acq_aar.holecode b on a.NAME = b.NAME where b.DATATYPE = 'Text' GROUP BY HOLEID, PROJECTCODE ) a

Re: Converting an Entity Attribute Value Data Model to Struct datatype

szymon_dybczak — Mon, 22 Jul 2024 13:03:47 GMT

Your first approach didn't work, because named_struct needs it's arguments on odd postition to be foldable.
So you can think of it in following way, at compile time compiler needs to "see" this value.
That's why even if you prepared proper expression it didnt work

topic Converting an Entity Attribute Value Data Model to Struct datatype in Warehousing & Analytics

Converting an Entity Attribute Value Data Model to Struct datatype

Re: Converting an Entity Attribute Value Data Model to Struct datatype

Re: Converting an Entity Attribute Value Data Model to Struct datatype