I'm new to databricks and I have a source data model that stores the data as Name-Value pairs (i.e. normalised) in two columns in the table.
EntityID | Name | Value |
1 | Field1 | SomeValue1 |
1 | Field2 | SomeValue2 |
1 | Field3 | SomeValue3 |
2 | Field1 | SomeValue1 |
2 | Field3 | SomeValue3 |
The definition of the NAME fields are stored in another table (in this case holedetails) which describes the data type.
Previously, I would use the pivot command to flatten data from this type of data model, but now with the struct datatype in databricks, I would like to take advantage of this as it seems more efficient and a much easier format to query.
I would like transform the two columns into a struct column using a statement like below. I can prepare the data to be passed to the named_struct as a string, but it fails to parse as the named_struct expects 2 or more parameters and I'm building this dynamically.
SELECT a.HOLEID
,a.PROJECTCODE
,named_struct(concat('(', concat_ws(',', collect_list(concat('"',NAME,'","',VALUE_MOD,'"'))),')'))
from
(
SELECT a.HOLEID
,a.PROJECTCODE
,a.NAME
,a.VALUE
,case b.DATATYPE
when 'Text' then VALUE
when 'Numeric' then concat("cast('",VALUE,"' as Float)")
end as VALUE_MOD
FROM rtx.raw_acq_aar.holedetails a
join rtx.raw_acq_aar.holecode b
on a.NAME = b.NAME
where a.PROJECTCODE = 'SANDSTONE'
) a
group by a.HOLEID
,a.PROJECTCODE
I get the following error when I run the above statement as the named_struct function is expected 2 or more arguments.
[WRONG_NUM_ARGS.WITHOUT_SUGGESTION] The `named_struct` requires 2n (n > 0) parameters but the actual number is 1. Please, refer to 'https://spark.apache.org/docs/latest/sql-ref-functions.html' for a fix. SQLSTATE: 42605
When I run something like below it works, suggesting I've prepared the dynamic string correctly, but I want to run this statement across millions of rows.
select named_struct("HoleStatus","CompletedReclaimed","FieldDataCheckedBy","Justin Ward","WaterTabIntersected","Wet","D_SurveyedBy","Joe Bloggs","CasingCemented","No","WatertableDepth_m","cast('71' as Float)","ApprovalType","DMIRS PoW","D_SurveyMethod","GPS","D_StateName","WA","Geologist","Justin Ward","D_Commodity","Lithium","TargetDepth_m","cast('150' as Float)","ApprovalID","115501","PlugDepth_m","cast('0.4' as Float)","D_Company","Australia","DrillCompany","Ranger","CasingLeft","Yes","Cemented","No","D_OrigHoleID","SAND_2023_RC_07","CasingDepth_m","cast('6' as Float)","CasingSize","150 mm","PrecollarDepth_m","cast('10' as Float)","HolePurpose","Geological","DepthBedrock_m","cast('0' as Float)","Organisation","RTX","D_Country","AU")
Any help would be greatly appreciated.