Hi
I don't know if anyone can help me with a question about BigQuery and Parquet in Databricks.
We have exported a field named EVENT_PARAMS from a BigQuery table to a Parquet table in Databricks.
In BigQuery, we noticed that this column is an ARRAY of STRUCTs with this composition:
ARRAY<STRUCT<key STRING, value STRUCT<string_value STRING, int_value bigint, float_value float, double_value float>>>
In the Parquet output, we noticed that the data was exported with a "raw" formatting that does not reflect the names of the sub-fields of this STRUCT:
{"v":[{"v":{"f":[{"v":"firebase_conversion"},{"v":{"f":[{"v":null},{"v":"1"},{"v":null},{"v":null}]}}]}},{"v":{"f":[{"v":"item_list_name"},{"v":{"f":[{"v":"lista-premios"},{"v":null},{"v":null},{"v":null}]}}]}}]}
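To show what we mean, here is a minimal Python sketch of the translation we are after. It assumes (based on the BigQuery schema above) that the positional order of each "f" array follows the declaration order: the key first, then the value struct with string_value, int_value, float_value, double_value. The helper name `translate` is just for illustration:

```python
import json

# Raw "f"/"v" formatting as exported to Parquet (copied from the example above)
raw = json.loads(
    '{"v":[{"v":{"f":[{"v":"firebase_conversion"},'
    '{"v":{"f":[{"v":null},{"v":"1"},{"v":null},{"v":null}]}}]}},'
    '{"v":{"f":[{"v":"item_list_name"},'
    '{"v":{"f":[{"v":"lista-premios"},{"v":null},{"v":null},{"v":null}]}}]}}]}'
)

# Field names taken from the BigQuery schema; the assumption is that the
# positional order in each "f" array matches this declaration order.
VALUE_FIELDS = ["string_value", "int_value", "float_value", "double_value"]

def translate(entry):
    # Each entry is {"v": {"f": [key_cell, value_cell]}}
    key_cell, value_cell = entry["v"]["f"]
    # The value cell wraps another positional "f" array of four cells
    values = [cell["v"] for cell in value_cell["v"]["f"]]
    return {"key": key_cell["v"], "value": dict(zip(VALUE_FIELDS, values))}

event_params = [translate(e) for e in raw["v"]]
print(json.dumps(event_params, indent=2))
```

This is only to illustrate the mapping we would like; ideally the export itself (or a Spark schema applied at read time) would produce the named fields directly.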
As you can see, I cannot see the names of the fields inside this output, only their values, so it's very difficult to manipulate in PySpark or Spark SQL.
We would like to know if there is any way to translate this raw formatting so that it is as faithful as possible to its metadata definition in BigQuery.
Something that would then allow us to reference the fields directly in Python, PySpark or SQL, with an appearance similar to the example below:
{"v":[{"key":"firebase_conversion","value":{"string_value":null,"int_value":"1","float_value":null,"double_value":null}},{"key":"item_list_name","value":{"string_value":"lista-premios","int_value":null,"float_value":null,"double_value":null}}]}
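With that shape, looking up a parameter by name becomes direct. A small plain-Python sketch of the access pattern we are hoping for (the `param` helper and the hard-coded structure are hypothetical; in Spark SQL the equivalent would be a filter over the array):

```python
# Hypothetical translated structure we would like to end up with
event_params = [
    {"key": "firebase_conversion",
     "value": {"string_value": None, "int_value": "1",
               "float_value": None, "double_value": None}},
    {"key": "item_list_name",
     "value": {"string_value": "lista-premios", "int_value": None,
               "float_value": None, "double_value": None}},
]

def param(params, key):
    """Return the value struct for a given event-parameter key, or None."""
    return next((p["value"] for p in params if p["key"] == key), None)

print(param(event_params, "item_list_name")["string_value"])  # lista-premios
print(param(event_params, "firebase_conversion")["int_value"])  # 1
```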
Thanks for any help you can give me.
Best regards,
Sergio Coutinho