Hi @Jake Neyer, The error occurs because the system cannot determine the structure of the empty array and assumes it to be an ARRAY<STRING>, which doesn't match the expected schema.
To handle empty/null complex-typed arrays, you can use a JSON input preprocessing step or modify your query to handle the empty array scenario.
- JSON input preprocessing:
Before loading the JSON data into your table, you can preprocess it to ensure that empty arrays have at least one element with null values for each field, which will help maintain the schema.
For example, given the JSON object:
{
"likes": []
}
Preprocess it to:
{
"likes": [{"gid": null, "user": {"gid": null, "name": null, "resource_type": null}}]
}
- Modify your query to handle empty arrays:
You can use a CASE statement in your query to handle empty arrays while casting or loading the data. Here's an example query using BigQuery SQL syntax:
SELECT
IF(ARRAY_LENGTH(likes) > 0,
ARRAY<STRUCT<gid: STRING, user: STRUCT<gid: STRING, name: STRING, resource_type: STRING>>>(likes),
ARRAY<STRUCT<gid: STRING, user: STRUCT<gid: STRING, name: STRING, resource_type: STRING>>>[STRUCT(null, STRUCT(null, null, null))])
FROM
your_table_name
This query checks the length of the "likes" array. If it is non-empty, it casts the array as-is.
If empty, it creates an array with a single null-filled element that matches the expected schema. Note that the syntax may vary depending on your SQL dialect.
Choose the approach that best suits your requirements and the tools you are using.
Preprocessing the JSON data can help maintain the schema consistency while modifying the query allows you to handle empty arrays directly in the SQL code.