cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
cancel
Showing results for 
Search instead for 
Did you mean: 

Arrays of complex type always evaluate to ARRAY<STRING>?

pt-jake
New Contributor II

Arrays of complex types seemingly always evaluate to ARRAY<STRING>. Therefore, casting or attempting to load JSON data with empty array values fails. For example, attempting to cast a JSON value of {"likes": []...} on load to the following table schema fail with the following error:

Schema:

array<struct<gid:string,user:struct<gid:string,name:string,resource_type:string>>>

Error:

"Error copy: [DATATYPE_MISMATCH.CAST_WITHOUT_SUGGESTION] Cannot resolve "likes" due to data type mismatch: cannot cast "ARRAY<STRING>" to "ARRAY<STRUCT<gid: STRING, user: STRUCT<gid: STRING, name: STRING, resource_type: STRING>>>"

What is the best way to handle empty/null complex-typed arrays?

2 REPLIES 2

Kaniz
Community Manager
Community Manager

Hi @Jake Neyer​, The error occurs because the system cannot determine the structure of the empty array and assumes it to be an ARRAY<STRING>, which doesn't match the expected schema.

To handle empty/null complex-typed arrays, you can use a JSON input preprocessing step or modify your query to handle the empty array scenario.

  • JSON input preprocessing:

Before loading the JSON data into your table, you can preprocess it to ensure that empty arrays have at least one element with null values for each field, which will help maintain the schema.

For example, given the JSON object:

{
  "likes": []
}

Preprocess it to:

{
  "likes": [{"gid": null, "user": {"gid": null, "name": null, "resource_type": null}}]
}
  • Modify your query to handle empty arrays:

You can use a CASE statement in your query to handle empty arrays while casting or loading the data. Here's an example query using BigQuery SQL syntax:

SELECT
  IF(ARRAY_LENGTH(likes) > 0,
    ARRAY<STRUCT<gid: STRING, user: STRUCT<gid: STRING, name: STRING, resource_type: STRING>>>(likes),
    ARRAY<STRUCT<gid: STRING, user: STRUCT<gid: STRING, name: STRING, resource_type: STRING>>>[STRUCT(null, STRUCT(null, null, null))])
FROM
  your_table_name

This query checks the length of the "likes" array. If it is non-empty, it casts the array as-is.

If empty, it creates an array with a single null-filled element that matches the expected schema. Note that the syntax may vary depending on your SQL dialect.

Choose the approach that best suits your requirements and the tools you are using.

Preprocessing the JSON data can help maintain the schema consistency while modifying the query allows you to handle empty arrays directly in the SQL code.

Anonymous
Not applicable

Hi @Jake Neyer​ 

Thank you for posting your question in our community! We are happy to assist you.

To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?

This will also help other community members who may have similar questions in the future. Thank you for your participation and let us know if you need any further assistance! 

Welcome to Databricks Community: Lets learn, network and celebrate together

Join our fast-growing data practitioner and expert community of 80K+ members, ready to discover, help and collaborate together while making meaningful connections. 

Click here to register and join today! 

Engage in exciting technical discussions, join a group with your peers and meet our Featured Members.