cancel
Showing results forย 
Search instead forย 
Did you mean:ย 
Warehousing & Analytics
Engage in discussions on data warehousing, analytics, and BI solutions within the Databricks Community. Share insights, tips, and best practices for leveraging data for informed decision-making.
cancel
Showing results forย 
Search instead forย 
Did you mean:ย 

null object while running a query against parquet

Shaimaa
New Contributor II

I am running this query against parquet:

SELECT 
SUM(CASE WHEN match_result.ecommerce.has_online_payments THEN 1 ELSE 0 END)
FROM parquet.`s3://folder_path/*`

when all the values of the object `match_result.ecommerce` are null, I get the following error:

[INVALID_EXTRACT_BASE_FIELD_TYPE] Can't extract a value from "match_result.ecommerce". Need a complex type [STRUCT, ARRAY, MAP] but got "INT". SQLSTATE: 42000; line 2, pos 14

How can I fix this issue?

3 REPLIES 3

giuseppegrieco
New Contributor III

Without additional context, my assumption is that since all the values are null, it's impossible to determine the column's type. Do you have a default value? You could consider coalescing with that value. For example, if the default value is false:

 

SELECT 
  SUM(
    CASE WHEN COALESCE(
      CAST(match_result.ecommerce.has_online_payments AS BOOLEAN),
      FALSE
    ) THEN
      1 
    ELSE 
      0 
    END
  )
FROM
  parquet.`s3://folder_path/*`

Otherwise you can handle the null values as @Retired_mod suggested.

 

Shaimaa
New Contributor II

@Anonymous Thank you but the issue persists even if I add a null check:

SELECT 
SUM(CASE WHEN match_result IS NOT NULL AND match_result.ecommerce IS NOT NULL AND match_result.ecommerce.has_online_payments THEN 1 ELSE 0 END) 
FROM parquet.`s3://folder_path/*`

the values in this particular file for "match_result.ecommerce" are null but they are not always null for all files that I run the query against. How can I tell specify that the condition should evaluate to 0 if the value of "match_result.ecommerce" is null instead of erroring out?

Shaimaa
New Contributor II

None of these solutions with coalesce work because it's "match_result.ecommerce" that is null not "match_result.ecommerce.has_online_payments". So it's still trying to extract a value from a null. Help me modify the query accordingly please. 

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you wonโ€™t want to miss the chance to attend and share knowledge.

If there isnโ€™t a group near you, start one and help create a community that brings people together.

Request a New Group