cancel
Showing results for 
Search instead for 
Did you mean: 
Warehousing & Analytics
Engage in discussions on data warehousing, analytics, and BI solutions within the Databricks Community. Share insights, tips, and best practices for leveraging data for informed decision-making.
cancel
Showing results for 
Search instead for 
Did you mean: 

null object while running a query against parquet

Shaimaa
New Contributor II

I am running this query against parquet:

SELECT 
SUM(CASE WHEN match_result.ecommerce.has_online_payments THEN 1 ELSE 0 END)
FROM parquet.`s3://folder_path/*`

when all the values of the object `match_result.ecommerce` are null, I get the following error:

[INVALID_EXTRACT_BASE_FIELD_TYPE] Can't extract a value from "match_result.ecommerce". Need a complex type [STRUCT, ARRAY, MAP] but got "INT". SQLSTATE: 42000; line 2, pos 14

How can I fix this issue?

3 REPLIES 3

giuseppegrieco
New Contributor III

Without additional context, my assumption is that since all the values are null, it's impossible to determine the column's type. Do you have a default value? You could consider coalescing with that value. For example, if the default value is false:

 

SELECT 
  SUM(
    CASE WHEN COALESCE(
      CAST(match_result.ecommerce.has_online_payments AS BOOLEAN),
      FALSE
    ) THEN
      1 
    ELSE 
      0 
    END
  )
FROM
  parquet.`s3://folder_path/*`

Otherwise you can handle the null values as @Retired_mod suggested.

 

Shaimaa
New Contributor II

@Anonymous Thank you but the issue persists even if I add a null check:

SELECT 
SUM(CASE WHEN match_result IS NOT NULL AND match_result.ecommerce IS NOT NULL AND match_result.ecommerce.has_online_payments THEN 1 ELSE 0 END) 
FROM parquet.`s3://folder_path/*`

the values in this particular file for "match_result.ecommerce" are null but they are not always null for all files that I run the query against. How can I tell specify that the condition should evaluate to 0 if the value of "match_result.ecommerce" is null instead of erroring out?

Shaimaa
New Contributor II

None of these solutions with coalesce work because it's "match_result.ecommerce" that is null not "match_result.ecommerce.has_online_payments". So it's still trying to extract a value from a null. Help me modify the query accordingly please. 

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group