โ06-24-2024 10:39 AM
I am running this query against parquet:
SELECT
SUM(CASE WHEN match_result.ecommerce.has_online_payments THEN 1 ELSE 0 END)
FROM parquet.`s3://folder_path/*`
when all the values of the object `match_result.ecommerce` are null, I get the following error:
[INVALID_EXTRACT_BASE_FIELD_TYPE] Can't extract a value from "match_result.ecommerce". Need a complex type [STRUCT, ARRAY, MAP] but got "INT". SQLSTATE: 42000; line 2, pos 14
How can I fix this issue?
โ06-24-2024 12:56 PM
Hi @Shaimaa, The column "match_result.ecommerce"
is not a complex type (like a struct, array, or map), but the query is trying to extract a value from it. Since itโs not a complex type, you canโt directly access its subfields.
To address this issue, you can modify your query to handle the null values in the "match_result.ecommerce"
column. One approach is to use a CASE
statement to check for null values before attempting to extract any subfields.
โ06-24-2024 01:08 PM
Without additional context, my assumption is that since all the values are null, it's impossible to determine the column's type. Do you have a default value? You could consider coalescing with that value. For example, if the default value is false:
SELECT
SUM(
CASE WHEN COALESCE(
CAST(match_result.ecommerce.has_online_payments AS BOOLEAN),
FALSE
) THEN
1
ELSE
0
END
)
FROM
parquet.`s3://folder_path/*`
Otherwise you can handle the null values as @Kaniz_Fatma suggested.
โ06-24-2024 01:09 PM
@Anonymous Thank you but the issue persists even if I add a null check:
SELECT
SUM(CASE WHEN match_result IS NOT NULL AND match_result.ecommerce IS NOT NULL AND match_result.ecommerce.has_online_payments THEN 1 ELSE 0 END)
FROM parquet.`s3://folder_path/*`
the values in this particular file for "match_result.ecommerce" are null but they are not always null for all files that I run the query against. How can I tell specify that the condition should evaluate to 0 if the value of "match_result.ecommerce" is null instead of erroring out?
โ06-24-2024 01:42 PM
Hi @Shaimaa, I apologize for the oversight. Letโs modify the query to handle null values in a way that evaluates to 0 when "match_result.ecommerce"
is null. We can achieve this by using the COALESCE
function, which returns the first non-null value from a list of expressions
SELECT
SUM(CASE WHEN COALESCE(match_result.ecommerce.has_online_payments, 0) = 1 THEN 1 ELSE 0 END)
FROM parquet.`s3://folder_path/*`
In this modified query:
COALESCE(match_result.ecommerce.has_online_payments, 0)
expression checks if match_result.ecommerce.has_online_payments
is null. If itโs null, it returns 0; otherwise, it returns the actual value.This approach ensures that null values are treated as 0, preventing errors. Feel free to test this query, and let me know if it resolves the issue! ๐๐
If you have any further questions or need additional assistance, feel free to ask!
โ06-25-2024 04:44 AM
None of these solutions with coalesce work because it's "match_result.ecommerce" that is null not "match_result.ecommerce.has_online_payments". So it's still trying to extract a value from a null. Help me modify the query accordingly please.
Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you wonโt want to miss the chance to attend and share knowledge.
If there isnโt a group near you, start one and help create a community that brings people together.
Request a New Group