<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Re: Spark Settings in SQL Warehouse in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/spark-settings-in-sql-warehouse/m-p/7961#M3695</link>
    <description>&lt;P&gt;Hi @Nicholas Mead​&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thank you for your question! Please take a moment to review the answer above and let us know whether it fits your needs.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;If it does, please click "Select As Best" to mark it as the best solution.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Your feedback helps us ensure we are providing the best possible service.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thank you!&lt;/P&gt;</description>
    <pubDate>Sat, 01 Apr 2023 00:15:38 GMT</pubDate>
    <dc:creator>Anonymous</dc:creator>
    <dc:date>2023-04-01T00:15:38Z</dc:date>
    <item>
      <title>Spark Settings in SQL Warehouse</title>
      <link>https://community.databricks.com/t5/data-engineering/spark-settings-in-sql-warehouse/m-p/7959#M3693</link>
      <description>&lt;P&gt;I'm running a query, trying to parse a string into a map, and I get the following error:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;org.apache.spark.SparkRuntimeException: Duplicate map key  was found, please check the input data. If you want to remove the duplicated keys, you can set "spark.sql.mapKeyDedupPolicy" to "LAST_WIN" so that the key inserted at last takes precedence.&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;No worries, so I try to alter this setting:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;set spark.sql.mapKeyDedupPolicy=LAST_WIN&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;And get the following error:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;TStatus(statusCode:ERROR_STATUS, infoMessages:[*org.apache.hive.service.cli.HiveSQLException:Error running query: org.apache.spark.sql.AnalysisException: Configuration spark.sql.mapKeyDedupPolicy is not available.:155:154, &lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;This works when using a general purpose cluster, but doesn't work when using a SQL Warehouse. So how do I change this setting in a SQL Warehouse cluster? And if that's not possible, how do I get around this error?&lt;/P&gt;</description>
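For context, the duplicate-key error above is what Spark's default map-key policy (spark.sql.mapKeyDedupPolicy=EXCEPTION) raises whenever a map is built from input that repeats a key. The original query is not shown in the post; a minimal Spark SQL reproduction, assuming a str_to_map-style parse of a comma-separated 'key:value' string, looks like:

```sql
-- Hypothetical reconstruction of the failing query; the post does not
-- show the original SQL. str_to_map splits on ',' between pairs and on
-- ':' between key and value; under the default
-- spark.sql.mapKeyDedupPolicy=EXCEPTION it fails as soon as a key repeats.
SELECT str_to_map('key1:a,key2:b,key1:c', ',', ':')
-- fails with org.apache.spark.SparkRuntimeException: Duplicate map key ... was found
```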
      <pubDate>Fri, 10 Mar 2023 11:30:27 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/spark-settings-in-sql-warehouse/m-p/7959#M3693</guid>
      <dc:creator>najmead</dc:creator>
      <dc:date>2023-03-10T11:30:27Z</dc:date>
    </item>
    <item>
      <title>Re: Spark Settings in SQL Warehouse</title>
      <link>https://community.databricks.com/t5/data-engineering/spark-settings-in-sql-warehouse/m-p/7960#M3694</link>
      <description>&lt;P&gt;@Nicholas Mead​&amp;nbsp;:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;You are getting this error because the spark.sql.mapKeyDedupPolicy configuration cannot be set on a SQL Warehouse. SQL Warehouses only allow a small, fixed set of Spark configuration parameters to be changed per session, and this one is not among them.&lt;/P&gt;&lt;P&gt;To work around the error, you can parse the string into a map yourself: split the string into key:value pairs, build one array of keys and one array of values, and pass both to the map_from_arrays function (which takes exactly two arrays, the keys and the values). Here's an example:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;SELECT map_from_arrays(
    transform(
        split('key1:value1,key2:value2,key3:value3', ','),
        x -&amp;gt; split(x, ':')[0]
    ),
    transform(
        split('key1:value1,key2:value2,key3:value3', ','),
        x -&amp;gt; split(x, ':')[1]
    )
)&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;In this example, we first split the input string into an array of key:value pairs using the split function. We then use transform twice to extract the keys and the values into two separate arrays. Finally, map_from_arrays zips the two arrays together into a map.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Note that map_from_arrays raises the same duplicate-key error if the keys array contains duplicates, so this approach still assumes the input has distinct keys. array_distinct is not enough here, because it only removes fully identical key:value pairs; if the same key can appear with different values, you need to deduplicate explicitly, for example by exploding the pairs and keeping the last value per key before building the map.&lt;/P&gt;</description>
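The duplicate-key caveat at the end of the reply can be handled entirely in SQL by emulating LAST_WIN by hand: explode the pairs together with their position, keep the last value seen for each key, and only then build the map. A sketch, assuming the same comma-separated 'key:value' format as above (posexplode, max_by, map_from_entries, and collect_list are standard Spark SQL functions; max_by requires Spark 3.0+):

```sql
-- Emulate spark.sql.mapKeyDedupPolicy=LAST_WIN without setting the config:
-- explode each pair with its position, keep the value at the highest
-- position per key, then assemble the map from the de-duplicated entries.
SELECT map_from_entries(collect_list(struct(k, v))) AS parsed
FROM (
  SELECT k, max_by(v, pos) AS v   -- the last occurrence of each key wins
  FROM (
    SELECT pos,
           split(kv, ':')[0] AS k,
           split(kv, ':')[1] AS v
    FROM (SELECT split('key1:a,key2:b,key1:c', ',') AS arr) src
    LATERAL VIEW posexplode(arr) t AS pos, kv
  ) pairs
  GROUP BY k
) deduped
```

Because every key is unique after the GROUP BY, map_from_entries never sees a duplicate, so the query runs under the default EXCEPTION policy on both general purpose clusters and SQL Warehouses.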
      <pubDate>Fri, 17 Mar 2023 15:46:08 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/spark-settings-in-sql-warehouse/m-p/7960#M3694</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2023-03-17T15:46:08Z</dc:date>
    </item>
    <item>
      <title>Re: Spark Settings in SQL Warehouse</title>
      <link>https://community.databricks.com/t5/data-engineering/spark-settings-in-sql-warehouse/m-p/7961#M3695</link>
      <description>&lt;P&gt;Hi @Nicholas Mead​&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thank you for your question! Please take a moment to review the answer above and let us know whether it fits your needs.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;If it does, please click "Select As Best" to mark it as the best solution.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Your feedback helps us ensure we are providing the best possible service.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thank you!&lt;/P&gt;</description>
      <pubDate>Sat, 01 Apr 2023 00:15:38 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/spark-settings-in-sql-warehouse/m-p/7961#M3695</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2023-04-01T00:15:38Z</dc:date>
    </item>
  </channel>
</rss>

