Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Spark Settings in SQL Warehouse

najmead
Contributor

I'm running a query, trying to parse a string into a map, and I get the following error:

org.apache.spark.SparkRuntimeException: Duplicate map key  was found, please check the input data. If you want to remove the duplicated keys, you can set "spark.sql.mapKeyDedupPolicy" to "LAST_WIN" so that the key inserted at last takes precedence.

No worries, so I try to alter this setting:

set spark.sql.mapKeyDedupPolicy=LAST_WIN

And I get the following error:

TStatus(statusCode:ERROR_STATUS, infoMessages:[*org.apache.hive.service.cli.HiveSQLException:Error running query: org.apache.spark.sql.AnalysisException: Configuration spark.sql.mapKeyDedupPolicy is not available.:155:154, 

This works when using a general-purpose cluster, but doesn't work when using a SQL Warehouse. So how do I change this setting in a SQL Warehouse? And if that's not possible, how do I get around this error?

2 REPLIES

Anonymous
Not applicable

@Maryam Najafian:

You are getting this error because the spark.sql.mapKeyDedupPolicy configuration is not available in SQL Warehouses. It can only be set on regular Spark clusters (which is why your SET statement works on the general-purpose cluster); SQL Warehouses do not let you override it.
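If you do stay on the general-purpose cluster route, you can also apply the setting cluster-wide instead of per session; a minimal sketch, assuming you can edit the cluster's Spark config under Advanced Options:

spark.sql.mapKeyDedupPolicy LAST_WIN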

To work around this error on a SQL Warehouse, you can use an alternative approach to parse the string into a map. One option is to split the string into an array of key:value pairs, extract the keys and the values into two separate arrays, and then use the map_from_arrays function to build the map. Here's an example:

SELECT map_from_arrays(
    transform(split('key1:value1,key2:value2,key3:value3', ','), x -> split(x, ':')[0]),
    transform(split('key1:value1,key2:value2,key3:value3', ','), x -> split(x, ':')[1])
)

In this example, we first split the input string into an array of key:value pairs using the split function. We then use transform twice, once to build the array of keys and once to build the array of values, and finally pass both arrays to map_from_arrays to create the map.
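In practice you would apply the same expression to a column rather than a literal. A hedged sketch, where events and raw_props are hypothetical table and column names, not anything from this thread:

SELECT
    map_from_arrays(
        transform(split(raw_props, ','), x -> split(x, ':')[0]),   -- keys
        transform(split(raw_props, ','), x -> split(x, ':')[1])    -- values
    ) AS props_map
FROM events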

Note that this approach assumes the input string is well-formed and does not contain duplicate keys; map_from_arrays runs into the same duplicate-key check otherwise. If the input data can contain duplicate keys, you need additional logic, such as array_distinct to drop exact duplicate key:value pairs, or deduplicating the keys yourself before building the map, as in the sketch below.
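One way to emulate LAST_WIN behaviour purely in SQL is to fold over the pairs with aggregate, dropping any earlier entry for a key before inserting the new one. This is only a sketch and assumes the same comma-separated key:value format as above:

SELECT aggregate(
    split('key1:a,key2:b,key1:c', ','),
    map(),                                                   -- start with an empty map as the accumulator
    (acc, x) -> map_concat(
        map_filter(acc, (k, v) -> k != split(x, ':')[0]),    -- remove any earlier value stored for this key
        map(split(x, ':')[0], split(x, ':')[1])              -- insert the current key:value pair
    )
) AS deduped_map

For the literal above this returns {key1 -> c, key2 -> b}, i.e. the last value seen for each key wins.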

Anonymous
Not applicable

Hi @Nicholas Mead,

Thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs.

Please help us select the best solution by clicking on "Select As Best" if it does.

Your feedback will help us ensure that we are providing the best possible service to you.

Thank you!
