map_keys() returns an empty array in Delta Live Table pipeline.
04-04-2023 03:11 AM
We are exploding a map type column into multiple columns based on the keys of the map column. Part of this process is to extract the keys of a map type column called json_map as illustrated in the snippet below. The code executes as expected when running it in a notebook, but returns an empty array when running it in a Delta Live Tables pipeline. Below is the snippet of code:
from pyspark.sql.functions import map_keys

keys = (
    df
    .select(map_keys("json_map"))
    .distinct()
    .collect()
)
Does anyone know why this code will run as expected in notebook, but returns empty array in the Delta Live Tables pipeline? Or is there another method to extract the different fields of a map or json column to separate columns?
- Labels: Delta, Table Pipeline
04-09-2023 08:34 AM
@De Vos Meaker:
One potential reason why the code works in a notebook but returns an empty array in a Delta Live Tables pipeline is that there may be differences in the data being processed. It's possible that the pipeline is processing different data that doesn't have any keys in the json_map column, leading to an empty array result.
As for alternative methods to extract the different fields of a map or JSON column into separate columns, you can try the getItem function in PySpark. Here's an example code snippet:
from pyspark.sql.functions import col
df = df.select(
col("json_map").getItem("key1").alias("column1"),
col("json_map").getItem("key2").alias("column2"),
col("json_map").getItem("key3").alias("column3")
)
This code creates new columns column1, column2, and column3 by extracting the values of the keys "key1", "key2", and "key3" from the json_map column using the getItem function. You can customize this code to extract the specific keys you need.
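Since the names key1, key2, and key3 above are just placeholders, here is the same extraction sketched with plain Python dicts (no Spark session needed) to show the intended shape; a missing key comes back as None, just as getItem yields null for an absent map key:

```python
# Plain-Python analogy of the getItem extraction above.
# Each row's map column becomes a fixed set of output columns
# for keys that are known in advance.
rows = [
    {"json_map": {"key1": "a", "key2": "b", "key3": "c"}},
    {"json_map": {"key1": "x", "key2": "y"}},  # key3 absent -> None
]
wanted = ["key1", "key2", "key3"]  # placeholder key names

extracted = [
    {f"column{i + 1}": r["json_map"].get(k) for i, k in enumerate(wanted)}
    for r in rows
]
# extracted[0] -> {"column1": "a", "column2": "b", "column3": "c"}
```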
04-17-2023 06:13 AM
Hi @Suteja Kanuri,
Thank you for your response and explanation. The code I showed above is not the exact snippet we are using; please find the exact snippet below. We dynamically extract the keys of the map and then use getItem() to create columns from the fields, with the key names as the column names:
from pyspark.sql.functions import col, from_json, map_keys
from pyspark.sql.types import MapType, StringType

df = df.withColumn(column_name, from_json(column_name, MapType(StringType(), StringType())))

keys = (
    df
    .select(map_keys(column_name))
    .distinct()
    .collect()
)

df = df.select(
    [col(column_name).getItem(k).alias(k) for k in keys] + [filter_column]
)
I have checked the data, and it's identical. Do you know whether this dynamic approach is supported in Delta Live Tables?
04-17-2023 06:22 AM
@De Vos Meaker:
Since Delta Live Tables is built on top of Delta Lake, which is designed to work with Apache Spark, the dynamic approach of extracting the keys of a map and creating columns from the fields using getItem() should be supported in Delta Live Tables. However, please test your code to verify that it works as expected.
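One thing worth double-checking regardless of the DLT question: collect() on a .select(map_keys(...)) query returns Row objects that each wrap an array of keys, not bare key strings, so iterating the collected list directly hands Rows to getItem(). A plain-Python sketch of the flattening step that is needed first (each collected Row is simulated here as a one-element tuple):

```python
# collect() on .select(map_keys(...)).distinct() yields one Row per
# distinct key array; simulate those Rows as one-element tuples.
collected = [(["key1", "key2"],), (["key1", "key3"],)]

# Flatten and de-duplicate into the actual key names before
# building the getItem()/alias() column list.
keys = sorted({k for (key_array,) in collected for k in key_array})
# keys -> ["key1", "key2", "key3"]
```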

