12-04-2023 02:36 AM
Hello everyone,
I am currently working on my first dlt pipeline, and I have stumbled on a problem that I am struggling to solve.
I am working on several tables that have a column called "my_column": an array of JSON objects with two keys, "score" and "score_name".
I want to create a column for each score, with each column's name taken from the value of the "score_name" key.
Example:
my_column = df.select("my_column").rdd.flatMap(lambda x: x).collect()
where:
my_column = [[Row(score_name="name1", score=2), Row(score_name="name2", score=10)], ...]
and, after the transformation, I should be able to do:
name1 = df.select("name1").rdd.flatMap(lambda x: x).collect()
# name1 = [2, ...]
The schema of the "my_column" column is an array of structs with two fields: "score_name" (string) and "score" (numeric).
So what I do for that is create the new columns inside my dlt pipeline; a rough sketch of the step is below.
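This is roughly the transformation (a sketch only: the "id" key column, the DataFrame names, and the explode-and-pivot approach are illustrative, not my exact code):

```python
from pyspark.sql import functions as F

# Flatten the array: one row per (score_name, score) struct.
flat = (
    df.select("id", F.explode("my_column").alias("s"))  # "id" is an assumed key column
      .select(
          "id",
          F.col("s.score_name").alias("score_name"),
          F.col("s.score").alias("score"),
      )
)

# Pivot the score_name values into one column per score.
pivoted = flat.groupBy("id").pivot("score_name").agg(F.first("score"))
```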
"""To overwrite your schema or change partitioning, please set: '.option("overwriteSchema", "true")'."""
From the error message, I believe it has something to do with authorizing the schema change, or with me not handling the schema change correctly.
If you know the solution or have any advice on how to solve this issue, I would love to hear it. Thanks!
Accepted Solutions
12-05-2023 09:29 AM
Thank you for your answer! I found a way to complete the pipeline: I had to set spark_conf = {"spark.databricks.delta.schema.autoMerge.enabled": "true"} inside the dlt.table decorator of my table.
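For anyone who hits the same error, this is roughly what the decorator looks like in my pipeline (a sketch; the table name, source table, and query are illustrative):

```python
import dlt
from pyspark.sql import functions as F

@dlt.table(
    name="scores_pivoted",  # illustrative table name
    # Allow the new per-score columns to be merged into the existing schema.
    spark_conf={"spark.databricks.delta.schema.autoMerge.enabled": "true"},
)
def scores_pivoted():
    df = dlt.read("source_table")  # illustrative source table
    flat = (
        df.select("id", F.explode("my_column").alias("s"))
          .select(
              "id",
              F.col("s.score_name").alias("score_name"),
              F.col("s.score").alias("score"),
          )
    )
    return flat.groupBy("id").pivot("score_name").agg(F.first("score"))
```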
I still don't know exactly why the schema change creates an error here; when I create new columns through other pathways, such as with the from_json method, I don't get similar errors.
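For comparison, the from_json pathway that has not triggered this error for me looks roughly like this (the JSON column name and the schema are illustrative):

```python
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, LongType

score_schema = StructType([
    StructField("score_name", StringType()),
    StructField("score", LongType()),
])

# Parse a JSON string column into a struct column; this also adds a new
# column to the table's schema, yet it has not raised the same error.
parsed = df.withColumn("parsed_scores", F.from_json("json_col", score_schema))
```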
I will also try to follow your advice and see if it works or if there is a difference!