05-10-2022 01:54 AM
I'm facing an error in Delta Live Tables when I want to pivot a table. The error is the following:
And the code to replicate the error is the following:
import pandas as pd
import pyspark.sql.functions as F
pdf = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo",
"bar", "bar", "bar", "bar"],
"B": ["one", "one", "one", "two", "two",
"one", "one", "two", "two"],
"C": ["small", "large", "large", "small",
"small", "large", "small", "small",
"large"],
"D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
"E": [2, 4, 5, 5, 6, 6, 8, 9, 9]})
df = spark.createDataFrame(pdf)
df.write.mode('overwrite').saveAsTable('test_table')
import dlt
@dlt.view
def test_table():
return spark.read.table('test_table')
@dlt.table
def test_table_pivoted():
return (
spark.table('LIVE.test_table')
.groupBy('A', 'B')
.pivot('C')
.agg(F.first('D'))
)
Does anybody know why I can not pivot a table in Delta Live Tables Pipelines?
07-07-2022 06:39 AM
The solution seems to add the following configuration to the Delta Live Tables Pipeline:
spark.databricks.delta.schema.autoMerge.enabled: true
It allows "schema evolution" in the pipeline and solves the problem.
05-14-2022 08:41 AM
Can you try passing in the column names as a second argument to the pivot function?
.pivot('C', ["small", "large"])
07-07-2022 06:36 AM
Hi, this would only make the query run faster, thanks for the try. I will post below the solution I found to this issue.
07-07-2022 06:39 AM
The solution seems to add the following configuration to the Delta Live Tables Pipeline:
spark.databricks.delta.schema.autoMerge.enabled: true
It allows "schema evolution" in the pipeline and solves the problem.
10-07-2022 02:35 PM
08-27-2024 07:43 AM
I'm a bit of a muppet, it's implied but took be a second to figure out that you need to write it like this:
spark.conf.set("spark.databricks.delta.schema.autoMerge.enabled", "true")
04-19-2023 06:36 AM
It's said in the DLT documentation that "pivot" is not supported in DLT but I noticed that if you want the pivot function to work you have to do one of the the following things:
Note: I noticed that this works but you get a warning saying that `GroupedData.pivot` function that will be deprecated soon, you will have the same warning if you use Collect for instance.
Hope that help!
Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.
If there isn’t a group near you, start one and help create a community that brings people together.
Request a New Group