05-10-2022 01:54 AM
I'm facing an error in Delta Live Tables when I want to pivot a table. The error is the following:
And the code to replicate the error is the following:
import pandas as pd
import pyspark.sql.functions as F
pdf = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo",
"bar", "bar", "bar", "bar"],
"B": ["one", "one", "one", "two", "two",
"one", "one", "two", "two"],
"C": ["small", "large", "large", "small",
"small", "large", "small", "small",
"large"],
"D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
"E": [2, 4, 5, 5, 6, 6, 8, 9, 9]})
df = spark.createDataFrame(pdf)
df.write.mode('overwrite').saveAsTable('test_table')
import dlt
@dlt.view
def test_table():
return spark.read.table('test_table')
@dlt.table
def test_table_pivoted():
return (
spark.table('LIVE.test_table')
.groupBy('A', 'B')
.pivot('C')
.agg(F.first('D'))
)
Does anybody know why I can not pivot a table in Delta Live Tables Pipelines?
07-07-2022 06:39 AM
The solution seems to add the following configuration to the Delta Live Tables Pipeline:
spark.databricks.delta.schema.autoMerge.enabled: true
It allows "schema evolution" in the pipeline and solves the problem.
05-14-2022 08:41 AM
Can you try passing in the column names as a second argument to the pivot function?
.pivot('C', ["small", "large"])
07-07-2022 06:36 AM
Hi, this would only make the query run faster, thanks for the try. I will post below the solution I found to this issue.
07-07-2022 06:39 AM
The solution seems to add the following configuration to the Delta Live Tables Pipeline:
spark.databricks.delta.schema.autoMerge.enabled: true
It allows "schema evolution" in the pipeline and solves the problem.
10-07-2022 02:35 PM
08-27-2024 07:43 AM
I'm a bit of a muppet, it's implied but took be a second to figure out that you need to write it like this:
spark.conf.set("spark.databricks.delta.schema.autoMerge.enabled", "true")
04-19-2023 06:36 AM
It's said in the DLT documentation that "pivot" is not supported in DLT but I noticed that if you want the pivot function to work you have to do one of the the following things:
Note: I noticed that this works but you get a warning saying that `GroupedData.pivot` function that will be deprecated soon, you will have the same warning if you use Collect for instance.
Hope that help!
3 weeks ago
Hi,
Was this a specific design choice to not allow Pivots in DLT? I'm under the impression they expect fixed table structures in DLT design for a reason, but I don't understand the reason?
Conceptually, I understand the fixed structures makes lineage & quality checks easier to maintain, but is it really a hard constraint? Does applying the above solution lead to issues in the lineage views?
Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.
If there isn’t a group near you, start one and help create a community that brings people together.
Request a New Group