Delta Live Tables pivot error

mangel
New Contributor III

I'm facing an error in Delta Live Tables when I want to pivot a table. The error is the following:

[screenshot of the error message]

And the code to replicate the error is the following:

import pandas as pd
import pyspark.sql.functions as F

# Create a small source table outside the pipeline
pdf = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo",
                          "bar", "bar", "bar", "bar"],
                    "B": ["one", "one", "one", "two", "two",
                          "one", "one", "two", "two"],
                    "C": ["small", "large", "large", "small",
                          "small", "large", "small", "small",
                          "large"],
                    "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
                    "E": [2, 4, 5, 5, 6, 6, 8, 9, 9]})
df = spark.createDataFrame(pdf)
df.write.mode('overwrite').saveAsTable('test_table')

import dlt

# Expose the source table to the pipeline
@dlt.view
def test_table():
    return spark.read.table('test_table')

# Pivot the values of C into columns: this is the step that fails
@dlt.table
def test_table_pivoted():
    return (
        spark.table('LIVE.test_table')
        .groupBy('A', 'B')
        .pivot('C')
        .agg(F.first('D'))
    )

Does anybody know why I cannot pivot a table in Delta Live Tables pipelines?

1 ACCEPTED SOLUTION


mangel
New Contributor III

The solution seems to be adding the following configuration to the Delta Live Tables pipeline:

spark.databricks.delta.schema.autoMerge.enabled: true

It enables "schema evolution" in the pipeline and solves the problem.
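For reference, a pipeline configuration entry like this lives under the `configuration` key of the pipeline's settings JSON (a sketch with all other settings omitted; you can also add it as a key/value pair in the pipeline UI):

```json
{
  "configuration": {
    "spark.databricks.delta.schema.autoMerge.enabled": "true"
  }
}
```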


6 REPLIES

ccary
New Contributor III

Can you try passing in the column names as a second argument to the pivot function?

.pivot('C', ["small", "large"])
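Some context on why listing the values can matter: without an explicit list, `pivot` infers the output columns from whatever values appear in the data, so the resulting schema can change between updates. A small pandas sketch of the same idea (pandas only, so it runs anywhere; the `reindex` step is my illustration of fixing the column set, not a DLT API):

```python
import pandas as pd

pdf = pd.DataFrame({"A": ["foo", "foo", "bar"],
                    "B": ["one", "two", "one"],
                    "C": ["small", "large", "small"],
                    "D": [1, 2, 3]})

# Pivoted columns are inferred from whichever values appear in C,
# so a different batch of data can yield a different set of columns:
wide = pdf.pivot_table(index=["A", "B"], columns="C",
                       values="D", aggfunc="first")
print(list(wide.columns))

# Fixing the column list up front keeps the output schema stable,
# analogous to pivot('C', ["small", "large"]) in Spark:
wide_fixed = wide.reindex(columns=["small", "large"])
print(list(wide_fixed.columns))
```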

mangel
New Contributor III

Hi, this only makes the query run faster, but thanks for trying. I will post the solution I found below.

Kaniz
Community Manager
Community Manager

Hi @Miguel Angel Fernandez, we haven't heard from you since the last response from @Chris Cary, and I was checking back to see whether his suggestions helped you. If you have found a solution, please share it with the community, as it can be helpful to others.

mangel
New Contributor III

The solution seems to be adding the following configuration to the Delta Live Tables pipeline:

spark.databricks.delta.schema.autoMerge.enabled: true

It enables "schema evolution" in the pipeline and solves the problem.

According to both the SQL and Python DLT documentation, "pivot" is not supported in DLT.

So I wonder what the complications are of using "pivot" in such an unsupported way.

Khalil
Contributor

The DLT documentation says that "pivot" is not supported in DLT, but I noticed that to get the pivot function to work you have to do one of the following things:

  1. apply the pivot inside a dlt.view, together with the config "spark.databricks.delta.schema.autoMerge.enabled: true".
  2. apply the pivot outside of the dlt decorators, then use the output in a dlt.view or dlt.table.

Note: I noticed that this works, but you get a warning saying that the `GroupedData.pivot` function will be deprecated soon; you will see the same warning if you use `collect`, for instance.

Hope that helps!
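A minimal sketch of option 2 above, assuming it runs inside a DLT pipeline notebook (the `dlt` module and `spark` session only exist there, so this is not runnable standalone): the pivot is performed on a plain DataFrame outside any decorated function, and the decorated table simply returns the result.

```python
import dlt
import pyspark.sql.functions as F

# Pivot outside the dlt decorators, on the saved source table;
# explicit pivot values keep the output schema fixed.
pivoted_df = (
    spark.read.table('test_table')
    .groupBy('A', 'B')
    .pivot('C', ['small', 'large'])
    .agg(F.first('D'))
)

# The decorated function just returns the already-pivoted DataFrame.
@dlt.table
def test_table_pivoted():
    return pivoted_df
```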