05-10-2022 01:54 AM
I'm facing an error in Delta Live Tables when I want to pivot a table. The code to replicate the error is the following:
import pandas as pd
import pyspark.sql.functions as F

pdf = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo",
                          "bar", "bar", "bar", "bar"],
                    "B": ["one", "one", "one", "two", "two",
                          "one", "one", "two", "two"],
                    "C": ["small", "large", "large", "small",
                          "small", "large", "small", "small",
                          "large"],
                    "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
                    "E": [2, 4, 5, 5, 6, 6, 8, 9, 9]})
df = spark.createDataFrame(pdf)
df.write.mode('overwrite').saveAsTable('test_table')
import dlt

@dlt.view
def test_table():
    return spark.read.table('test_table')

@dlt.table
def test_table_pivoted():
    return (
        spark.table('LIVE.test_table')
        .groupBy('A', 'B')
        .pivot('C')
        .agg(F.first('D'))
    )
Does anybody know why I cannot pivot a table in Delta Live Tables pipelines?
Labels: Delta Live Tables, Pivot Function
Accepted Solutions
07-07-2022 06:39 AM
The solution seems to be to add the following configuration to the Delta Live Tables pipeline:
spark.databricks.delta.schema.autoMerge.enabled: true
It enables schema evolution in the pipeline, which resolves the error.
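For reference, this ends up under the configuration object of the pipeline settings JSON; a minimal sketch, where the pipeline name is just a placeholder:

{
  "name": "my_pivot_pipeline",
  "configuration": {
    "spark.databricks.delta.schema.autoMerge.enabled": "true"
  }
}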
05-14-2022 08:41 AM
Can you try passing in the column names as a second argument to the pivot function?
.pivot('C', ["small", "large"])
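For context, passing the values explicitly spares Spark an extra pass over the data to discover the distinct values of C. In the original example, the pivot step would then read:

spark.table('LIVE.test_table')
    .groupBy('A', 'B')
    .pivot('C', ["small", "large"])
    .agg(F.first('D'))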
07-07-2022 06:36 AM
Hi, this would only make the query run faster, but thanks for the suggestion. The solution I found is posted above as the accepted answer.
08-27-2024 07:43 AM
I'm a bit of a muppet; it's implied, but it took me a second to figure out that you need to write it like this:
spark.conf.set("spark.databricks.delta.schema.autoMerge.enabled", "true")
04-19-2023 06:36 AM
It's said in the DLT documentation that pivot is not supported in DLT, but I noticed that if you want the pivot function to work, you have to do one of the following things:
- apply the pivot in your first dlt.view, together with the config spark.databricks.delta.schema.autoMerge.enabled: true, or
- apply the pivot outside of the dlt decorators, then use the output in a dlt.view or dlt.table (see the sketch below).
Note: I noticed that this works, but you get a warning saying that GroupedData.pivot will be deprecated soon; you get the same warning if you use collect, for instance.
Hope that helps!
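For illustration, here is a minimal sketch of the second option, reusing the table and function names from the original question (treat them as placeholders):

import dlt
import pyspark.sql.functions as F

# Run the pivot eagerly, outside of any DLT decorator, so the
# DLT dependency graph never sees the GroupedData.pivot call.
pivoted_df = (
    spark.read.table('test_table')
    .groupBy('A', 'B')
    .pivot('C')
    .agg(F.first('D'))
)

# Expose the already-pivoted DataFrame through a DLT table.
@dlt.table
def test_table_pivoted():
    return pivoted_df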
12-04-2024 02:39 AM
Hi,
Was this a specific design choice to not allow pivots in DLT? I'm under the impression that DLT's design expects fixed table structures for a reason, but I don't understand what that reason is.
Conceptually, I understand that fixed structures make lineage and quality checks easier to maintain, but is it really a hard constraint? Does applying the above solution lead to issues in the lineage views?