05-10-2022 01:54 AM
I'm facing an error in Delta Live Tables when I want to pivot a table. The code to replicate the error is the following:
import pandas as pd
import pyspark.sql.functions as F

pdf = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo",
                          "bar", "bar", "bar", "bar"],
                    "B": ["one", "one", "one", "two", "two",
                          "one", "one", "two", "two"],
                    "C": ["small", "large", "large", "small",
                          "small", "large", "small", "small",
                          "large"],
                    "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
                    "E": [2, 4, 5, 5, 6, 6, 8, 9, 9]})
df = spark.createDataFrame(pdf)
df.write.mode('overwrite').saveAsTable('test_table')
import dlt

@dlt.view
def test_table():
    return spark.read.table('test_table')

@dlt.table
def test_table_pivoted():
    return (
        spark.table('LIVE.test_table')
        .groupBy('A', 'B')
        .pivot('C')
        .agg(F.first('D'))
    )
Does anybody know why I cannot pivot a table in Delta Live Tables pipelines?
Labels: Delta Live Tables, Pivot Function
Accepted Solutions
07-07-2022 06:39 AM
The solution seems to be to add the following configuration to the Delta Live Tables pipeline:
spark.databricks.delta.schema.autoMerge.enabled: true
It enables schema evolution in the pipeline, which resolves the error.
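For reference, this ends up under the configuration object of the pipeline settings JSON; a minimal sketch, where the pipeline name is just a placeholder:

{
  "name": "my_pivot_pipeline",
  "configuration": {
    "spark.databricks.delta.schema.autoMerge.enabled": "true"
  }
}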
05-14-2022 08:41 AM
Can you try passing in the column names as a second argument to the pivot function?
.pivot('C', ["small", "large"])
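For context, passing the values explicitly spares Spark an extra pass over the data to discover the distinct values of C. In the original example, the pivot step would then read:

spark.table('LIVE.test_table')
    .groupBy('A', 'B')
    .pivot('C', ["small", "large"])
    .agg(F.first('D'))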
07-07-2022 06:36 AM
Hi, this would only make the query run faster, but thanks for the suggestion. The solution I found is posted above as the accepted answer.
08-27-2024 07:43 AM
I'm a bit of a muppet; it's implied, but it took me a second to figure out that you need to write it like this:
spark.conf.set("spark.databricks.delta.schema.autoMerge.enabled", "true")
04-19-2023 06:36 AM
It's said in the DLT documentation that pivot is not supported in DLT, but I noticed that if you want the pivot function to work, you have to do one of the following things:
- apply the pivot in your first dlt.view, together with the config spark.databricks.delta.schema.autoMerge.enabled: true, or
- apply the pivot outside of the dlt decorators, then use the output in a dlt.view or dlt.table (see the sketch below).
Note: I noticed that this works, but you get a warning saying that GroupedData.pivot will be deprecated soon; you get the same warning if you use collect, for instance.
Hope that helps!
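For illustration, here is a minimal sketch of the second option, reusing the table and function names from the original question (treat them as placeholders):

import dlt
import pyspark.sql.functions as F

# Run the pivot eagerly, outside of any DLT decorator, so the
# DLT dependency graph never sees the GroupedData.pivot call.
pivoted_df = (
    spark.read.table('test_table')
    .groupBy('A', 'B')
    .pivot('C')
    .agg(F.first('D'))
)

# Expose the already-pivoted DataFrame through a DLT table.
@dlt.table
def test_table_pivoted():
    return pivoted_df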
12-04-2024 02:39 AM
Hi,
Was this a specific design choice to not allow pivots in DLT? I'm under the impression that DLT's design expects fixed table structures for a reason, but I don't understand what that reason is.
Conceptually, I understand that fixed structures make lineage and quality checks easier to maintain, but is it really a hard constraint? Does applying the above solution lead to issues in the lineage views?