I first set up a Delta Live Tables pipeline using Python as follows.
import dlt

@dlt.table
def transaction():
    # transaction_schema and path are defined elsewhere in the notebook
    return (
        spark
        .readStream
        .format("cloudFiles")
        .schema(transaction_schema)
        .option("cloudFiles.format", "parquet")
        .load(path)
    )
I created the pipeline to write to the target database test with the following settings.
{
  "id": <id>,
  "clusters": [
    {
      "label": "default",
      "autoscale": {
        "min_workers": 1,
        "max_workers": 5
      }
    }
  ],
  "development": true,
  "continuous": false,
  "edition": "core",
  "photon": false,
  "libraries": [
    {
      "notebook": {
        "path": <path>
      }
    }
  ],
  "name": "dev pipeline",
  "storage": <storage>,
  "target": "test"
}
Everything worked as expected on the first run.
After a while, I noticed that I had forgotten to add a partition column to the table, so I dropped the table in test by running DROP TABLE test.transaction, and updated the notebook to:
import dlt
from pyspark.sql import functions as F

@dlt.table(
    partition_cols=["partition"],
)
def transaction():
    return (
        spark
        .readStream
        .format("cloudFiles")
        .schema(transaction_schema)
        .option("cloudFiles.format", "parquet")
        .load(path)
        # Derive the partition column as the calendar date of the event timestamp
        .withColumn("partition", F.to_date("timestamp"))
    )
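As an aside, the derived partition value is just the calendar date of the timestamp column. A plain-Python sketch of what F.to_date computes per row (illustrative only; to_partition is a hypothetical helper, not part of the pipeline):

```python
from datetime import datetime, date

def to_partition(ts: datetime) -> date:
    # Mirrors F.to_date("timestamp"): truncate a timestamp to its date
    return ts.date()

print(to_partition(datetime(2023, 5, 17, 13, 45)))  # 2023-05-17
```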
However, when I reran the pipeline, I got the following error.
org.apache.spark.sql.AnalysisException: Cannot change partition columns for table transaction.
Current:
Requested: partition
So dropping the target table alone is evidently not enough to change the partition columns.
What is the proper way to change partition columns in delta live tables?
CC: @Kit Yam Tse