Data Engineering
Kaniz
Community Manager

I first set up a Delta Live Table using Python as follows.

import dlt

# transaction_schema and path are defined elsewhere in the notebook
@dlt.table
def transaction():
  return (
    spark
    .readStream
    .format("cloudFiles")  # Auto Loader
    .schema(transaction_schema)
    .option("cloudFiles.format", "parquet")
    .load(path)
  )
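For reference, transaction_schema and path are assumed to be defined earlier in the notebook. A minimal, hypothetical version (the field names are illustrative, not from the original post) might look like:

from pyspark.sql.types import StructType, StructField, StringType, TimestampType, DoubleType

# Hypothetical schema and source path; adjust to your own data.
transaction_schema = StructType([
    StructField("transaction_id", StringType(), True),
    StructField("amount", DoubleType(), True),
    StructField("timestamp", TimestampType(), True),
])
path = "/mnt/raw/transactions/"  # directory of incoming parquet files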

And I configured the pipeline to publish to the target database test with the following settings.

{
    "id": <id>,
    "clusters": [
        {
            "label": "default",
            "autoscale": {
                "min_workers": 1,
                "max_workers": 5
            }
        }
    ],
    "development": true,
    "continuous": false,
    "edition": "core",
    "photon": false,
    "libraries": [
        {
            "notebook": {
                "path": <path>
            }
        }
    ],
    "name": "dev pipeline",
    "storage": <storage>,
    "target": "test"
}
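For what it's worth, if you keep these settings as a JSON document, one way to apply them is the Pipelines REST API. A minimal sketch (the workspace URL and token are placeholders, and the <path>/<storage> values must be filled in):

import requests

WORKSPACE_URL = "https://<workspace>.cloud.databricks.com"  # placeholder
TOKEN = "<personal-access-token>"                           # placeholder

settings = {
    "clusters": [{"label": "default", "autoscale": {"min_workers": 1, "max_workers": 5}}],
    "development": True,
    "continuous": False,
    "edition": "core",
    "photon": False,
    "libraries": [{"notebook": {"path": "<path>"}}],
    "name": "dev pipeline",
    "storage": "<storage>",
    "target": "test",
}

# Create the pipeline; the response JSON contains the new pipeline_id.
resp = requests.post(
    f"{WORKSPACE_URL}/api/2.0/pipelines",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=settings,
)
resp.raise_for_status()
print(resp.json())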

Everything worked as expected in the first trial.

After a while, I noticed that I had forgotten to add a partition column to the table, so I dropped the table in test with DROP TABLE test.transaction and updated the notebook to

import dlt
from pyspark.sql import functions as F

@dlt.table(
  partition_cols=["partition"],
)
def transaction():
  return (
    spark
    .readStream
    .format("cloudFiles")
    .schema(transaction_schema)
    .option("cloudFiles.format", "parquet")
    .load(path)
    # derive a date-typed partition column from the event timestamp
    .withColumn("partition", F.to_date("timestamp"))
  )

However, when I reran the pipeline, I got an error:

org.apache.spark.sql.AnalysisException: Cannot change partition columns for table transaction.
Current: 
Requested: partition

It seems I can't change the partition columns just by dropping the target table.
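My guess is that DLT keeps the table data and metadata under the pipeline's storage location, so dropping the metastore table alone doesn't reset the partition spec. One option I'm considering is a full refresh of the pipeline, e.g. via the Pipelines REST API. A minimal sketch (workspace URL, token, and pipeline ID are placeholders, and I haven't confirmed this resets partitioning):

import requests

WORKSPACE_URL = "https://<workspace>.cloud.databricks.com"  # placeholder
TOKEN = "<personal-access-token>"                           # placeholder
PIPELINE_ID = "<pipeline-id>"                               # placeholder

# Trigger an update with full_refresh, which rebuilds the managed tables
# from scratch instead of processing data incrementally.
resp = requests.post(
    f"{WORKSPACE_URL}/api/2.0/pipelines/{PIPELINE_ID}/updates",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"full_refresh": True},
)
resp.raise_for_status()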

What is the proper way to change partition columns in delta live tables?

CC: @Kit Yam Tse


4 REPLIES

RiyazAli
Contributor III

@Kaniz Fatma - is the error because the partition column is created on the fly rather than being a predefined column?

I'm also intrigued by the flow of execution of the DLT script above. As I understand it, readStream first creates a DataFrame with the new column named partition, and the DLT table is then created partitioned by that column?

Kaniz
Community Manager

Hi @Riyaz Ali, this question has been posted on behalf of Kit Yam Tse.

RiyazAli
Contributor III

Oh okay, got it. Thanks!

Hi @Kaniz Fatma,

Was the original requester able to see the response? Are there any follow-up questions?
