Databricks Community

StackP · ‎07-12-2022

In Databricks I have a existing delta table, In which i want to add one more column, as Id so that each row has unique id no and It is consecutive (how primary key is present in sql).

So far I have tried converting delta table to pyspark dataframe and have added new column as

from pyspark.sql.window import Window as W

from pyspark.sql import functions as F

df1 = df1.withColumn("idx", F.monotonically_increasing_id())

windowSpec = W.orderBy("idx")

F.row_number().over(windowSpec)).show()

I tried writing it back to delta table,

df.write.mode("append").format("delta").save(location/db.tablename)

It writes back but the data values After querying is null for the new id column. I read, overwrite mode will erase all previous data. How can I bring the column id data to delta table and keep incrementing the id column when data gets inserted?

I am trying to achieve adding a autoincrement column for delta table. The databricks runtime is 7.3.