cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Using current_timestamp as a default value in a delta table

deng77
New Contributor III

I want to add a column to an existing delta table with a timestamp for when the data was inserted. I know I can do this by including current_timestamp with my SQL statement that inserts into the table.

Is it possible to add a column to an existing delta table with a default value of current_timestamp so I don't have to include the timestamp when writing data to the table? I have tried doing it but it doesn't seem to populate the column when I insert into the table.

Here is what I have tried so far:

ALTER TABLE tableName
ADD COLUMN InsertUTC timestamp DEFAULT current_timestamp

1 ACCEPTED SOLUTION

Accepted Solutions

elgeo
Valued Contributor II

Hello - Just an update from the latest office hours. They mentioned that default value is available for Delta tables as well starting from release 12.2 Databricks Runtime 11.3 LTS | Databricks on AWS. However, I was able to set it up only in the following way. It wasn't allowed in the Create statement directly.

imageimageimageimageimage.png

View solution in original post

11 REPLIES 11

Hubert-Dudek
Esteemed Contributor III

It was introduced in 11.2, and there is a note:

This behavior is supported for CSV, JSON, Orc, and Parquet data sources.

So it seems not yet for Delta. As for the parquet, I've just tested it, and it is ok. So I hope that support for Delta will be soon added.

image.png 

deng77
New Contributor III

Thanks for the response Hubert, hopefully the functionality for delta tables will be added soon.

elgeo
Valued Contributor II

Hello - Just an update from the latest office hours. They mentioned that default value is available for Delta tables as well starting from release 12.2 Databricks Runtime 11.3 LTS | Databricks on AWS. However, I was able to set it up only in the following way. It wasn't allowed in the Create statement directly.

imageimageimageimageimage.png

deng77
New Contributor III

Thanks for this, I tried it out on my end with a cluster running DBR 12.2 LTS and I was able to get it to work.

CREATE TABLE test (id int, time timestamp);
ALTER TABLE test
SET TBLPROPERTIES('delta.minReaderVersion' = '1', 'delta.minWriterVersion'='7');
ALTER TABLE test SET TBLPROPERTIES('delta.feature.allowColumnDefaults' = 'enabled');
ALTER TABLE test ALTER COLUMN time SET DEFAULT CURRENT_TIMESTAMP()

image.png

Danpeco
New Contributor II

My time column would have the default value of current_timestamp() if I went to the table properties after doing the identical action in the same DBR 12.2 LTS runtime.

However, it would fail if I attempted to add data to a target table that had one more column (the default time column) than the values I was adding to it : Cannot write to 'spark_catalog.gold.xxxxxxx', not enough data columns; target table has 19 column(s) but the inserted data has 18 column(s).

Do you know why this might be occurring?

deng77
New Contributor III

How are you adding values to the table? I have encountered that issue previously when I didn't specify columns in an INSERT statement where the table I was writing to had an identity column that is automatically generated.

iamkt23
New Contributor II

Hi All,

I am unable to run this 
ALTER TABLE TABLE_NAME alter column column_name set default 'value'
as shown in the solution. 

I was able to run the previous queries. But here, I am getting an error. 

db.jpg

Can someone suggest a solution, I have a table with 20+ columns and I want to set '0' as default for majority of them.

pvignesh92
Honored Contributor
-- Alter the table to use the GENERATED ALWAYS functionality for the created_at column
ALTER TABLE example_table
ADD COLUMN created_at TIMESTAMP GENERATED ALWAYS AS CURRENT_TIMESTAMP();

@Michael Burch​ Hi , Did you try using GENERATED ALWAYS feature. That can do satisfy your requirements I believe. With this, you don't need to insert a separate value for this column every time you load data into the table.

deng77
New Contributor III

That doesn't work for me, I get the following error:

[PARSE_SYNTAX_ERROR] Syntax error at or near 'GENERATED'(line 2, pos 32).

It sounds like delta lake generated columns have to be based on other columns in the table:

Delta Lake supports generated columns which are a special type of column whose values are automatically generated based on a user-specified function over other columns in the Delta table.

https://docs.databricks.com/delta/generated-columns.html

iamkt23
New Contributor II

Did this work for you in a databricks notebook?

Vaibhav1000
New Contributor II

Can you please provide information on the additional expenses related to using this feature compared to not utilizing it at all?

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group