Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Set timestamp column to blank when inserting a record into delta table

naga_databricks
Contributor

I am trying to insert a record into a Delta table using a notebook written in Python. This record has a timestamp column that should be blank initially; I plan to update the timestamp value later.

This is how I am inserting the record:

  stmt_insert_audit_record =  'insert into default.batch_run (task_name, start, end, status) values (\''+param_task_name+'\', \''+param_start+'\', \''+param_end+'\', \''+param_status+'\')'
  spark.sql(stmt_insert_audit_record)

Out of these columns, when I set up param_end as below, the insert statement works fine.

from datetime import datetime

param_end = datetime.utcnow().strftime("%Y-%m-%dT%H:%M:%S")

However, I do not want to set an end date. Removing the column name from the insert statement gives me an error, since Spark expects all of the columns to be mentioned; I get an exception like `Column end is not specified in INSERT`.

How do I set the param_end value so that the insert statement treats it as a blank (NULL) value?

1 ACCEPTED SOLUTION

Accepted Solutions

When I tried enclosing param_end in double quotes (as an empty string), I got the following error:

```
org.apache.spark.SparkDateTimeException: [CAST_INVALID_INPUT] The value '' of the type "STRING" cannot be cast to "TIMESTAMP" because it is malformed. Correct the value as per the syntax, or change its target type. Use `try_cast` to tolerate malformed input and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error.
```

Then I tried the cast function on param_end to convert it to TIMESTAMP. This works now.

Alternatively, I managed to create a DataFrame and append it to the Delta table instead of using spark.sql(). This is much simpler.

View solution in original post

8 REPLIES

etsyal1e2r3
Honored Contributor

First I would recommend using a multiline f-string in spark.sql like this...

spark.sql(f'''
    insert into default.batch_run
        (
        task_name,
        start,
        end,
        status
        ) values (
        '{param_task_name}',
        '{param_start}',
        NULL,
        '{param_status}'
        )
    ''')

If NULL doesn't work, there are other options I haven't tried, like

None

or

lit(None)

Let me know which works for you 🙂
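
As an aside, if the runtime is recent enough (this assumes Spark 3.4+ / a recent Databricks Runtime, so verify on your version), spark.sql also accepts named parameter markers, which avoids the quoting and escaping issues of building the SQL string by hand; a minimal sketch:

```python
# Sketch (assumes Spark 3.4+ parameterized SQL support): named markers are
# substituted safely, and the end column is set to NULL directly.
spark.sql(
    """
    INSERT INTO default.batch_run (task_name, start, end, status)
    VALUES (:task_name, CAST(:start AS TIMESTAMP), NULL, :status)
    """,
    args={
        "task_name": param_task_name,
        "start": param_start,
        "status": param_status,
    },
)
```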

I will certainly try out the f-string. Thank you @Tyler Retzlaff

Options I tried:

  1. With "None", it cannot be concatenated with the rest of the string; I get the error "can only concatenate str (not "NoneType") to str".
  2. list(None) throws the error "Column is not iterable".
  3. NULL - this does not work; Spark recognizes it as an identifier.

Are there any other ways to specify a blank value?

You want to try lit(None), not list(None). Did you try that?

I see you said below that you've managed to append the DataFrame, which was easier. I manipulate DataFrames as much as possible with PySpark until I need to do a join/upsert with an existing table. The PySpark approach makes it easier to use Python variables, and there will be instances where you need to iterate through column names. https://www.sparkbyexamples.com/pyspark has been immensely helpful.

Let me know how it works out 🙂
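
For reference, lit(None) comes from the DataFrame API (pyspark.sql.functions.lit), so it is used while building a DataFrame rather than inside a spark.sql() string; a small sketch using the column names from this thread:

```python
from pyspark.sql import functions as F

# Sketch: build the row as a DataFrame, cast start to a proper timestamp, and
# use lit(None) for the end column; lit() cannot be embedded in a SQL string.
df = spark.createDataFrame(
    [(param_task_name, param_start, param_status)],
    ["task_name", "start", "status"],
)
df = (
    df.withColumn("start", F.to_timestamp("start"))
      .withColumn("end", F.lit(None).cast("timestamp"))
)
```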

I actually meant to write lit(None). Thanks for that page. I basically needed to cast the TIMESTAMP column when formulating the spark.sql input.

Like:

end_time = ""
stmt = "INSERT INTO default.another (msg, end_time) VALUES ('" + msg + "', cast('" + end_time + "' as TIMESTAMP))"
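
One caveat worth noting: cast('' as TIMESTAMP) only yields NULL while ANSI mode is off; with spark.sql.ansi.enabled set to true it raises the same CAST_INVALID_INPUT error quoted earlier. Casting NULL explicitly does not depend on that setting; a minimal sketch:

```python
# Sketch: CAST(NULL AS TIMESTAMP) is NULL regardless of the ANSI setting,
# unlike casting an empty string, which fails when ANSI mode is enabled.
stmt = "INSERT INTO default.another (msg, end_time) VALUES ('" + msg + "', CAST(NULL AS TIMESTAMP))"
spark.sql(stmt)
```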
  

Good to know, glad you figured it out

Sandeep
Contributor III

Enclose the SQL string in double quotes, assign an empty string to param_end, and in the SQL string, enclose param_end in single quotes.

E.g.:

param_end = ""
stmt_insert_audit_record = "INSERT INTO default.batch_run (task_name, start, end, status) values ('" + param_task_name + "', '" + param_start + "', '" + param_end + "', '" + param_status + "')"
spark.sql(stmt_insert_audit_record)

This could help I believe.

When I tried enclosing param_end in double quotes (as an empty string), I got the following error:

```
org.apache.spark.SparkDateTimeException: [CAST_INVALID_INPUT] The value '' of the type "STRING" cannot be cast to "TIMESTAMP" because it is malformed. Correct the value as per the syntax, or change its target type. Use `try_cast` to tolerate malformed input and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error.
```

Then I tried the cast function on param_end to convert it to TIMESTAMP. This works now.

Alternatively, I managed to create a DataFrame and append it to the Delta table instead of using spark.sql(). This is much simpler.
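
A minimal sketch of that DataFrame-append approach, assuming the batch_run schema from the original post (start and end as TIMESTAMP columns) and using datetime objects rather than preformatted strings:

```python
from datetime import datetime
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

# Assumed schema for default.batch_run, based on the columns used in this thread.
schema = StructType([
    StructField("task_name", StringType(), True),
    StructField("start", TimestampType(), True),
    StructField("end", TimestampType(), True),
    StructField("status", StringType(), True),
])

# end is simply left as None, which lands in the table as NULL.
row = [(param_task_name, datetime.utcnow(), None, param_status)]
spark.createDataFrame(row, schema).write.format("delta").mode("append").saveAsTable("default.batch_run")
```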

Anonymous
Not applicable

Hi @Naga Vaibhav Elluru

Elevate our community by acknowledging exceptional contributions. Your participation in marking the best answers is a testament to our collective pursuit of knowledge.
