06-14-2023 03:17 AM
I am trying to insert a record into a Delta table using a notebook written in Python. The record has a timestamp column that should be blank initially; I plan to update the timestamp value later.
How I am inserting the record:
stmt_insert_audit_record = 'insert into default.batch_run (task_name, start, end, status) values (\''+param_task_name+'\', \''+param_start+'\', \''+param_end+'\', \''+param_status+'\')'
spark.sql(stmt_insert_audit_record)
Out of these columns, when I set up param_end as below, the insert statement works fine.
param_end = datetime.utcnow().strftime("%Y-%m-%dT%H:%M:%S")
However, I do not want to set an end date yet. Removing the column name from the insert statement gives me an error, since Spark expects all columns to be specified; I get the exception `Column end is not specified in INSERT`.
How do I set the param_end value so that the INSERT statement can take a blank value?
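For reference, this is roughly the variant that fails (sketch; omitting the end column triggers the error above):
# Sketch: leaving the end column out of the INSERT raises
# "Column end is not specified in INSERT" on this table.
stmt = "insert into default.batch_run (task_name, start, status) values ('" + param_task_name + "', '" + param_start + "', '" + param_status + "')"
spark.sql(stmt)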
06-14-2023 03:53 AM
First, I would recommend using a multiline f-string in spark.sql, like this...
spark.sql(f'''
insert into default.batch_run
(
  task_name,
  start,
  end,
  status
) values (
  '{param_task_name}',
  '{param_start}',
  NULL,  -- explicit NULL for the end column
  '{param_status}'
)
''')
There are other options I haven't tried if NULL doesn't work, like None or lit(None) from the DataFrame API (a quick sketch of the lit(None) route is below).
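A minimal sketch of the lit(None) route, assuming the row is built as a DataFrame rather than a SQL string (column names taken from your table):
from pyspark.sql.functions import lit

# Build a one-row DataFrame and add end as a typed NULL
df = spark.createDataFrame(
    [(param_task_name, param_start, param_status)],
    ["task_name", "start", "status"],
)
df = df.withColumn("end", lit(None).cast("timestamp")) \
       .select("task_name", "start", "end", "status")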
Let me know which works for you 🙂
06-14-2023 05:12 AM
I will certainly try out the f-string. Thank you @Tyler Retzlaff.
Options I tried:
Are there any other ways to specify a blank value?
06-14-2023 09:57 AM
You want to try lit(None), not list(None). Did you try that?
I see you said below that you've managed to append the dataframe, which was easier. I manipulate dataframes as much as possible with PySpark until I need to do a join/upsert with an existing table. The PySpark route makes it easier to work with Python variables, and there will be instances where you need to iterate through column names (rough sketch below). https://www.sparkbyexamples.com/pyspark has been immensely helpful.
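For example, a sketch of the column-name iteration pattern (the df and the "_time" naming convention here are just illustrative assumptions):
from pyspark.sql.functions import col

# Illustrative only: cast every column whose name ends in "_time" to timestamp
for c in df.columns:
    if c.endswith("_time"):
        df = df.withColumn(c, col(c).cast("timestamp"))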
Let me know how it works out 🙂
06-15-2023 04:37 AM
I actually meant to write lit(None). Thanks for that page. I basically needed to cast the TIMESTAMP column when formulating the spark.sql input.
Like:
end_time = ""
stmt = "INSERT INTO default.another(msg, end_time) values('"+msg+ "', cast('"+ end_time+ "' as TIMESTAMP))"
06-15-2023 04:39 AM
Good to know, glad you figured it out
06-14-2023 06:28 AM
Enclose the SQL string in double quotes, assign an empty string to param_end, and inside the SQL string enclose param_end (and the other string values) in single quotes.
E.g.:
param_end = ""
stmt_insert_audit_record = "INSERT INTO default.batch_run (task_name, start, end, status) values ('" + param_task_name + "', '" + param_start + "', '" + param_end + "', '" + param_status + "')"
spark.sql(stmt_insert_audit_record)
I believe this could help.
06-14-2023 09:40 AM
When I tried passing param_end as an empty string in quotes, I got the following error:
```
org.apache.spark.SparkDateTimeException: [CAST_INVALID_INPUT] The value '' of the type "STRING" cannot be cast to "TIMESTAMP" because it is malformed. Correct the value as per the syntax, or change its target type. Use `try_cast` to tolerate malformed input and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error.
```
Then I tried the cast function on param_end to convert it to TIMESTAMP, and that works now.
Alternatively, I have managed to create a dataframe and append it to the Delta table instead of using spark.sql(). This is much simpler.
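A rough sketch of that dataframe-append approach, assuming start is passed as a Python datetime and the table columns are as described above (untested):
from datetime import datetime
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

# Schema matching default.batch_run; end stays a nullable timestamp
schema = StructType([
    StructField("task_name", StringType(), True),
    StructField("start", TimestampType(), True),
    StructField("end", TimestampType(), True),
    StructField("status", StringType(), True),
])

# One audit row with end left as None (NULL in the table)
row = [(param_task_name, datetime.utcnow(), None, param_status)]
df = spark.createDataFrame(row, schema)

# Append to the existing Delta table
df.write.format("delta").mode("append").saveAsTable("default.batch_run")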