06-14-2023 03:17 AM
I am trying to insert a record into a Delta table using a notebook written in Python. This record has a timestamp column that should be blank initially; later I plan to update the timestamp value.
How I am inserting the record:
stmt_insert_audit_record = 'insert into default.batch_run (task_name, start, end, status) values (\''+param_task_name+'\', \''+param_start+'\', \''+param_end+'\', \''+param_status+'\')'
spark.sql(stmt_insert_audit_record)
Out of these columns, when I set param_end as below, the insert statement works fine:
param_end = datetime.utcnow().strftime("%Y-%m-%dT%H:%M:%S")
However, I do not want to set an end date. Removing the column name from the insert statement gives me an error, since Spark expects all the columns to be mentioned; I get the exception `Column end is not specified in INSERT`.
How do I set the param_end value so that the insert statement treats it as a blank value?
Labels: Delta, Delta table, Error Messages
Accepted Solutions
06-14-2023 09:40 AM
When I tried enclosing param_end in quotes as an empty string, I got the following error:
```
org.apache.spark.SparkDateTimeException: [CAST_INVALID_INPUT] The value '' of the type "STRING" cannot be cast to "TIMESTAMP" because it is malformed. Correct the value as per the syntax, or change its target type. Use `try_cast` to tolerate malformed input and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error.
```
Then I tried the cast function on param_end to convert it to TIMESTAMP, and this works now.
Alternatively, I have managed to create a DataFrame and append it to the Delta table instead of using spark.sql(). This is much simpler.
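For anyone landing here later, a minimal sketch of that DataFrame-append approach, assuming the batch_run schema has task_name/status as strings and start/end as timestamps (the param_* variables are the ones from the original question):
```python
from pyspark.sql import Row
from pyspark.sql.functions import col, lit

# Build a one-row DataFrame from the existing Python variables.
df = spark.createDataFrame(
    [Row(task_name=param_task_name, start=param_start, status=param_status)]
)

# Cast start to a timestamp and add end as an explicitly NULL timestamp,
# so the DataFrame schema lines up with the Delta table's schema.
df = (df.withColumn("start", col("start").cast("timestamp"))
        .withColumn("end", lit(None).cast("timestamp")))

# Append to the existing Delta table, selecting columns in the table's order.
(df.select("task_name", "start", "end", "status")
   .write.format("delta").mode("append").saveAsTable("default.batch_run"))
```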
06-14-2023 03:53 AM
First, I would recommend using a multiline f-string in spark.sql, like this:
spark.sql(f'''
insert into default.batch_run
(
    task_name,
    start,
    end,
    status
) values (
    '{param_task_name}',
    '{param_start}',
    NULL,
    '{param_status}'
)
''')
There are other options I haven't tried, in case NULL doesn't work, such as None or lit(None).
Let me know which works for you 🙂
06-14-2023 05:12 AM
I will certainly try out the f-string. Thank you @Tyler Retzlaff
Options I tried:
- With "None", the string concatenation fails; I get the error "can only concatenate str (not "NoneType") to str".
- list(None) throws the error "Column is not iterable".
- NULL - this will not work in Spark; it is recognized as an identifier.
Are there any other ways to specify a blank value?
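For illustration, one other way to spell a blank value (an untested sketch using the table and variables from the question) is to put an explicitly typed NULL in the VALUES list, so Spark knows the literal's type:
```python
# Untested sketch: pass CAST(NULL AS TIMESTAMP) for the end column.
stmt = f"""
    INSERT INTO default.batch_run (task_name, start, end, status)
    VALUES ('{param_task_name}', '{param_start}', CAST(NULL AS TIMESTAMP), '{param_status}')
"""
spark.sql(stmt)
```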
06-14-2023 09:57 AM
You want to try lit(None), not list(None), did you try that?
I see you said below that you've managed to append the DataFrame, which was easier. I manipulate DataFrames as much as possible with PySpark until I need to do a join/upsert with an existing table. The PySpark approach makes it easier to use Python variables, and there will be instances where you need to iterate through column names. https://www.sparkbyexamples.com/pyspark has been immensely helpful.
Let me know how it works out 🙂
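As a small, generic illustration of that last point (the DataFrame df here is hypothetical, not one from this thread): iterating over the column names lets you apply the same transformation to many columns at once:
```python
from pyspark.sql.functions import col, trim

# Generic sketch: trim whitespace from every string column by iterating over df.dtypes.
for name, dtype in df.dtypes:
    if dtype == "string":
        df = df.withColumn(name, trim(col(name)))
```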
06-15-2023 04:37 AM
I actually meant to write lit(None). Thanks for that page. Basically, I needed to cast the TIMESTAMP column when formulating the spark.sql input.
Like:
end_time = ""
stmt = "INSERT INTO default.another(msg, end_time) values('"+msg+ "', cast('"+ end_time+ "' as TIMESTAMP))"
06-15-2023 04:39 AM
Good to know, glad you figured it out
06-14-2023 06:28 AM
Enclose the SQL string in double quotes, assign an empty string to param_end, and within the SQL string enclose param_end in single quotes.
E.g.:
param_end = ""
stmt_insert_audit_record = "INSERT INTO default.batch_run (task_name, start, end, status) values ("+param_task_name+", " + param_start+ ", '"+ param_end+ "', " +param_status+ ")"
spark.sql(stmt_insert_audit_record)
This could help I believe.