06-14-2023 03:17 AM
I am trying to insert a record into a Delta table using a notebook written in Python. The record has a timestamp column that should be blank initially; I plan to update the timestamp value later.
How I am inserting the record:
stmt_insert_audit_record = 'insert into default.batch_run (task_name, start, end, status) values (\''+param_task_name+'\', \''+param_start+'\', \''+param_end+'\', \''+param_status+'\')'
spark.sql(stmt_insert_audit_record)
Out of these columns, when I set up param_end as below, the insert statement works fine.
param_end = datetime.utcnow().strftime("%Y-%m-%dT%H:%M:%S")
However, I do not want to set an end date yet. Removing the column name from the insert statement gives me an error, since Spark expects all columns to be specified; I get the exception `Column end is not specified in INSERT`.
How do I set the param_end value so that the INSERT statement can take a blank value?
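For reference, this is roughly the variant that fails (sketch; omitting the end column triggers the error above):
# Sketch: leaving the end column out of the INSERT raises
# "Column end is not specified in INSERT" on this table.
stmt = "insert into default.batch_run (task_name, start, status) values ('" + param_task_name + "', '" + param_start + "', '" + param_status + "')"
spark.sql(stmt)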
06-14-2023 03:53 AM
First, I would recommend using a multiline f-string in spark.sql, like this...
spark.sql(f'''
insert into default.batch_run
(
  task_name,
  start,
  end,
  status
) values (
  '{param_task_name}',
  '{param_start}',
  NULL,  -- explicit NULL for the end column
  '{param_status}'
)
''')
There are other options I haven't tried if NULL doesn't work, like None or lit(None) from the DataFrame API (a quick sketch of the lit(None) route is below).
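A minimal sketch of the lit(None) route, assuming the row is built as a DataFrame rather than a SQL string (column names taken from your table):
from pyspark.sql.functions import lit

# Build a one-row DataFrame and add end as a typed NULL
df = spark.createDataFrame(
    [(param_task_name, param_start, param_status)],
    ["task_name", "start", "status"],
)
df = df.withColumn("end", lit(None).cast("timestamp")) \
       .select("task_name", "start", "end", "status")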
Let me know which works for you 🙂
06-14-2023 05:12 AM
I will certainly try out the f-string. Thank you @Tyler Retzlaff.
Options I tried:
Are there any other ways to specify a blank value?
06-14-2023 09:57 AM
You want to try lit(None), not list(None). Did you try that?
I see you said below that you've managed to append the dataframe, which was easier. I manipulate dataframes as much as possible with PySpark until I need to do a join/upsert with an existing table. The PySpark route makes it easier to work with Python variables, and there will be instances where you need to iterate through column names (rough sketch below). https://www.sparkbyexamples.com/pyspark has been immensely helpful.
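For example, a sketch of the column-name iteration pattern (the df and the "_time" naming convention here are just illustrative assumptions):
from pyspark.sql.functions import col

# Illustrative only: cast every column whose name ends in "_time" to timestamp
for c in df.columns:
    if c.endswith("_time"):
        df = df.withColumn(c, col(c).cast("timestamp"))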
Let me know how it works out 🙂
06-15-2023 04:37 AM
I actually meant to write lit(None). Thanks for that page. I basically needed to cast the TIMESTAMP column when formulating the spark.sql input.
Like:
end_time = ""
stmt = "INSERT INTO default.another(msg, end_time) values('"+msg+ "', cast('"+ end_time+ "' as TIMESTAMP))"
06-15-2023 04:39 AM
Good to know, glad you figured it out
06-14-2023 06:28 AM
Enclose the SQL string in double quotes, assign an empty string to param_end, and inside the SQL string enclose param_end (and the other string values) in single quotes.
E.g.:
param_end = ""
stmt_insert_audit_record = "INSERT INTO default.batch_run (task_name, start, end, status) values ('" + param_task_name + "', '" + param_start + "', '" + param_end + "', '" + param_status + "')"
spark.sql(stmt_insert_audit_record)
I believe this could help.
06-14-2023 09:40 AM
When I tried passing param_end as an empty string in quotes, I got the following error:
```
org.apache.spark.SparkDateTimeException: [CAST_INVALID_INPUT] The value '' of the type "STRING" cannot be cast to "TIMESTAMP" because it is malformed. Correct the value as per the syntax, or change its target type. Use `try_cast` to tolerate malformed input and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error.
```
Then I tried the cast function on param_end to convert it to TIMESTAMP, and that works now.
Alternatively, I have managed to create a dataframe and append it to the Delta table instead of using spark.sql(). This is much simpler.
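A rough sketch of that dataframe-append approach, assuming start is passed as a Python datetime and the table columns are as described above (untested):
from datetime import datetime
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

# Schema matching default.batch_run; end stays a nullable timestamp
schema = StructType([
    StructField("task_name", StringType(), True),
    StructField("start", TimestampType(), True),
    StructField("end", TimestampType(), True),
    StructField("status", StringType(), True),
])

# One audit row with end left as None (NULL in the table)
row = [(param_task_name, datetime.utcnow(), None, param_status)]
df = spark.createDataFrame(row, schema)

# Append to the existing Delta table
df.write.format("delta").mode("append").saveAsTable("default.batch_run")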