to_date not functioning as expected after introduction of arbitrary replaceWhere in Databricks 9.1 LTS

kolangareth
New Contributor III

I am trying to do a dynamic partition overwrite on a Delta table using the replaceWhere option. This was working fine until I upgraded the DB runtime from 8.3.x to 9.1 LTS. I am concatenating the 'year', 'month' and 'day' columns and then using the to_date function to define the replaceWhere condition. I found in the documentation that a new feature was added in 9.1 LTS - Delta now supports arbitrary replaceWhere. This has probably introduced a bug here. I am using the Spark conf setting to fall back to the old behaviour.

// Fallback: with this disabled, replaceWhere matches on partition columns only (pre-9.1 behaviour)
spark.conf.set("spark.databricks.delta.replaceWhere.dataColumns.enabled", false)

But it would be nice to have a better solution to this problem. Here is my write statement with replaceWhere:

dataframeToWrite.write
  .format("delta")
  .mode("overwrite")
  .partitionBy("year", "month", "day")
  .option("replaceWhere", s"to_date(concat(year, '-', month, '-', day)) in ($datesList)")
  .save(outputLocation)
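For context, $datesList here is assumed to be a comma-separated string of quoted dates, built along these lines (illustrative only):

// Hypothetical construction of datesList; the actual values come from elsewhere.
val datesList = Seq("2021-10-1", "2021-10-2").map(d => s"'$d'").mkString(", ")
// datesList == "'2021-10-1', '2021-10-2'"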

Error message:

Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 11.0 failed 4 times, most recent failure: Lost task 0.3 in stage 11.0 (TID 13) (): java.lang.UnsupportedOperationException: Cannot generate code for expression: to_date(concat(cast(year#2810 as string), -, cast(month#2863 as string), -, cast(day#2916 as string)), None)

PS: I've also tried specifying the format in to_date, but that doesn't help either.
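For reference, the variant with an explicit format would look roughly like this; the 'yyyy-M-d' format string is an assumption, since the original post does not give one. Per the PS above, it fails the same way on 9.1 LTS:

dataframeToWrite.write
  .format("delta")
  .mode("overwrite")
  .partitionBy("year", "month", "day")
  // Explicit format in to_date; format string assumed, and this still
  // hits the codegen error without the fallback conf.
  .option("replaceWhere", s"to_date(concat(year, '-', month, '-', day), 'yyyy-M-d') in ($datesList)")
  .save(outputLocation)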


9 REPLIES

Anonymous
Not applicable

Hi @Prasanth Kolangareth - Welcome and thank you for your question. I'm sorry to hear about the trouble you're having. Let's give the community a chance to respond first. Thanks for your patience.

-werners-
Esteemed Contributor III (Accepted Solution)

Strange.

Have you tried creating a column with to_date in your dataframe, rather than in the replaceWhere condition?

df2 = df1.withColumn(to_date(..., "yyyyMMdd")) something like that
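A minimal runnable version of that sketch, assuming unpadded integer year/month/day columns; the column name event_date and the "yyyy-M-d" format are illustrative, not from the original post:

import org.apache.spark.sql.functions.{col, concat_ws, to_date}

// Derive the date as a real column so replaceWhere could reference a plain
// column; "event_date" is a hypothetical name, "yyyy-M-d" assumes
// unpadded month/day values.
val df2 = df1.withColumn(
  "event_date",
  to_date(concat_ws("-", col("year"), col("month"), col("day")), "yyyy-M-d")
)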

kolangareth
New Contributor III

Hi @Werner Stinckens, the to_date function works fine everywhere else - in filter, withColumn, etc. If you mean creating a new column in the dataframe and then using it in the replaceWhere - that's not something I want to do, because I would then also have to add this new column to the target Delta table.

-werners-
Esteemed Contributor III

My only guess is that either replaceWhere cannot take expressions, or the expression is faulty.

kolangareth
New Contributor III

I think it is neither.

  • replaceWhere can take expressions, because I've tried using concat without to_date and it doesn't fail (see the sketch after this list).
  • The expression isn't faulty, because it works with the Spark conf setting to fall back to the old behaviour, and the same expression works in filter and withColumn.
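Since plain concat reportedly works, one hedged workaround is to compare the concatenated string directly and skip to_date altogether. This assumes $datesList contains quoted strings whose format exactly matches the unpadded concatenation (e.g. '2021-1-5'):

dataframeToWrite.write
  .format("delta")
  .mode("overwrite")
  .partitionBy("year", "month", "day")
  // String comparison instead of to_date; the date strings in $datesList
  // must match the concat output format exactly (no zero-padding here).
  .option("replaceWhere", s"concat(year, '-', month, '-', day) in ($datesList)")
  .save(outputLocation)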

Hubert-Dudek
Esteemed Contributor III

Can you try generating that column earlier:

.withColumn("conditionColumn", to_date(...))

to simplify the code, so the replaceWhere will be easier to debug - right now it is complaining about to_date.

kolangareth
New Contributor III

Hi @Hubert Dudek, to do that I must also add this new column to the target Delta table, which I don't want to do. Adding the new column only in the dataframe being written will not work, for two reasons:

  1. Delta will give a schema mismatch error while writing.
  2. I think the replaceWhere condition is applied to the target Delta table, not to the input dataframe, so the condition should reference columns that already exist in the table.

Hubert-Dudek
Esteemed Contributor III

Yes, but you can at least do it for testing, even with a small copy of the dataframe - that way it is easier to find the problem.
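A minimal sketch of that debugging approach, assuming the dataframe from the original post; the limit, the conditionColumn name, and the "yyyy-M-d" format are illustrative:

import org.apache.spark.sql.functions.{col, concat_ws, to_date}

// Materialise the condition expression on a small copy first, so it can be
// inspected before it goes into replaceWhere.
val testDf = dataframeToWrite
  .limit(100)
  .withColumn("conditionColumn",
    to_date(concat_ws("-", col("year"), col("month"), col("day")), "yyyy-M-d"))

testDf.select("year", "month", "day", "conditionColumn").show(false)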

Anonymous
Not applicable

@Prasanth Kolangareth - Does Hubert's answer resolve the problem for you? If yes, would you be happy to mark it as best so that other members can find the solution more quickly?
