Datatype mismatch while reading data from sql server to databricks

Teja07
New Contributor II

Data from Azure SQL Server was read into Databricks through a JDBC connection (Spark version 2.x) and stored in Gen1. Now the client wants to migrate the data from Gen1 to Gen2. When we ran the same jobs that read data from Azure SQL Server into Databricks through JDBC (Spark version upgraded from 2.x to 3.2), the source-side DATE type columns are being populated as STRING. Apart from the Spark version upgrade there is no technical or functional change, and no schema change on the source. I am unable to find the root cause. Can someone help me find the exact issue?


4 REPLIES

-werners-
Esteemed Contributor III

Spark 2.x and Spark 3.x handle dates differently.

Running Spark 2.x scripts on Spark 3.x will very likely cause issues.

Please check the Spark 3 migration guide:

https://spark.apache.org/docs/3.0.2/sql-migration-guide.html#upgrading-from-spark-sql-24-to-30

Teja07
New Contributor II

@Werner Stinckens, the above link was extensive and very helpful; however, I didn't get the exact details from it. Could you be more specific?

Anonymous
Not applicable

Hi @Mani Teja G,

Thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs.

Please help us select the best solution by clicking on "Select As Best" if it does.

Your feedback will help us ensure that we are providing the best possible service to you. Thank you!

-werners-
Esteemed Contributor III
(Accepted solution)

For example, there is a Spark option to enable the 'old' date handling.

You can set spark.sql.legacy.timeParserPolicy to LEGACY to restore the behavior before Spark 3.0.
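
A minimal sketch of that setting, assuming a Databricks notebook where spark is already defined (it can also go into the cluster's Spark config):

# Restore the pre-Spark-3.0 date/time parsing behavior for this session
spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")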

Frankly I am not a fan of that approach as Spark 3 gives you a lot of interesting date functions.

So what you could do is to first identify where you have date columns, and explicitly cast them to dates with the to_date function.
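
As a rough sketch, with hypothetical connection details and column names standing in for the real ones (adjust the format string to match how the strings actually arrive):

from pyspark.sql import functions as F

# Hypothetical JDBC read; the url, table and credentials are placeholders.
df = (spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://<server>.database.windows.net;databaseName=<db>")
    .option("dbtable", "dbo.orders")
    .option("user", "<user>")
    .option("password", "<password>")
    .load())

# Cast the columns that are DATE on the SQL Server side but arrived as string.
for name in ("order_date", "ship_date"):
    df = df.withColumn(name, F.to_date(F.col(name), "yyyy-MM-dd"))

df.printSchema()  # the casted columns should now report date instead of string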
