Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Datatype mismatch while reading data from SQL Server into Databricks

Teja07
New Contributor II

Data from Azure SQL Server was read into Databricks through a JDBC connection (Spark version 2.x) and stored in Gen1. Now the client wants to migrate the data from Gen1 to Gen2. When we ran the same jobs that read data from Azure SQL Server into Databricks through JDBC (Spark version upgraded from 2.x to 3.2), the source-side DATE type columns are populating as STRING. Apart from the Spark version upgrade there is no technical or functional change, and no schema change in the source. I am unable to find the root cause. Can someone help me find the exact issue?
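A quick way to confirm what the reader now infers is to print the schema immediately after the JDBC read. A minimal PySpark sketch, assuming a Databricks notebook where spark is predefined; the URL, table name, and credentials are placeholders:

# Hypothetical connection details -- replace with your own values.
jdbc_url = "jdbc:sqlserver://<server>.database.windows.net:1433;database=<db>"

df = (spark.read.format("jdbc")
      .option("url", jdbc_url)
      .option("dbtable", "dbo.source_table")   # hypothetical table name
      .option("user", "<user>")
      .option("password", "<password>")
      .load())

# Shows whether the DATE columns arrive as date or string after the upgrade.
df.printSchema()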


4 REPLIES 4

-werners-
Esteemed Contributor III

Spark 2.x and Spark 3.x handle dates differently.

Running Spark 2.x scripts on Spark 3.x will very likely cause issues.

Please check the spark 3 migration guide:

https://spark.apache.org/docs/3.0.2/sql-migration-guide.html#upgrading-from-spark-sql-24-to-30

Teja07
New Contributor II

@Werner Stinckens, the link above is extensive and very helpful; however, I didn't find the exact details in it. Could you be more specific?

Anonymous
Not applicable

Hi @Mani Teja G

Thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs.

Please help us select the best solution by clicking on "Select As Best" if it does.

Your feedback will help us ensure that we are providing the best possible service to you. Thank you!

-werners-
Esteemed Contributor III

For example, there is a Spark option to re-enable the 'old' date handling.

You can set spark.sql.legacy.timeParserPolicy to LEGACY to restore the behavior before Spark 3.0.
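A minimal sketch of applying that in a notebook session (setting it in the cluster's Spark config would work as well):

# Restore the pre-Spark-3.0 datetime parsing behavior for this session.
spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")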

Frankly, I am not a fan of that approach, as Spark 3 gives you a lot of interesting date functions.

So what you could do is first identify which columns should be dates and explicitly cast them with the to_date function.
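A short PySpark sketch of that cast; the column names and date format here are hypothetical, so adjust them to your actual schema:

from pyspark.sql import functions as F

# Hypothetical list of columns that should be dates -- adjust to your schema.
date_cols = ["order_date", "ship_date"]

for c in date_cols:
    # Parse each string column into a proper date; the pattern must match
    # how the values actually look (e.g. "yyyy-MM-dd").
    df = df.withColumn(c, F.to_date(F.col(c), "yyyy-MM-dd"))

df.select(date_cols).printSchema()  # the columns should now be of type date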
