Datatype mismatch while reading data from sql server to databricks

Teja07
New Contributor II

Data from Azure SQL Server was read into Databricks through a JDBC connection (Spark version 2.x) and stored in Gen1. Now the client wants to migrate the data from Gen1 to Gen2. When we ran the same jobs that read data from Azure SQL Server into Databricks through JDBC (Spark version upgraded from 2.x to 3.2), the source-side DATE type columns are being populated as STRING. Apart from the Spark version upgrade there is no technical or functional change, and no schema change on the source. I am unable to find the root cause. Can someone help me find the exact issue?


4 REPLIES

-werners-
Esteemed Contributor III

Spark 2.x and Spark 3.x handle dates differently.

Running Spark 2.x scripts on Spark 3.x will very likely cause issues.

Please check the Spark 3 migration guide:

https://spark.apache.org/docs/3.0.2/sql-migration-guide.html#upgrading-from-spark-sql-24-to-30

Teja07
New Contributor II

@Werner Stinckens, the above link was extensive and very helpful; however, I didn't get the exact details from it. Could you be more specific?

Anonymous
Not applicable

Hi @Mani Teja G,

Thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs.

Please help us select the best solution by clicking on "Select As Best" if it does.

Your feedback will help us ensure that we are providing the best possible service to you. Thank you!

-werners-
Esteemed Contributor III
(Accepted solution)

For example, there is a Spark option to enable the 'old' date handling.

You can set spark.sql.legacy.timeParserPolicy to LEGACY to restore the behavior before Spark 3.0.
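
A minimal sketch of that setting, assuming a Databricks notebook where spark is already defined (it can also go into the cluster's Spark config):

# Restore the pre-Spark-3.0 date/time parsing behavior for this session
spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")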

Frankly I am not a fan of that approach as Spark 3 gives you a lot of interesting date functions.

So what you could do is to first identify where you have date columns, and explicitly cast them to dates with the to_date function.
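
As a rough sketch, with hypothetical connection details and column names standing in for the real ones (adjust the format string to match how the strings actually arrive):

from pyspark.sql import functions as F

# Hypothetical JDBC read; the url, table and credentials are placeholders.
df = (spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://<server>.database.windows.net;databaseName=<db>")
    .option("dbtable", "dbo.orders")
    .option("user", "<user>")
    .option("password", "<password>")
    .load())

# Cast the columns that are DATE on the SQL Server side but arrived as string.
for name in ("order_date", "ship_date"):
    df = df.withColumn(name, F.to_date(F.col(name), "yyyy-MM-dd"))

df.printSchema()  # the casted columns should now report date instead of string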
