Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Databricks 13.3LTS to 15.4 LTS Migration - Spark job with source DB2 database not working

sahil_s_jain
New Contributor III

I'm trying to migrate a Spark job from Databricks 13.3 LTS to 15.4 LTS. The job uses db2jcc4.jar for the DB2 database connection.

Below is the Spark code:

%scala
// Import Spark SQL
import org.apache.spark.sql.SparkSession

// Create a Spark session
val spark = SparkSession.builder
  .appName("Read DB2 Data")
  .getOrCreate()


// Tried below properties but didn't work.
// spark.conf.set("spark.sql.caseSensitive", "true")
// spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")
// spark.conf.set("spark.sql.parquet.inferTimestampNTZ.enabled", "false")
// spark.conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInRead", "LEGACY")

// Query pushed down to DB2 as a derived table
val sqlQuery = """(SELECT * FROM UITC5Schema.AccountDetail_UITest5 ORDER BY ID ASC) AS finaldata"""
// Connection URL
val jdbcUrl = s"jdbc:db2://<IP>:50000/db0911"

// Define connection properties
val connectionProperties = new java.util.Properties()
connectionProperties.put("user", "userName1")
connectionProperties.put("password", "pwd123")
connectionProperties.put("driver", "com.ibm.db2.jcc.DB2Driver")
// Tried below property but didn't work.
//connectionProperties.put("nullCatalogMeansCurrent", "false")

// Read data from DB2 into a DataFrame
val db2DF = spark.read
  .jdbc(jdbcUrl, sqlQuery, connectionProperties)

// Show the data
db2DF.show()

// Perform any transformations or actions on the DataFrame
db2DF.printSchema()

Database schema:

CREATE TABLE "DB2INST1"."CARS" (
  "ID" INTEGER,
  "CAR" VARCHAR(16 OCTETS),
  "DATE" TIMESTAMP
)
IN "USERSPACE1"
ORGANIZE BY ROW;

For camel-case columns, I get the error below:

SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-206, SQLSTATE=42703, SQLERRMC=columnName1, DRIVER=4.9.78
	at com.ibm.db2.jcc.am.fd.a(fd.java:676)

When columnName1 is wrapped in double quotes, it works as expected.
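
Concretely, the quoting fix looks like this: DB2 folds unquoted identifiers to upper case, so mixed-case column names must be double-quoted inside the pushdown query. A minimal sketch reusing the connection setup above ("columnName1" stands in for the real mixed-case column):

// Double-quote mixed-case identifiers so DB2 does not fold them to upper case.
// "columnName1" is illustrative; substitute the actual column names.
val quotedQuery =
  """(SELECT "ID", "columnName1" FROM UITC5Schema.AccountDetail_UITest5 ORDER BY "ID" ASC) AS finaldata"""

val quotedDF = spark.read.jdbc(jdbcUrl, quotedQuery, connectionProperties)
quotedDF.printSchema()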

For timestamp columns, I get the error below irrespective of column name or casing:

[jcc][t4][10144][10862][4.9.78] Invalid parameter calendar: Parameter cannot be null. ERRORCODE=-4461, SQLSTATE=42815
Caused by: SqlSyntaxErrorException: [jcc][t4][10144][10862][4.9.78] Invalid parameter calendar: Parameter cannot be null. ERRORCODE=-4461, SQLSTATE=42815
	at com.ibm.db2.jcc.am.fd.a(fd.java:676)
	at com.ibm.db2.jcc.am.fd.a(fd.java:60)
	at com.ibm.db2.jcc.am.fd.a(fd.java:103)
	at com.ibm.db2.jcc.am.tm.getTimestamp(tm.java:1043)
	at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$makeGetter$15(JdbcUtils.scala:512)
	at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$makeGetter$15$adapted(JdbcUtils.scala:511)
	at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:376)
	at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:357)
	at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
	at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(generated.java:26)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:50)
	at org.apache.spark.sql.execution.collect.UnsafeRowBatchUtils$.$anonfun$encodeUnsafeRows$5(UnsafeRowBatchUtils.scala:88)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
	at org.apache.spark.sql.execution.collect.UnsafeRowBatchUtils$.$anonfun$encodeUnsafeRows$3(UnsafeRowBatchUtils.scala:88)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)

On Databricks 13.3 LTS compute, the same code works as expected irrespective of the column name and column data type.
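
For context: 13.3 LTS ships Spark 3.4 and 15.4 LTS ships Spark 3.5, so a behavior change in the JDBC timestamp read path between those Spark versions is the likely suspect. A quick way to confirm which Spark version each cluster is running:

// Print the Spark version of the attached cluster.
println(spark.version)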

1 ACCEPTED SOLUTION


Ismael-K
Databricks Employee

@sahil_s_jain Please try setting this Spark property on the 15.4 cluster:

sparkConf.set("spark.sql.legacy.jdbc.useNullCalendar", "false")

Once the property is set, Spark stops passing a null Calendar instance on the JDBC read path, which is what triggers the -4461 "Invalid parameter calendar" error from the DB2 driver.

You can also add the property below to set the Spark session time zone:

spark.conf.set("spark.sql.session.timeZone", "America/New_York")
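
A minimal sketch of applying both settings at the top of the 15.4 notebook, before the JDBC read (the useNullCalendar property name comes from the reply above; the time zone value is only an example, and on some runtimes legacy properties may need to go in the cluster's Spark config instead of the session):

// Sketch: apply the workaround at the start of the 15.4 LTS notebook.
// spark.sql.legacy.jdbc.useNullCalendar is the property named above;
// the time zone value is illustrative.
spark.conf.set("spark.sql.legacy.jdbc.useNullCalendar", "false")
spark.conf.set("spark.sql.session.timeZone", "America/New_York")

// Re-run the original read to confirm timestamp columns now load.
val checkDF = spark.read.jdbc(jdbcUrl, sqlQuery, connectionProperties)
checkDF.show()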

2 REPLIES

sahil_s_jain
New Contributor III

Thanks for the solution.
