I'm trying to migrate a Spark job from Databricks Runtime 13.3 LTS to 15.4 LTS. The job uses db2jcc4.jar for its DB2 database connection.
Below is the Spark code:
%scala
// Import Spark SQL
import org.apache.spark.sql.SparkSession
// Create a Spark session
val spark = SparkSession.builder
  .appName("Read DB2 Data")
  .getOrCreate()
// Tried below properties but didn't work.
// spark.conf.set("spark.sql.caseSensitive", "true")
// spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")
// spark.conf.set("spark.sql.parquet.inferTimestampNTZ.enabled", "false")
// spark.conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInRead", "LEGACY")
// Pushdown query
val sqlQuery = """(SELECT * FROM UITC5Schema.AccountDetail_UITest5 ORDER BY ID ASC) AS finaldata"""
// Connection URL
val jdbcUrl = s"jdbc:db2://<IP>:50000/db0911"
// Define connection properties
val connectionProperties = new java.util.Properties()
connectionProperties.put("user", "userName1")
connectionProperties.put("password", "pwd123")
connectionProperties.put("driver", "com.ibm.db2.jcc.DB2Driver")
// Tried below property but didn't work.
//connectionProperties.put("nullCatalogMeansCurrent", "false")
// Read data from DB2 into a DataFrame
val db2DF = spark.read
  .jdbc(jdbcUrl, sqlQuery, connectionProperties)
// Show the data
db2DF.show()
// Perform any transformations or actions on the DataFrame
db2DF.printSchema()
Database schema:
CREATE TABLE "DB2INST1"."CARS" (
"ID" INTEGER ,
"CAR" VARCHAR(16 OCTETS) ,
"DATE" TIMESTAMP )
IN "USERSPACE1"
ORGANIZE BY ROW;
For camel-case columns, I get the error below:
SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-206, SQLSTATE=42703, SQLERRMC=columnName1, DRIVER=4.9.78
at com.ibm.db2.jcc.am.fd.a(fd.java:676)
When I wrap columnName1 in double quotes, the query works as expected.
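As a possible workaround, quoting every mixed-case identifier in the pushdown query keeps DB2 from folding it to upper case, which is what triggers SQLCODE=-206. A minimal sketch against the table from the post (the column list here is an illustrative placeholder):

```scala
// Sketch: quote mixed-case identifiers so DB2 resolves them case-sensitively
// (unquoted identifiers are folded to upper case by DB2, causing SQLCODE=-206).
// "columnName1" is a placeholder column name taken from the error message.
val columns = Seq("ID", "columnName1")
val quoted = columns.map(c => "\"" + c + "\"").mkString(", ")
val sqlQuery =
  s"""(SELECT $quoted FROM UITC5Schema.AccountDetail_UITest5 ORDER BY "ID" ASC) AS finaldata"""
// spark.read.jdbc(jdbcUrl, sqlQuery, connectionProperties)
```

This only changes how the query text is built; the read itself stays the same.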
For timestamp columns, I get the error below regardless of column name or casing:
[jcc][t4][10144][10862][4.9.78] Invalid parameter calendar: Parameter cannot be null. ERRORCODE=-4461, SQLSTATE=42815
Caused by: SqlSyntaxErrorException: [jcc][t4][10144][10862][4.9.78] Invalid parameter calendar: Parameter cannot be null. ERRORCODE=-4461, SQLSTATE=42815
  at com.ibm.db2.jcc.am.fd.a(fd.java:676)
  at com.ibm.db2.jcc.am.fd.a(fd.java:60)
  at com.ibm.db2.jcc.am.fd.a(fd.java:103)
  at com.ibm.db2.jcc.am.tm.getTimestamp(tm.java:1043)
  at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$makeGetter$15(JdbcUtils.scala:512)
  at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$makeGetter$15$adapted(JdbcUtils.scala:511)
  at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:376)
  at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:357)
  at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
  at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
  at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)
  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(generated.java:26)
  at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
  at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:50)
  at org.apache.spark.sql.execution.collect.UnsafeRowBatchUtils$.$anonfun$encodeUnsafeRows$5(UnsafeRowBatchUtils.scala:88)
  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
  at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
  at org.apache.spark.sql.execution.collect.UnsafeRowBatchUtils$.$anonfun$encodeUnsafeRows$3(UnsafeRowBatchUtils.scala:88)
  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
  at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
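The stack trace shows the failure inside the driver's getTimestamp call. One possible workaround (an untested sketch, assuming the failure really is in the old 4.9.78 driver's getTimestamp path) is to have DB2 format the timestamp as text so Spark reads it as a string, then parse it back on the Spark side:

```scala
// Sketch: format the TIMESTAMP column as text inside DB2 (VARCHAR_FORMAT is a
// built-in DB2 scalar function) so Spark's JDBC reader calls getString instead
// of the driver's failing getTimestamp, then parse it back in Spark.
val tsQuery =
  """(SELECT "ID", "CAR",
    |        VARCHAR_FORMAT("DATE", 'YYYY-MM-DD HH24:MI:SS') AS "DATE"
    |   FROM DB2INST1.CARS) AS cars""".stripMargin
// val df = spark.read.jdbc(jdbcUrl, tsQuery, connectionProperties)
//   .withColumn("DATE", org.apache.spark.sql.functions.to_timestamp(
//     org.apache.spark.sql.functions.col("DATE"), "yyyy-MM-dd HH:mm:ss"))
```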
On Databricks Runtime 13.3 LTS, the same code works as expected regardless of column name or data type.
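If rewriting the query is not an option, Spark's JDBC reader also supports a customSchema option that overrides the inferred type mapping, so the timestamp column can be fetched as a string and cast later in Spark. Again a hedged sketch, not verified against 15.4; the credentials are the placeholders from the post:

```scala
// Sketch: override the inferred type for the "DATE" column via customSchema
// (a documented Spark JDBC data source option) so the reader fetches it as a
// string instead of calling getTimestamp on the driver.
val jdbcOptions = Map(
  "url"          -> "jdbc:db2://<IP>:50000/db0911",
  "dbtable"      -> "DB2INST1.CARS",
  "user"         -> "userName1",      // placeholder from the post
  "password"     -> "pwd123",         // placeholder from the post
  "driver"       -> "com.ibm.db2.jcc.DB2Driver",
  "customSchema" -> "DATE STRING"     // read DATE as string, cast later
)
// spark.read.format("jdbc").options(jdbcOptions).load()
```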