
Databricks JDBC & Remote Write

Optum
New Contributor III

Hello,

I'm trying to write to a Delta Table in my Databricks instance from a remote Spark session on a different cluster with the Simba Spark driver. I can do reads, but when I attempt to do a write, I get the following error:

import org.apache.spark.sql.SaveMode

df.write.format("jdbc").mode(SaveMode.Append).options(Map(
  "url" -> "jdbc:spark://adb-<host_id>.azuredatabricks.net:443/default;transportMode=http;ssl=1;httpPath=<http_path>;AuthMech=3;UID=token;PWD=<token>",
  "dbtable" -> "testtable",
  "driver" -> "com.simba.spark.jdbc.Driver"
)).save()

java.sql.SQLFeatureNotSupportedException: [Simba][JDBC](10220) Driver does not support this optional feature.
    at com.simba.spark.exceptions.ExceptionConverter.toSQLException(Unknown Source)
    at com.simba.spark.jdbc.common.SPreparedStatement.checkTypeSupported(Unknown Source)
    at com.simba.spark.jdbc.common.SPreparedStatement.setNull(Unknown Source)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.savePartition(JdbcUtils.scala:677)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$saveTable$1(JdbcUtils.scala:856)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$saveTable$1$adapted(JdbcUtils.scala:854)
    at org.apache.spark.rdd.RDD.$anonfun$foreachPartition$2(RDD.scala:1020)
    ...

I'm currently connecting to a cluster from the "Data Science & Engineering" section, using the JDBC/ODBC details shown under its Advanced options. Some guides suggest using a SQL Endpoint for this, and that might be my problem, but I don't have permission to create one at this time. Some posts elsewhere, such as on Stack Overflow, suggest the issue is that the Simba Spark driver doesn't support the autocommit feature, but I'm not sure, and I couldn't find a Spark or driver option to turn it off.

Also, all the documentation I can find for Spark writes assumes you're working in a notebook on the Databricks cluster itself; I haven't found any remote-connection examples using the driver. Am I missing a documentation page for that?
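For comparison, a read along these lines does work from the same remote session (sketch only, with the same placeholder URL and token as above):

// Reading over the same Simba JDBC connection succeeds.
val readDf = spark.read.format("jdbc")
  .option("url", "jdbc:spark://adb-<host_id>.azuredatabricks.net:443/default;transportMode=http;ssl=1;httpPath=<http_path>;AuthMech=3;UID=token;PWD=<token>")
  .option("dbtable", "testtable")
  .option("driver", "com.simba.spark.jdbc.Driver")
  .load()

readDf.show()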

Remote Spark Instance

-------------------------------

Spark Version: 3.1.1

Scala Version: 2.12.10

Spark Simba JDBC Driver from Databricks: 2.6.22

Databricks Cluster Settings

---------------------

Cloud System: Azure

Policy: Unrestricted

Cluster Mode: Standard

Autoscaling: Enabled

Databricks Runtime Version: 9.1 LTS (includes Apache Spark 3.1.2, Scala 2.12)

Worker & Driver Type: Standard_DS3_v2

Please let me know if you need any other information to help me address my issue.

Thank you,

Kai

1 ACCEPTED SOLUTION


Kaniz
Community Manager

Hi @Kai McNeely and @Daniel Blevins, the error occurs because this particular Simba driver doesn't support auto-commit. The resolution is to add the JDBC driver's class name to the list of configured drivers that have autoCommit turned off.

This article describes how to configure the Databricks ODBC and JDBC drivers to connect your tools or clients to Databricks. For tool- or client-specific connection instructions, see the Databricks integrations.


10 REPLIES

Kaniz
Community Manager

Hi @Kai McNeely! My name is Kaniz, and I'm the technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers in the community have an answer first; otherwise I will get back to you soon. Thanks.

dblevins
New Contributor II

Hi Kaniz,

Any update on a resolution? I am experiencing the same issue: I can read from my Databricks table with the JDBC driver, but I am not able to write to it.


pulkitm
New Contributor III

@Kaniz Fatma "The resolution is to add the JDBC driver's class name to the list of configured drivers which have autoCommit turned off" -- how can we manage this list on a Windows machine? Could you summarize the steps? It would be really appreciated.

Optum
New Contributor III

It's not clear to me how to set autocommit off/false. Is this a setting I'm supposed to make in my Spark configuration when I launch spark-shell/spark-submit? Or is there a standard JDBC config file somewhere that gets referenced for these kinds of settings? I ask because the only documentation I can find on turning autocommit off uses the java.sql.Connection class directly, whereas I am trying to write with df.write.format("jdbc").options(Map( ... )).save(), which throws an error if you try to set an autocommit option. I also don't see an autocommit-related option in the Simba JDBC documentation's list of driver options that can be set via the options(Map( ... )) call. I can't use a Connection object, since that only passes direct SQL commands and won't write out a DataFrame; I could loop over the rows and inject the inserts myself, but that would be too clunky and slow.
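For reference, this is roughly what the java.sql.Connection workaround described above would look like. It's only a sketch (the two string columns of testtable are assumed for illustration, and df is the DataFrame I'm trying to write), and it's exactly the row-by-row approach I'd rather avoid:

import java.sql.DriverManager

// Same placeholder URL as in the original post.
val url = "jdbc:spark://adb-<host_id>.azuredatabricks.net:443/default;" +
  "transportMode=http;ssl=1;httpPath=<http_path>;AuthMech=3;UID=token;PWD=<token>"

val conn = DriverManager.getConnection(url)
try {
  // setAutoCommit is only exposed at this level, per the java.sql.Connection docs;
  // whether this driver accepts that call is exactly what's unclear to me.
  val stmt = conn.createStatement()
  df.collect().foreach { row =>
    // Assumes two string columns; values are pulled to the driver and
    // inserted one row at a time, which is the clunky and slow part.
    val c1 = row.getString(0).replace("'", "''")
    val c2 = row.getString(1).replace("'", "''")
    stmt.executeUpdate(s"INSERT INTO testtable VALUES ('$c1', '$c2')")
  }
} finally {
  conn.close()
}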

User16752239289
Valued Contributor

@Kai McNeely Could you provide more details on your client? For example, which Spark version are you running the code on, and do you provide any extra Spark conf beyond the defaults? I tested with Spark 3.x and the default Spark conf, and it works for me.

Optum
New Contributor III

I am using Spark 3.1.2 (I listed 3.1.1 in the initial post because the deployment originally reported 3.1.1), initializing with: spark-shell -target:jvm-1.8

I am bringing in the following packages (some used, some unused at the moment):

--conf "spark.driver.extraClassPath=~/jars/SparkJDBC42.jar" \
--conf "spark.jars.repositories=path/to/repos" \
--packages "com.microsoft.sqlserver:mssql-jdbc:6.4.0.jre8,net.sourceforge.jtds:jtds:1.3.1,com.oracle.database.jdbc:ojdbc8:21.1.0.0,com.databricks:spark-avro_2.10:4.0.0"
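As a quick sanity check that the SparkJDBC42.jar on spark.driver.extraClassPath is visible to the session, I can load the driver class and open a connection directly from the shell (sketch, same placeholder URL as before):

// Confirms the Simba driver class resolves and a connection can be opened.
Class.forName("com.simba.spark.jdbc.Driver")
val probe = java.sql.DriverManager.getConnection(
  "jdbc:spark://adb-<host_id>.azuredatabricks.net:443/default;transportMode=http;ssl=1;httpPath=<http_path>;AuthMech=3;UID=token;PWD=<token>")
probe.close()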

The following are the de-identified settings that are applied implicitly from my /opt/spark/conf/spark-defaults.conf:

spark.master=k8s://https://kubernetes.default.svc.cluster.local:443
spark.app.name=sp-worker
spark.eventLog.enabled true
spark.eventLog.dir file:///spark/logs
spark.kubernetes.driver.limit.cores=2
spark.kubernetes.executor.limit.cores=4
spark.executor.instances=2
spark.executor.memory=4G
spark.kubernetes.executor.limit.cores=4
spark.kubernetes.executor.request.cores=250m
spark.kubernetes.driver.label.app=spark-jlab
spark.kubernetes.executor.label.app=spark-jlab
spark.kubernetes.executor.label.deployment=jlab
spark.kubernetes.local.dirs.tmpfs=true
spark.kubernetes.container.image.pullPolicy=IfNotPresent
spark.kubernetes.container.image=docker.someurl.com/path/to/image/repo
# Home for Python libraries
spark.kubernetes.driver.volumes.persistentVolumeClaim.home.mount.path=/home
spark.kubernetes.driver.volumes.persistentVolumeClaim.home.mount.readOnly=false
spark.kubernetes.driver.volumes.persistentVolumeClaim.home.options.claimName=jlab
spark.kubernetes.executor.volumes.persistentVolumeClaim.home.mount.path=/home
spark.kubernetes.executor.volumes.persistentVolumeClaim.home.mount.readOnly=false
spark.kubernetes.executor.volumes.persistentVolumeClaim.home.options.claimName=jlab
# Logs
spark.kubernetes.driver.volumes.persistentVolumeClaim.logs.mount.path=/logs
spark.kubernetes.driver.volumes.persistentVolumeClaim.logs.mount.readOnly=false
spark.kubernetes.driver.volumes.persistentVolumeClaim.logs.options.claimName=k8s-sparkhistorylogs
spark.kubernetes.executor.volumes.persistentVolumeClaim.logs.mount.path=/logs
spark.kubernetes.executor.volumes.persistentVolumeClaim.logs.mount.readOnly=false
spark.kubernetes.executor.volumes.persistentVolumeClaim.logs.options.claimName=k8s-sparkhistorylogs
spark.kubernetes.namespace=namespace-00000
spark.kubernetes.authenticate.driver.serviceAccountName=svcactnm-00000
spark.kubernetes.authenticate.oauthToken=token
spark.kubernetes.authenticate.caCertFile=/opt/osficerts/ctc.crt
spark.ui.xContentTypeOptions.enabled=true
spark.kubernetes.pyspark.pythonVersion=3

pulkitm
New Contributor III

Hi @Kai McNeely, were you able to resolve this? I am facing the same issue with exactly the same Spark and JDBC driver versions.

Atanu
Esteemed Contributor

Could you try setting the flag to ignore transactions? I'm not sure of the exact flag name, but the JDBC driver manual should have more details on how to set it.
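If it is the driver's IgnoreTransactions connection property (the one tried in the reply below), my understanding is that it would be appended to the JDBC URL, roughly like this (sketch only, placeholders as in the original post):

import org.apache.spark.sql.SaveMode

// Sketch: same write as before, with IgnoreTransactions=1 appended to the URL.
val url = "jdbc:spark://adb-<host_id>.azuredatabricks.net:443/default;" +
  "transportMode=http;ssl=1;httpPath=<http_path>;AuthMech=3;UID=token;PWD=<token>;IgnoreTransactions=1"

df.write.format("jdbc")
  .mode(SaveMode.Append)
  .option("url", url)
  .option("dbtable", "testtable")
  .option("driver", "com.simba.spark.jdbc.Driver")
  .save()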

pulkitm
New Contributor III

I tried setting the IgnoreTransactions flag to 1, but it still didn't resolve the issue.