02-04-2022 12:07 PM
Hello,
I'm trying to write to a Delta table in my Databricks instance from a remote Spark session on a different cluster, using the Simba Spark JDBC driver. Reads work fine, but when I attempt the write below, I get the error that follows it:
import org.apache.spark.sql.SaveMode

// Append the DataFrame to an existing table over the Simba Spark JDBC driver
df.write.format("jdbc").mode(SaveMode.Append).options(Map(
  "url" -> "jdbc:spark://adb-<host_id>.azuredatabricks.net:443/default;transportMode=http;ssl=1;httpPath=<http_path>;AuthMech=3;UID=token;PWD=<token>",
  "dbtable" -> "testtable",
  "driver" -> "com.simba.spark.jdbc.Driver"
)).save()
java.sql.SQLFeatureNotSupportedException: [Simba][JDBC](10220) Driver does not support this optional feature.
at com.simba.spark.exceptions.ExceptionConverter.toSQLException(Unknown Source)
at com.simba.spark.jdbc.common.SPreparedStatement.checkTypeSupported(Unknown Source)
at com.simba.spark.jdbc.common.SPreparedStatement.setNull(Unknown Source)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.savePartition(JdbcUtils.scala:677)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$saveTable$1(JdbcUtils.scala:856)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$saveTable$1$adapted(JdbcUtils.scala:854)
at org.apache.spark.rdd.RDD.$anonfun$foreachPartition$2(RDD.scala:1020)
......
I'm currently connecting to a cluster from the "Data Science & Engineering" section of the workspace, using the JDBC/ODBC details listed under its Advanced options. Some guides suggest using a SQL Endpoint for this instead, and that might be my problem, but I don't have permission to create one at this time. Some posts elsewhere, such as on StackOverflow, suggest the issue is that the Simba Spark driver does not support the autocommit feature, but I'm not sure, and I couldn't find a Spark or driver option for turning it off.
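To check the autocommit theory on my side, the only probe I can think of is opening a plain java.sql connection with the same URL and toggling autocommit directly. This is just a rough sketch with the same placeholder URL as above (it only tests the driver's behavior; it's not a way to write the DataFrame):

import java.sql.DriverManager

// Probe only: see whether the driver itself rejects disabling autocommit.
Class.forName("com.simba.spark.jdbc.Driver")
val conn = DriverManager.getConnection(
  "jdbc:spark://adb-<host_id>.azuredatabricks.net:443/default;transportMode=http;ssl=1;httpPath=<http_path>;AuthMech=3;UID=token;PWD=<token>")
try {
  println(s"autoCommit is initially ${conn.getAutoCommit}")
  conn.setAutoCommit(false) // if autocommit is the problem, I'd expect this call to throw
} finally {
  conn.close()
}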
Also, all the documentation for spark.write seems to assume you are operating in a notebook on the Databricks side; I can't find remote-connection examples that use the driver. Am I missing the documentation page for that?
Remote Spark Instance
-------------------------------
Spark Version: 3.1.1
Scala Version: 2.12.10
Spark Simba JDBC Driver from Databricks: 2.6.22
Databricks Cluster Settings
---------------------
Cloud System: Azure
Policy: Unrestricted
Cluster Mode: Standard
Autoscaling: Enabled
Databricks Runtime Version: 9.1 LTS (includes Apache Spark 3.1.2, Scala 2.12)
Worker & Driver Type: Standard_DS3_v2
Please let me know if you need any other information to help me address my issue.
Thank you,
Kai
02-10-2022 07:42 AM
Hi Kaniz,
Any update on a resolution? I am experiencing the same issue: I can read from my Databricks table through the JDBC driver, but I cannot write to it.
01-09-2023 11:37 PM
@Kaniz Fatma The resolution is to add the JDBC driver's class name to the list of configured drivers that have autoCommit turned off. How can we manage this list on a Windows machine? Can you summarize the steps? It would be really appreciated.
02-16-2022 01:27 PM
It's not clear to me how to set autocommit off/false. Is this something I'm supposed to set in my Spark configuration when I launch spark-shell/spark-submit? Or is there some standard JDBC config file that gets referenced for this kind of setting? I ask because the only documentation I can find on turning autocommit off is for the java.sql.Connection class directly, but I am writing with df.write.format("jdbc").options(Map( ... )).save(), which throws an error if you try to pass an autocommit option. I also don't see an autocommit-related option in the Simba JDBC documentation's list of driver options that can be passed through the options(Map( ... )) call. I can't use a Connection object on its own, since that only runs direct SQL commands and won't write out a DataFrame; I could loop over the rows and inject INSERT statements, but that would be too clunky and slow.
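The closest standard option I've come across is Spark's own isolationLevel option for JDBC writes; the Spark SQL data source docs list NONE as an allowed value, and my (unverified) understanding is that this makes the writer skip the transaction setup around each partition. A sketch of what I mean, reusing the placeholder URL from the first post; whether it avoids the failing driver call is an assumption I have not confirmed:

// Same write as before, but asking Spark's JDBC writer not to use transactions.
df.write.format("jdbc").mode(SaveMode.Append).options(Map(
  "url" -> "jdbc:spark://adb-<host_id>.azuredatabricks.net:443/default;transportMode=http;ssl=1;httpPath=<http_path>;AuthMech=3;UID=token;PWD=<token>",
  "dbtable" -> "testtable",
  "driver" -> "com.simba.spark.jdbc.Driver",
  "isolationLevel" -> "NONE" // standard Spark JDBC write option; NONE skips transactions
)).save()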
02-17-2022 10:17 AM
@Kai McNeely Could you provide more details on your client? For example, which version of Spark you run your code on, and whether you provide any extra Spark conf beyond the defaults? I tested with Spark 3.x and the default Spark conf, and it works for me.
02-17-2022 11:56 AM
I am using Spark 3.1.2 (I listed 3.1.1 in the initial post because the deployment originally ran 3.1.1), initializing with: spark-shell -target:jvm-1.8
I am bringing in the following packages (some used and some unused at the moment):
--conf "spark.driver.extraClassPath=~/jars/SparkJDBC42.jar" \
--conf "spark.jars.repositories=path/to/repos" \
--packages "com.microsoft.sqlserver:mssql-jdbc:6.4.0.jre8,net.sourceforge.jtds:jtds:1.3.1,com.oracle.database.jdbc:ojdbc8:21.1.0.0,com.databricks:spark-avro_2.10:4.0.0"
The following are the de-identified settings that come implicitly from my /opt/spark/conf/spark-default.conf:
spark.master=k8s://https://kubernetes.default.svc.cluster.local:443
spark.app.name=sp-worker
spark.eventLog.enabled true
spark.eventLog.dir file:///spark/logs
spark.kubernetes.driver.limit.cores=2
spark.kubernetes.executor.limit.cores=4
spark.executor.instances=2
spark.executor.memory=4G
spark.kubernetes.executor.request.cores=250m
spark.kubernetes.driver.label.app=spark-jlab
spark.kubernetes.executor.label.app=spark-jlab
spark.kubernetes.executor.label.deployment=jlab
spark.kubernetes.local.dirs.tmpfs=true
spark.kubernetes.container.image.pullPolicy=IfNotPresent
spark.kubernetes.container.image=docker.someurl.com/path/to/image/repo
# Home for Python libraries
spark.kubernetes.driver.volumes.persistentVolumeClaim.home.mount.path=/home
spark.kubernetes.driver.volumes.persistentVolumeClaim.home.mount.readOnly=false
spark.kubernetes.driver.volumes.persistentVolumeClaim.home.options.claimName=jlab
spark.kubernetes.executor.volumes.persistentVolumeClaim.home.mount.path=/home
spark.kubernetes.executor.volumes.persistentVolumeClaim.home.mount.readOnly=false
spark.kubernetes.executor.volumes.persistentVolumeClaim.home.options.claimName=jlab
# Logs
spark.kubernetes.driver.volumes.persistentVolumeClaim.logs.mount.path=/logs
spark.kubernetes.driver.volumes.persistentVolumeClaim.logs.mount.readOnly=false
spark.kubernetes.driver.volumes.persistentVolumeClaim.logs.options.claimName=k8s-sparkhistorylogs
spark.kubernetes.executor.volumes.persistentVolumeClaim.logs.mount.path=/logs
spark.kubernetes.executor.volumes.persistentVolumeClaim.logs.mount.readOnly=false
spark.kubernetes.executor.volumes.persistentVolumeClaim.logs.options.claimName=k8s-sparkhistorylogs
spark.kubernetes.namespace=namespace-00000
spark.kubernetes.authenticate.driver.serviceAccountName=svcactnm-00000
spark.kubernetes.authenticate.oauthToken=token
spark.kubernetes.authenticate.caCertFile=/opt/osficerts/ctc.crt
spark.ui.xContentTypeOptions.enabled=true
spark.kubernetes.pyspark.pythonVersion=3
01-09-2023 09:35 PM
Hi @Kai McNeely, were you able to resolve this? I am facing the same issue with exactly the same Spark and JDBC driver versions.
03-15-2022 10:24 PM
Could you try setting the flag to ignore transactions? I'm not sure of the exact flag name, but the JDBC driver manual should have more details on how to do this.
01-09-2023 11:42 PM
I tried setting the IgnoreTransactions flag to 1, but it still didn't resolve the issue.
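In case it helps anyone else, this is how I passed it, appended to the same style of JDBC URL as in the original post. I'm assuming appending it to the URL is the right way to hand a connection property like this to the Simba driver, and as noted above it did not change the outcome for me:

// Same write as the original post, with IgnoreTransactions=1 appended to the URL.
val url = "jdbc:spark://adb-<host_id>.azuredatabricks.net:443/default;transportMode=http;ssl=1;httpPath=<http_path>;AuthMech=3;UID=token;PWD=<token>;IgnoreTransactions=1"

df.write.format("jdbc").mode(SaveMode.Append).options(Map(
  "url" -> url,
  "dbtable" -> "testtable",
  "driver" -> "com.simba.spark.jdbc.Driver"
)).save()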