01-25-2022 03:04 AM
Greetings,
In the documentation for Databricks SQL it states that JDBC connections are supported; however, when connecting to AWS Redshift via the built-in PostgreSQL driver ("CREATE TABLE sample USING JDBC" with a "jdbc:postgresql://..." URL), I get the following error:
"
org.apache.spark.sql.AnalysisException:
The user-specified schema doesn't match the actual schema:
user-specified: `id` INT, (...other columns edited out), actual: `id` INT, (...other columns edited out). If you're using
DataFrameReader.schema API or creating a table, please do not specify the schema.
Or if you're scanning an existed table, please drop it and re-create it.
"
The list of columns was edited out, but it is exactly the same in "user-specified" and in "actual"; it seems some internal detail differs rather than the printed DDL.
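For context, the statement I ran was of roughly this shape (a sketch only: the table name, columns, and connection details below are placeholders, not my actual values):

```sql
-- Hypothetical sketch; the JDBC URL, credentials, and table are placeholders.
CREATE TABLE sample
USING JDBC
OPTIONS (
  url "jdbc:postgresql://<redshift-endpoint>:5439/<database>",
  dbtable "public.sample",
  user "<user>",
  password "<password>"
);
```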
Does this mean that there is currently no way to query AWS Redshift as an external data store from Databricks SQL?
Thanks,
Cristian
01-25-2022 11:01 PM
@Cristian Constantinescu We see that this issue occurs on DBR 10.1 with Spark 3.2. We have recently fixed it, and the fix will be rolled out to all workspaces soon.
You can try DBR 10.1 with notebooks to check whether the issue is still reproducible; for the time being, you can use lower DBR versions with notebooks.
01-25-2022 03:47 AM
@Cristian Constantinescu Thanks for posting your query here.
Could you please try to create the table using a JDBC connection in a Databricks notebook on DBR 9.1 and check whether it works?
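For instance, after creating the table in a notebook cell, a quick read-back check could look like this (the table name is a placeholder):

```sql
-- Hypothetical check: read a few rows back through the JDBC-backed table.
SELECT * FROM sample LIMIT 10;
```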
01-25-2022 03:59 AM
From notebooks, the Spark SQL queries work as expected.
The problem appears when using the "Databricks SQL" product, for instance when requesting a data sample or when running queries from the metastore against the external table (the one that reaches Redshift via the PostgreSQL JDBC URL).
Strangest of all, this code appears to be throwing the error:
spark/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
if (baseRelation.schema != schema) {
  throw new AnalysisException(
    "The user-specified schema doesn't match the actual schema: " +
    s"user-specified: ${schema.toDDL}, actual: ${baseRelation.schema.toDDL}. If " +
    "you're using DataFrameReader.schema API or creating a table, please do not " +
    "specify the schema. Or if you're scanning an existed table, please drop " +
    "it and re-create it.")
}
Meaning baseRelation.schema != schema, yet schema.toDDL and baseRelation.schema.toDDL are the same string in my case.
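One way two schemas can compare unequal while printing identical DDL is per-field metadata: StructField equality compares the metadata map, but toDDL does not print metadata keys other than the comment. A hedged Scala sketch (the metadata key below is made up for illustration):

```scala
import org.apache.spark.sql.types._

// Two single-column schemas that render the same DDL but are not equal,
// because one field carries extra metadata that toDDL never prints.
val meta = new MetadataBuilder().putString("redshift_type", "int4").build()
val a = StructType(Seq(StructField("id", IntegerType, nullable = true)))
val b = StructType(Seq(StructField("id", IntegerType, nullable = true, metadata = meta)))

println(a.toDDL == b.toDDL) // true: both print as `id` INT
println(a == b)             // false: the field metadata differs
```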
Hope this better describes the problem.
Thanks for your interest in solving this issue!
01-26-2022 12:39 AM
@Pavan Kumar Chalamcharla thank you for your help in better understanding this situation.
01-28-2022 05:53 AM
@Cristian Constantinescu can you point me to the documentation page which says DBSQL supports JDBC connections?
01-28-2022 06:04 AM
@Bilal Aslam there is a good chance my understanding is incorrect, but while reading the Databricks SQL guide I stumbled upon this page: https://docs.databricks.com/sql/language-manual/sql-ref-syntax-ddl-create-table-using.html
The page explains how the CREATE TABLE statement works in DBSQL and contains the following section:
"USING data_source The file format to use for the table. data_source must be one of TEXT, AVRO, CSV, JSON, JDBC, PARQUET, ORC, or DELTA. If USING is omitted, the default is DELTA."
In the meantime we discussed this with our account rep and were told that JDBC is not supported in DBSQL, which matches our experience.
I hope this information is helpful!
01-28-2022 06:05 AM
@Cristian Constantinescu you're right, this is a documentation bug since DBSQL does not [yet] support querying JDBC. I'll work with the team to fix it. Thanks for sharing the doc link!
01-28-2022 06:10 AM
@Bilal Aslam anytime! Is there a place where customers could follow the timeline when such features are introduced?
01-28-2022 06:26 AM
@Cristian Constantinescu I believe that's at ideas.databricks.com