Query AWS Redshift from Databricks SQL

cristianc
Contributor

Greetings,

The documentation for Databricks SQL states that it supports JDBC connections. However, when connecting to AWS Redshift via the built-in PostgreSQL driver ("CREATE TABLE sample USING JDBC" with a "jdbc:postgresql://..." URI), I'm getting a weird error as follows:

"

org.apache.spark.sql.AnalysisException: 

The user-specified schema doesn't match the actual schema:

user-specified: `id` INT, (...other columns edited out), actual: `id` INT, (...other columns edited out). If you're using

DataFrameReader.schema API or creating a table, please do not specify the schema.

Or if you're scanning an existed table, please drop it and re-create it.

"

The list of columns was edited out, but it is exactly the same in "user-specified" and in "actual"; it seems some internal detail differs rather than the printed DDL.

Does this mean that there is currently no way to query AWS Redshift as an external data store from Databricks SQL?
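For concreteness, here is roughly what I am running, written as the Scala notebook equivalent of the DBSQL statement (the endpoint, database, credentials, and column list below are placeholders, not my real values):

// Scala notebook equivalent of the failing DBSQL statement.
// Endpoint, database, and credentials are placeholders.
spark.sql("""
  CREATE TABLE sample (id INT /* ...other columns edited out... */)
  USING JDBC
  OPTIONS (
    url 'jdbc:postgresql://my-cluster.example.us-east-1.redshift.amazonaws.com:5439/dev',
    dbtable 'public.sample',
    user 'my_user',
    password 'my_password'
  )
""")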

Thanks,

Cristian


9 REPLIES

pavan_kumar
Contributor

@Cristian Constantinescu​  Thanks for posting your query/issue here.

Can you please try to create the table using the JDBC connection in a Databricks notebook on DBR 9.1 and check whether it works?
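For example, something along these lines from a Scala notebook cell would be a quick check (just a sketch; the endpoint and credentials here are placeholders):

// Quick JDBC connectivity check from a notebook; endpoint and credentials are placeholders.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://my-cluster.example.us-east-1.redshift.amazonaws.com:5439/dev")
  .option("dbtable", "public.sample")
  .option("user", "my_user")
  .option("password", "my_password")
  .load()

df.printSchema()  // compare against the DDL printed in the error message
df.show(5)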

cristianc
Contributor

From notebooks, the Spark SQL queries are working as expected.

The problem seems to appear when using the "Databricks SQL" product, for instance when asking for a data sample or when running queries from the metastore against the external table (which connects to Redshift via the PostgreSQL JDBC URI).

The strangest thing is that this code seems to be throwing the error:

spark/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala

if (baseRelation.schema != schema) {
  throw new AnalysisException(
    "The user-specified schema doesn't match the actual schema: " +
    s"user-specified: ${schema.toDDL}, actual: ${baseRelation.schema.toDDL}. If " +
    "you're using DataFrameReader.schema API or creating a table, please do not " +
    "specify the schema. Or if you're scanning an existed table, please drop " +
    "it and re-create it.")
}

Meaning baseRelation.schema != schema, yet schema.toDDL and baseRelation.schema.toDDL are the same string in my case.
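One plausible way this can happen (just a guess at the internal detail, not something I have confirmed): StructType equality compares each field's metadata, while toDDL does not print it, so two schemas can render identical DDL yet still compare unequal. A minimal sketch:

import org.apache.spark.sql.types._

// Two schemas that print the same DDL but are not equal,
// because StructField equality also compares field metadata.
val userSpecified = StructType(Seq(StructField("id", IntegerType)))
val actual = StructType(Seq(
  StructField("id", IntegerType, nullable = true,
    metadata = new MetadataBuilder().putLong("scale", 0L).build())))

println(userSpecified.toDDL)                  // `id` INT
println(actual.toDDL)                         // `id` INT
println(userSpecified.toDDL == actual.toDDL)  // true
println(userSpecified == actual)              // false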

Hope this better describes the problem.

Thanks for your interest in solving this issue!

pavan_kumar
Contributor

@Cristian Constantinescu​ we see that this issue occurs on DBR 10.1 with Spark 3.2. We have recently fixed it, and the fix will be rolled out soon to all workspaces.

You can try DBR 10.1 with notebooks to check whether the issue is still reproducible; for the time being, you can use lower DBR versions with notebooks.

cristianc
Contributor

@Pavan Kumar Chalamcharla​ thank you for your help in understanding this situation better.

BilalAslamDbrx
Honored Contributor II

@Cristian Constantinescu​ can you point me to the documentation page which says DBSQL supports JDBC connections?

cristianc
Contributor

@Bilal Aslam​ there is a high possibility that my understanding is not correct, but when I was reading the Databricks SQL guide I stumbled upon this page: https://docs.databricks.com/sql/language-manual/sql-ref-syntax-ddl-create-table-using.html

The page explains how the CREATE TABLE statement works for DBSQL, and it contains the following section:

"USING data_source The file format to use for the table. data_source must be one of TEXT, AVRO, CSV, JSON, JDBC, PARQUET, ORC, or DELTA. If USING is omitted, the default is DELTA."

In the meantime, we discussed this with our account rep and were told that JDBC is not supported for DBSQL, which is consistent with our experience.

I hope this information is helpful!

BilalAslamDbrx
Honored Contributor II

@Cristian Constantinescu​ you're right, this is a documentation bug since DBSQL does not [yet] support querying JDBC. I'll work with the team to fix it. Thanks for sharing the doc link!

cristianc
Contributor

@Bilal Aslam​ anytime! Is there a place where customers can follow the timeline for when such features are introduced?

BilalAslamDbrx
Honored Contributor II

@Cristian Constantinescu​ I believe that's at ideas.databricks.com
