Query AWS Redshift from Databricks SQL

cristianc
Contributor

Greetings,

The documentation for Databricks SQL states that it supports JDBC connections. However, when connecting to AWS Redshift via the built-in PostgreSQL driver ("CREATE TABLE sample USING JDBC" with a "jdbc:postgresql://..." URI), I'm getting a weird error as follows:

"

org.apache.spark.sql.AnalysisException: 

The user-specified schema doesn't match the actual schema:

user-specified: `id` INT, (...other columns edited out), actual: `id` INT, (...other columns edited out). If you're using

DataFrameReader.schema API or creating a table, please do not specify the schema.

Or if you're scanning an existed table, please drop it and re-create it.

"

The list of columns was edited out, but it is exactly the same in "user-specified" and in "actual"; it seems some internal detail differs rather than the printed DDL.

Does this mean that there is currently no way to query AWS Redshift as an external data store from Databricks SQL?
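For concreteness, here is roughly what I am running, written as the Scala notebook equivalent of the DBSQL statement (the endpoint, database, credentials, and column list below are placeholders, not my real values):

// Scala notebook equivalent of the failing DBSQL statement.
// Endpoint, database, and credentials are placeholders.
spark.sql("""
  CREATE TABLE sample (id INT /* ...other columns edited out... */)
  USING JDBC
  OPTIONS (
    url 'jdbc:postgresql://my-cluster.example.us-east-1.redshift.amazonaws.com:5439/dev',
    dbtable 'public.sample',
    user 'my_user',
    password 'my_password'
  )
""")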

Thanks,

Cristian


9 REPLIES

pavan_kumar
Contributor

@Cristian Constantinescu​  Thanks for posting your query/issue here.

Can you please try to create the table using the JDBC connection in a Databricks notebook on DBR 9.1 and check whether it works?
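For example, something along these lines from a Scala notebook cell would be a quick check (just a sketch; the endpoint and credentials here are placeholders):

// Quick JDBC connectivity check from a notebook; endpoint and credentials are placeholders.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://my-cluster.example.us-east-1.redshift.amazonaws.com:5439/dev")
  .option("dbtable", "public.sample")
  .option("user", "my_user")
  .option("password", "my_password")
  .load()

df.printSchema()  // compare against the DDL printed in the error message
df.show(5)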

cristianc
Contributor

From notebooks, the Spark SQL queries are working as expected.

The problem seems to appear when using the "Databricks SQL" product, for instance when asking for a data sample or when running queries from the metastore against the external table (which connects to Redshift via the PostgreSQL JDBC URI).

The strangest thing is that this code seems to be throwing the error:

spark/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala

if (baseRelation.schema != schema) {
  throw new AnalysisException(
    "The user-specified schema doesn't match the actual schema: " +
    s"user-specified: ${schema.toDDL}, actual: ${baseRelation.schema.toDDL}. If " +
    "you're using DataFrameReader.schema API or creating a table, please do not " +
    "specify the schema. Or if you're scanning an existed table, please drop " +
    "it and re-create it.")
}

Meaning baseRelation.schema != schema, yet schema.toDDL and baseRelation.schema.toDDL are the same string in my case.
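One plausible way this can happen (just a guess at the internal detail, not something I have confirmed): StructType equality compares each field's metadata, while toDDL does not print it, so two schemas can render identical DDL yet still compare unequal. A minimal sketch:

import org.apache.spark.sql.types._

// Two schemas that print the same DDL but are not equal,
// because StructField equality also compares field metadata.
val userSpecified = StructType(Seq(StructField("id", IntegerType)))
val actual = StructType(Seq(
  StructField("id", IntegerType, nullable = true,
    metadata = new MetadataBuilder().putLong("scale", 0L).build())))

println(userSpecified.toDDL)                  // `id` INT
println(actual.toDDL)                         // `id` INT
println(userSpecified.toDDL == actual.toDDL)  // true
println(userSpecified == actual)              // false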

Hope this better describes the problem.

Thanks for your interest in solving this issue!

pavan_kumar
Contributor

@Cristian Constantinescu​ we see that this issue occurs on DBR 10.1 with Spark 3.2. We have recently fixed it, and the fix will be rolled out soon to all workspaces.

You can try DBR 10.1 with notebooks to check whether the issue is still reproducible; for the time being, you can use lower DBR versions with notebooks.

cristianc
Contributor

@Pavan Kumar Chalamcharla​ thank you for your help in understanding this situation better.

BilalAslamDbrx
Honored Contributor II

@Cristian Constantinescu​ can you point me to the documentation page which says DBSQL supports JDBC connections?

cristianc
Contributor

@Bilal Aslam​ there is a high possibility that my understanding is not correct, but when I was reading the Databricks SQL guide I stumbled upon this page: https://docs.databricks.com/sql/language-manual/sql-ref-syntax-ddl-create-table-using.html

The page explains how the CREATE TABLE statement works for DBSQL, and it contains the following section:

"USING data_source The file format to use for the table. data_source must be one of TEXT, AVRO, CSV, JSON, JDBC, PARQUET, ORC, or DELTA. If USING is omitted, the default is DELTA."

In the meantime, we discussed this with our account rep and were told that JDBC is not supported for DBSQL, which is consistent with our experience.

I hope this information is helpful!

BilalAslamDbrx
Honored Contributor II

@Cristian Constantinescu​ you're right, this is a documentation bug since DBSQL does not [yet] support querying JDBC. I'll work with the team to fix it. Thanks for sharing the doc link!

cristianc
Contributor

@Bilal Aslam​ anytime! Is there a place where customers can follow the timeline for when such features are introduced?

BilalAslamDbrx
Honored Contributor II

@Cristian Constantinescu​ I believe that's at ideas.databricks.com
