cancel
Showing results forย 
Search instead forย 
Did you mean:ย 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results forย 
Search instead forย 
Did you mean:ย 

Query AWS Redshift from Databricks SQL

cristianc
Contributor

Greetings,

In the documentation for Databricks SQL it states that it supports JDBC connections, however when connecting to AWS Redshift via the built in PostgreSQL driver ("CREATE TABLE sample USING JDBC" and "jdbc://postgresql:/..." URI) I'm getting a weird error as follows:

"

org.apache.spark.sql.AnalysisException: 

The user-specified schema doesn't match the actual schema:

user-specified: `id` INT, (...other columns edited out), actual: `id` INT, (...other columns edited out). If you're using

DataFrameReader.schema API or creating a table, please do not specify the schema.

Or if you're scanning an existed table, please drop it and re-create it.

"

The list of columns were edited out but it is exactly the same list in "user-specified" and in "actual", seems like rather some internal detail is different rather than the printed DDL.

Does this mean that currently there is no way to query AWS Redshift as external data store from Databricks SQL?

Thanks,

Cristian

1 ACCEPTED SOLUTION

Accepted Solutions

pavan_kumar
Contributor

@Cristian Constantinescuโ€‹  we see that issue is occurring on the 10.1 version of dbr with 3.2 version of spark and recently we have fixed this issue and the fix will be rolled out soon on all the workspaces.

you can try to use the 10.1 dbr with notebooks and check if the issue is reproducible and for the time you can use the lower versions of dbr with notebooks.

View solution in original post

9 REPLIES 9

pavan_kumar
Contributor

@Cristian Constantinescuโ€‹  Thanks for posting your query/issue here.

can you please try to create the table using the jdbc connections in the databricks notebooks using the 9.1 version of DBR and check if you are able to perform it?

cristianc
Contributor

From notebooks the spark sql queries are working as expected.

The problem seems to appear when using the "Databricks SQL" product, for instance when asking for a data sample or when running queries on the external table (that does the JDBC to Redshift via PostgreSQL JDBC URI) from the metastore.

The strangest of the things is that this code seems to be throwing the error:

spark/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala

if (baseRelation.schema != schema) {

362 throw new AnalysisException(

363 "The user-specified schema doesn't match the actual schema: " +

364 s"user-specified: ${schema.toDDL}, actual: ${baseRelation.schema.toDDL}. If " +

365 "you're using DataFrameReader.schema API or creating a table, please do not " +

366 "specify the schema. Or if you're scanning an existed table, please drop " +

367 "it and re-create it.")

368 }

Meaning baseRelation.schema != schema but schema.toDDL and baseRelation.schema.toDDL is the same string in my case.

Hope this better describes the problem.

Thanks for your interest in solving this issue!

pavan_kumar
Contributor

@Cristian Constantinescuโ€‹  we see that issue is occurring on the 10.1 version of dbr with 3.2 version of spark and recently we have fixed this issue and the fix will be rolled out soon on all the workspaces.

you can try to use the 10.1 dbr with notebooks and check if the issue is reproducible and for the time you can use the lower versions of dbr with notebooks.

@Pavan Kumar Chalamcharlaโ€‹ thank you for your help in understanding better this situation.

BilalAslamDbrx
Honored Contributor III
Honored Contributor III

@Cristian Constantinescuโ€‹ can you point me to the documentation page which says DBSQL supports JDBC connections?

cristianc
Contributor

@Bilal Aslamโ€‹ there is a high possibility that my understand is not correct, but when I was reading the Databricks SQL guide I stumbled upon this page: https://docs.databricks.com/sql/language-manual/sql-ref-syntax-ddl-create-table-using.html

In the page it explains how the CREATE TABLE statement works for DBSQL, there is the following section:

"USING data_source The file format to use for the table. data_source must be one of TEXT, AVRO, CSV, JSON, JDBC, PARQUET, ORC, or DELTA. If USING is omitted, the default is DELTA."

In the meantime we discussed with our account rep and we were briefed that JDBC is not supported for DBSQL which is compatible with our experience.

I hope that this information was helpful in any way!

BilalAslamDbrx
Honored Contributor III
Honored Contributor III

@Cristian Constantinescuโ€‹ you're right, this is a documentation bug since DBSQL does not [yet] support querying JDBC. I'll work with the team to fix it. Thanks for sharing the doc link!

cristianc
Contributor

@Bilal Aslamโ€‹ anytime! Is there a place where customers could follow the timeline when such features are introduced?

BilalAslamDbrx
Honored Contributor III
Honored Contributor III

@Cristian Constantinescuโ€‹ I believe that's at ideas.databricks.com

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you wonโ€™t want to miss the chance to attend and share knowledge.

If there isnโ€™t a group near you, start one and help create a community that brings people together.

Request a New Group