Databricks Community

dannylee · ‎05-08-2023

Hello, we're receiving an error when running glue jobs to try and connect to and read from a Databricks SQL endpoint.

An error occurred while calling o104.load. [Databricks][DatabricksJDBCDriver](500051) ERROR processing query/statement. Error Code: 0, SQL state: TStatus(statusCode:ERROR_STATUS, infoMessages:[*org.apache.hive.service.cli.HiveSQLException:Configuration dbtable is not available.:48:47, org.apache.spark.sql.hive.thriftserver.HiveThriftServerErrors$:hiveOperatingError:HiveThriftServerErrors.scala:65, com.databricks.sql.hive.thriftserver.thrift.ErrorPropagationThriftHandler:runSafely:ErrorPropagationThriftHandler.scala:124, com.databricks.sql.hive.thriftserver.thrift.ErrorPropagationThriftHandler:ExecuteStatement:ErrorPropagationThriftHandler.scala:73, org.apache.hive.service.cli.thrift.ThriftCLIService:ExecuteStatement:ThriftCLIService.java:429, org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement:getResult:TCLIService.java:1437, org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement:getResult:TCLIService.java:1422, org.apache.thrift.ProcessFunction:process:ProcessFunction
 
The same read() query with the same options() works fine when I run it on a local PySpark cluster, but it's failing in Glue. I suspect it could be related to the GlueContext. Has anyone run across this issue, or have an idea what might be causing it?

dannylee · ‎06-06-2023

Hello @Vidula Khanna @Debayan Mukherjee ,

I wanted to give you an update that might be helpful for your future customers. We worked with @Pavan Kumar Chalamcharla and, through lots of trial and error, figured out a combination that works for SQL endpoints with the dbtable option on Glue 4.0.

The combination does not work with the query option, nor with either dbtable or query in Glue 3.0. We were able to connect and execute successfully with the dbtable option given as a subquery:

ex: (SELECT 1) as subq

Also, we were able to use the following options as well:

partitionColumn
lowerBound
upperBound
numPartitions
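
For anyone trying to reproduce this, here is a minimal sketch of how the options above could be assembled in Python. The helper name `build_jdbc_options`, the placeholder URL, and the driver class default are my assumptions for illustration, not something confirmed in this thread; only the option names and the aliased-subquery form of dbtable come from the posts above.

```python
def build_jdbc_options(url, subquery, alias="subq", partition_column=None,
                       lower_bound=None, upper_bound=None, num_partitions=None,
                       driver="com.databricks.client.jdbc.Driver"):
    """Build a dict of Spark JDBC reader options (hypothetical helper).

    The driver default is an assumption; use whatever class your driver jar provides.
    """
    options = {
        "url": url,
        "driver": driver,
        # dbtable carries a parenthesised subquery plus an alias,
        # in the same shape as "(SELECT 1) as subq".
        "dbtable": f"({subquery}) as {alias}",
    }
    partitioning = {
        "partitionColumn": partition_column,
        "lowerBound": lower_bound,
        "upperBound": upper_bound,
        "numPartitions": num_partitions,
    }
    supplied = {k: v for k, v in partitioning.items() if v is not None}
    # This sketch accepts either all four partitioning options or none of them.
    if supplied and len(supplied) != len(partitioning):
        raise ValueError("partitioning options must be supplied together")
    # Spark reader options are passed as strings.
    options.update({k: str(v) for k, v in supplied.items()})
    return options


# Placeholder URL: fill in your own endpoint details.
opts = build_jdbc_options(
    url="jdbc:databricks://<server-hostname>:443;httpPath=<http-path>",
    subquery="SELECT 1 AS id",
    partition_column="id",
    lower_bound=0,
    upper_bound=100,
    num_partitions=4,
)
print(opts["dbtable"])  # (SELECT 1 AS id) as subq
```

In a Glue job the dict would then be passed to the reader, for example `spark.read.format("jdbc").options(**opts).load()`.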

However, I'm not 100% confident that it's bug-free and fully working. The job succeeds and the data is loaded, but it's unclear whether the partitioning is happening optimally.
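
To reason about the partitioning question, it can help to look at which ranges each partition would cover. The sketch below is a simplified model of how Spark's JDBC source splits a numeric column into per-partition WHERE clauses. The function name is hypothetical and the stride arithmetic is an approximation (the real computation differs in detail between Spark versions). The documented behavior it illustrates is that lowerBound and upperBound only set the stride and do not filter rows, so values outside the bounds all land in the first or last partition.

```python
def partition_predicates(column, lower_bound, upper_bound, num_partitions):
    """Approximate the WHERE clause for each JDBC partition (simplified model)."""
    if num_partitions < 2:
        # A single partition reads everything with no predicate.
        return [None]
    stride = (upper_bound - lower_bound) // num_partitions
    predicates = []
    current = lower_bound
    for i in range(num_partitions):
        # The first partition has no lower limit, the last has no upper limit.
        lower = f"{column} >= {current}" if i > 0 else None
        current += stride
        upper = f"{column} < {current}" if i < num_partitions - 1 else None
        if lower is None:
            # NULLs and values below lower_bound end up in the first partition.
            predicates.append(f"{upper} OR {column} IS NULL")
        elif upper is None:
            # Values above upper_bound end up in the last partition.
            predicates.append(lower)
        else:
            predicates.append(f"{lower} AND {upper}")
    return predicates


for predicate in partition_predicates("id", 0, 100, 4):
    print(predicate)
# id < 25 OR id IS NULL
# id >= 25 AND id < 50
# id >= 50 AND id < 75
# id >= 75
```

If the real data is skewed, or the bounds are far from the actual minimum and maximum of the column, a few partitions will do most of the work even though the job succeeds.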

Overall, it's good progress! Thanks to @Pavan Kumar Chalamcharla for getting us the info we needed to iterate through the different test cases.

Debayan · ‎05-10-2023

Hi, using the dbtable option is not supported when connecting to a DBSQL endpoint/warehouse. Could you please share some more context here?

Please let us know if this helps. Also, please tag @Debayan with your next comment so that I will get notified. Thank you!

dannylee · ‎05-10-2023

@Debayan Mukherjee This helps. It was hard to find information on whether it was supported, and searching for the error message "Configuration dbtable is not available" returned no results.

We are connecting with AWS Glue (Spark) and trying to pull data from an endpoint. Previously we attached to a cluster, which worked fine. We tested using sql.connect() and it worked, but we had concerns about the code rework needed and the speed/robustness of using the cursor vs. JDBC. Any thoughts?

Debayan · ‎05-10-2023

Hi @Danny Lee, unfortunately the dbtable option is not supported when connecting to DBSQL warehouses or endpoints, and there is no workaround for it as of now.

Anonymous · ‎05-19-2023

Hi @Danny Lee

Hope all is well! Just wanted to check in on whether you were able to resolve your issue. If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.

We'd love to hear from you.

Thanks!

dannylee · ‎05-20-2023

Hi Vidula, still working on this issue. From the enterprise team supporting our source data, we were recommended to try the query keyword with the dbsql endpoint. Not sure if this is working or if the team is aware of the limitations, but still waiting to hear back from the developers.

Thanks for checking in.