05-08-2023 12:39 PM
Hello, we're receiving an error when running Glue jobs that try to connect to and read from a Databricks SQL endpoint.
An error occurred while calling o104.load. [Databricks][DatabricksJDBCDriver](500051) ERROR processing query/statement. Error Code: 0, SQL state: TStatus(statusCode:ERROR_STATUS, infoMessages:[*org.apache.hive.service.cli.HiveSQLException:Configuration dbtable is not available.:48:47, org.apache.spark.sql.hive.thriftserver.HiveThriftServerErrors$:hiveOperatingError:HiveThriftServerErrors.scala:65, com.databricks.sql.hive.thriftserver.thrift.ErrorPropagationThriftHandler:runSafely:ErrorPropagationThriftHandler.scala:124, com.databricks.sql.hive.thriftserver.thrift.ErrorPropagationThriftHandler:ExecuteStatement:ErrorPropagationThriftHandler.scala:73, org.apache.hive.service.cli.thrift.ThriftCLIService:ExecuteStatement:ThriftCLIService.java:429, org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement:getResult:TCLIService.java:1437, org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement:getResult:TCLIService.java:1422, org.apache.thrift.ProcessFunction:process:ProcessFunction
The same read() query with the same options() works fine if I run it in a local PySpark cluster, but it's failing in Glue. I suspect it could be related to the GlueContext. Has anyone run across this issue or have an idea what might be causing it?
05-10-2023 01:08 AM
Hi, using the dbtable option is not supported when connecting to a DBSQL endpoint/warehouse. Could you please share some more context here?
Please let us know if this helps. Also, please tag @Debayan with your next comment so that I will get notified. Thank you!
05-10-2023 01:30 PM
@Debayan Mukherjee This helps. It was hard to find information on whether it was supported, and the error message "Configuration dbtable is not available" was not returning search results.
We are connecting with AWS Glue (Spark) and trying to pull data from an endpoint. Previously we attached to a cluster, which worked fine. We tested using sql.connect() and it worked, but we had concerns about the code rework needed and the speed/robustness of using the cursor vs. JDBC. Any thoughts?
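For readers weighing the cursor route mentioned above, here is a minimal sketch of reading through the Databricks SQL Connector for Python in batches rather than all at once. The helper name iter_query_batches, the batch size, and the placeholder host/path/token values are assumptions for illustration and are not from this thread; whether this is fast enough compared to JDBC would need to be measured on your own data.

```python
from typing import Any, Callable, Iterator, List


def iter_query_batches(
    connect: Callable[[], Any], query: str, batch_size: int = 10_000
) -> Iterator[List[Any]]:
    """Yield lists of rows so the full result is never held in memory at once.

    `connect` is any zero-argument callable returning a DB-API style
    connection that works as a context manager (hypothetical helper).
    """
    with connect() as connection:
        with connection.cursor() as cursor:
            cursor.execute(query)
            while True:
                rows = cursor.fetchmany(batch_size)
                if not rows:
                    break
                yield rows


def connect_to_warehouse() -> Any:
    # Imported lazily so the helper above can be exercised without the
    # databricks-sql-connector package installed.
    from databricks import sql

    # Placeholder values: replace with your own workspace details.
    return sql.connect(
        server_hostname="<workspace-host>",
        http_path="<warehouse-http-path>",
        access_token="<personal-access-token>",
    )
```

Usage would look like `for batch in iter_query_batches(connect_to_warehouse, "SELECT 1"): ...`. Note that rows arrive on the driver only, so unlike a JDBC read this does not give you a distributed Spark DataFrame without an extra conversion step.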
05-10-2023 09:53 PM
Hi @Danny Lee, unfortunately the dbtable option is not supported when connecting to DBSQL warehouses or endpoints, and there is no workaround for it as of now.
05-19-2023 11:39 PM
Hi @Danny Lee
Hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.
We'd love to hear from you.
Thanks!
05-20-2023 05:44 PM
Hi Vidula, still working on this issue. The enterprise team supporting our source data recommended that we try the query keyword with the DBSQL endpoint. I'm not sure whether this works or whether the team is aware of the limitations; we're still waiting to hear back from the developers.
Thanks for checking in.
06-06-2023 06:57 AM
Hello @Vidula Khanna @Debayan Mukherjee,
I wanted to give you an update that might be helpful for your future customers. We worked with @Pavan Kumar Chalamcharla, and through lots of trial and error we figured out a combination that works for SQL endpoints with the dbtable option on Glue 4.0.
The combination does not work with the query option, and neither dbtable nor query works in Glue 3.0. We were able to successfully connect and read using the dbtable option written as a subquery:
ex: (SELECT 1) as subq
Also, we were able to use the following options as well:
However, I'm not 100% confident that it's bug-free. The job succeeds and data is loaded, but it's questionable whether the partitioning is happening optimally.
Overall, it's good progress! Thanks to @Pavan Kumar Chalamcharla for getting us the info we needed to iterate through the different test cases.
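To make the subquery form above concrete, here is a rough sketch of how the read options might be assembled for a Spark JDBC read. The post does not list which extra options were used, so the partition settings shown here (partitionColumn, lowerBound, upperBound, numPartitions, which are standard Spark JDBC options) are only a guess based on the mention of partitioning. The helper names, the default driver class, and the URL in the usage note are assumptions too and should be checked against the driver version you ship with the Glue job.

```python
from typing import Any, Dict, Optional

# Spark requires these four options together whenever any one is set.
PARTITION_KEYS = ("partitionColumn", "lowerBound", "upperBound", "numPartitions")


def build_jdbc_options(
    url: str,
    subquery: str,
    alias: str = "subq",
    driver: str = "com.databricks.client.jdbc.Driver",  # assumed driver class
    partition: Optional[Dict[str, Any]] = None,
) -> Dict[str, str]:
    """Build Spark JDBC options with dbtable written as an aliased subquery."""
    options = {
        "url": url,
        "driver": driver,
        # Same shape as the example in the post: (SELECT 1) as subq
        "dbtable": f"({subquery}) AS {alias}",
    }
    if partition is not None:
        missing = [key for key in PARTITION_KEYS if key not in partition]
        if missing:
            raise ValueError(f"missing partition options: {missing}")
        options.update({key: str(partition[key]) for key in PARTITION_KEYS})
    return options


def read_from_warehouse(spark: Any, options: Dict[str, str]) -> Any:
    # `spark` is an existing SparkSession, e.g. glueContext.spark_session in Glue.
    return spark.read.format("jdbc").options(**options).load()
```

For example, `build_jdbc_options(url, "SELECT id, amount FROM sales", partition={"partitionColumn": "id", "lowerBound": 1, "upperBound": 1000000, "numPartitions": 8})` asks Spark to split the read into 8 ranges over a hypothetical numeric `id` column. If the bounds do not match the real distribution of the column, the partitions will be skewed, which may be one reason the partitioning feels suboptimal.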
06-14-2023 11:31 PM
Thanks for the update, we are glad to know it's working.