Warehousing & Analytics

AWS Glue and Databricks

dannylee
New Contributor III

Hello, we're receiving an error when running Glue jobs that try to connect to and read from a Databricks SQL endpoint.
 
An error occurred while calling o104.load. [Databricks][DatabricksJDBCDriver](500051) ERROR processing query/statement. Error Code: 0, SQL state: TStatus(statusCode:ERROR_STATUS, infoMessages:[*org.apache.hive.service.cli.HiveSQLException:Configuration dbtable is not available.:48:47, org.apache.spark.sql.hive.thriftserver.HiveThriftServerErrors$:hiveOperatingError:HiveThriftServerErrors.scala:65, com.databricks.sql.hive.thriftserver.thrift.ErrorPropagationThriftHandler:runSafely:ErrorPropagationThriftHandler.scala:124, com.databricks.sql.hive.thriftserver.thrift.ErrorPropagationThriftHandler:ExecuteStatement:ErrorPropagationThriftHandler.scala:73, org.apache.hive.service.cli.thrift.ThriftCLIService:ExecuteStatement:ThriftCLIService.java:429, org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement:getResult:TCLIService.java:1437, org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement:getResult:TCLIService.java:1422, org.apache.thrift.ProcessFunction:process:ProcessFunction
 
The same read() query with the same options() works fine when I run it on a local PySpark cluster, but it's failing in Glue. I suspect it could be related to the GlueContext. Has anyone run across this issue or have an idea what might be causing it?
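For reference, the read that works locally looks roughly like this sketch. Every identifier below (host, HTTP path, token, table name) is a placeholder, not a real value from our setup:

```python
# Sketch of the JDBC read that succeeds on a local PySpark cluster.
# All connection details below are placeholders.

def jdbc_read_options(host, http_path, token, table):
    """Build the options dict we pass to spark.read.format('jdbc')."""
    url = (
        f"jdbc:databricks://{host}:443/default;"
        f"transportMode=http;ssl=1;httpPath={http_path};"
        f"AuthMech=3;UID=token;PWD={token}"
    )
    return {
        "url": url,
        "driver": "com.databricks.client.jdbc.Driver",
        "dbtable": table,  # the option the endpoint rejects in Glue
    }

# Usage (requires an active SparkSession with the Databricks JDBC driver
# on the classpath):
# df = spark.read.format("jdbc").options(
#     **jdbc_read_options("dbc-xxxx.cloud.databricks.com",
#                         "/sql/1.0/warehouses/xxxx",
#                         "<personal-access-token>",
#                         "samples.nyctaxi.trips")
# ).load()
```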


7 REPLIES

Debayan
Esteemed Contributor III

Hi, using the dbtable option to accomplish this is not supported when connecting to a DBSQL endpoint/warehouse. Could you please share some more context here?

Please let us know if this helps. Also, please tag @Debayan​ with your next comment so that I will get notified. Thank you!

dannylee
New Contributor III

@Debayan Mukherjee​ This helps - it was hard to find information on whether it was supported and the error message "Configuration dbtable is not available" was not returning search results.

We are connecting with AWS Glue (Spark) and trying to pull data from an endpoint; previously we attached to a cluster, which worked fine. We tested using sql.connect() and it worked, but we had concerns about the code rework needed and the speed/robustness of using the cursor vs. JDBC. Any thoughts?
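For comparison, the sql.connect() route we tested uses the databricks-sql-connector package's cursor API. A minimal sketch, assuming placeholder hostname, HTTP path, and token:

```python
def fetch_rows(server_hostname, http_path, access_token, query):
    """Run a query against a DBSQL warehouse via the
    databricks-sql-connector cursor API instead of JDBC.
    Returns a list of result rows."""
    from databricks import sql  # pip install databricks-sql-connector

    with sql.connect(server_hostname=server_hostname,
                     http_path=http_path,
                     access_token=access_token) as connection:
        with connection.cursor() as cursor:
            cursor.execute(query)
            return cursor.fetchall()

# Usage:
# rows = fetch_rows("dbc-xxxx.cloud.databricks.com",
#                   "/sql/1.0/warehouses/xxxx",
#                   "<personal-access-token>",
#                   "SELECT 1")
```

One trade-off we weighed: the cursor pulls results through the driver node of the connector, while the JDBC source can fan a partitioned read out across Spark executors.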

Debayan
Esteemed Contributor III

Hi @Danny Lee​, unfortunately the dbtable option is not supported when connecting to DBSQL warehouses or endpoints, and there is no workaround as of now.

Anonymous
Not applicable

Hi @Danny Lee​ 

Hope all is well! Just wanted to check in: were you able to resolve your issue? If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.

We'd love to hear from you.

Thanks!

dannylee
New Contributor III

Hi Vidula, still working on this issue. The enterprise team supporting our source data recommended trying the query keyword with the DBSQL endpoint. Not sure whether this works, or whether the team is aware of the limitations; we're still waiting to hear back from the developers.

Thanks for checking in!

dannylee
New Contributor III

Hello @Vidula Khanna​ @Debayan Mukherjee​ ,

I wanted to give you an update that might be helpful for your future customers. We worked with @Pavan Kumar Chalamcharla​, and through lots of trial and error we figured out a combination that works for SQL endpoints with the dbtable option on Glue 4.0.

The combination does not work with the query option, nor with either dbtable or query on Glue 3.0. We were able to successfully connect and execute using the dbtable option with a subquery, e.g.:

(SELECT 1) as subq

We were also able to use the following options:

  • partitionColumn
  • lowerBound
  • upperBound
  • numPartitions
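As a sketch of the combination that worked for us on Glue 4.0 (the JDBC URL, subquery, column name, and bounds below are illustrative placeholders):

```python
def partitioned_read_options(jdbc_url, subquery, partition_column,
                             lower_bound, upper_bound, num_partitions):
    """Option set that worked on Glue 4.0: dbtable must be a
    parenthesized subquery with an alias, combined with the
    standard Spark JDBC partitioning options."""
    return {
        "url": jdbc_url,
        "driver": "com.databricks.client.jdbc.Driver",
        "dbtable": f"({subquery}) as subq",
        "partitionColumn": partition_column,
        "lowerBound": str(lower_bound),
        "upperBound": str(upper_bound),
        "numPartitions": str(num_partitions),
    }

# Usage inside the Glue 4.0 job (spark obtained from the GlueContext):
# df = spark.read.format("jdbc").options(
#     **partitioned_read_options(jdbc_url,
#                                "SELECT id, ts FROM my_table",
#                                "id", 0, 1_000_000, 8)
# ).load()
```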

However, I'm not 100% confident that it's bug-free. The job succeeds and data is loaded, but it's questionable whether the partitioning is happening optimally.

Overall, it's good progress! Thanks to @Pavan Kumar Chalamcharla​ for getting us the info we needed to iterate through the different test cases.

Debayan
Esteemed Contributor III

Thanks for the update, we are glad to know it's working.
