Data Engineering

ODBC Connection to Another Compute Within the Same Workspace

JJ_
New Contributor II

Hello all!

I couldn't find anything definitive related to this issue so I hope I'm not duplicating another topic :).

I have imported an R repository that normally runs on another machine and uses an ODBC driver to issue Spark SQL commands to a compute (let's call it the main compute). No issues there, everything works flawlessly.

Now I would like to turn that repo into a Databricks-hosted Shiny app, so we've created another compute to host it. I tried to use the same ODBC connection string to send SQL from the app's compute to the main compute, but it fails (both computes are in the same workspace). Rewriting this code is currently not an option.

The error I get (both in R and isql) is:

Error from ThriftHiveClient: No more data to read

With the isql command I sometimes got SASL errors instead.

I tried many things:

  • Using both the preinstalled drivers and one installed manually from the download site
  • Connecting to localhost
  • Experimenting with various connection string parameters
  • Customising the odbc.ini and odbcinst.ini files
  • Using the app's compute as the target of the ODBC connection

In theory, such a scenario should work; in the worst case I would expect inefficient but functional communication between the clusters. My admin took a look at the networking setup but couldn't find anything problematic (although this is a new scenario for him).

Is there anything additional that is required for such a scenario to work? I'd appreciate any input! Thank you!

3 REPLIES

Anonymous
Not applicable

@Jarek Kupisz:

It is possible to connect to a compute within the same workspace using ODBC. However, there are some things you need to consider.

Firstly, make sure that the ODBC driver you are using is compatible with the version of Databricks you are running. You can check this in the Databricks documentation.

Secondly, make sure that you have the necessary permissions to access the compute you are trying to connect to. You may need to configure the firewall settings to allow connections between the computes.

Thirdly, check that the hostname you are using to connect to the compute is correct. For a Databricks cluster, the server hostname and HTTP path are listed on the JDBC/ODBC tab under the cluster's Advanced options.

Finally, try testing the ODBC connection using a different tool or client to rule out any issues with the ODBC driver or configuration. You can use the isql command to test the ODBC connection from the command line.
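To make the hostname and configuration checks above concrete, here is a minimal sketch of how a DSN-less connection string for the Databricks (Simba Spark) ODBC driver is typically put together. The key names follow the documented driver settings for token authentication; the driver path is the usual Linux install location and is an assumption here, and the function name and example values are hypothetical placeholders, not values from this thread.

```python
def build_connection_string(
    host,
    http_path,
    token,
    # Assumption: default Linux install path of the Simba Spark ODBC driver.
    driver="/opt/simba/spark/lib/64/libsparkodbc_sb64.so",
):
    """Assemble a DSN-less ODBC connection string for a Databricks cluster."""
    settings = {
        "Driver": driver,
        "Host": host,            # workspace hostname, not a cluster node name
        "Port": "443",           # connections go over HTTPS
        "HTTPPath": http_path,   # copied from the cluster's JDBC/ODBC tab
        "SSL": "1",
        "ThriftTransport": "2",  # Thrift over HTTP
        "AuthMech": "3",         # user name and password authentication
        "UID": "token",          # literal string "token" for PAT authentication
        "PWD": token,            # the personal access token itself
    }
    return ";".join(f"{key}={value}" for key, value in settings.items())


if __name__ == "__main__":
    # Hypothetical placeholder values for illustration only.
    print(
        build_connection_string(
            host="<workspace-hostname>",
            http_path="<http-path-from-jdbc-odbc-tab>",
            token="<personal-access-token>",
        )
    )
```

Comparing the printed string against the one that works from the external machine can help spot a missing or mistyped setting before suspecting the network.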

JJ_
New Contributor II

Thanks @Suteja Kanuri for your response! I tried all of the steps you mentioned (and many more) but never managed to make it work.

My suspicion was that our Azure networking setup was preventing this from happening. I have not found this documented anywhere, but my second guess would be that the ODBC endpoint on a compute is only capable of receiving ODBC commands from non-Databricks machines (not sure if this is something you can confirm with your dev team).

I've settled on a solution that submits a Databricks job to achieve cluster-to-cluster communication. Nevertheless, if there is a way to do it over ODBC, I'd like to hear about it!
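For readers looking for the shape of such a workaround, below is a rough sketch of triggering an existing job through the Jobs REST API (`POST /api/2.1/jobs/run-now`). It is not the code used in this thread: the function names, the `notebook_params` contents, and all example values are hypothetical, and it assumes the job already exists and that a personal access token is available to the calling compute.

```python
import json
import urllib.request


def build_run_now_request(workspace_url, token, job_id, notebook_params=None):
    """Build (but do not send) a Jobs API run-now request."""
    payload = {"job_id": job_id}
    if notebook_params:
        # Parameters are passed to the job's notebook task as strings.
        payload["notebook_params"] = notebook_params
    return urllib.request.Request(
        url=f"{workspace_url.rstrip('/')}/api/2.1/jobs/run-now",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def trigger_job(request, timeout=30):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Hypothetical placeholder values; nothing is sent over the network here.
    req = build_run_now_request(
        "https://<workspace-hostname>",
        "<personal-access-token>",
        job_id=123,
        notebook_params={"query": "SELECT 1"},
    )
    print(req.get_method(), req.full_url)
```

The job runs on whatever compute it is configured for, so the SQL executes on the main compute while the app's compute only issues the HTTPS call.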

Anonymous
Not applicable

@Jarek Kupisz:

It's possible that the Azure networking setup is causing the issue, as it could be blocking the ODBC traffic between the two clusters.
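One quick way to narrow this down is to check, from the app's compute, whether a plain TCP connection to the workspace endpoint on port 443 can be opened at all. The sketch below is only a first-pass probe under that assumption; the function name `can_reach` and the placeholder hostname are hypothetical, and a successful connection does not prove that ODBC itself will work (a proxy or TLS inspection could still interfere).

```python
import socket


def can_reach(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


if __name__ == "__main__":
    # Hypothetical placeholder; replace with the workspace hostname.
    print(can_reach("<workspace-hostname>"))
```

If this returns False on the app's compute but True on the external machine, the networking setup is the more likely culprit than the driver configuration.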

As for your second guess, the ODBC driver on a Databricks compute should be capable of receiving ODBC commands from other Databricks clusters, as long as the necessary network configurations and security permissions are in place. However, there may be specific configurations or limitations that are not documented publicly, so it would be best to check with the Databricks support team to confirm.

The solution you've settled on, which posts a Databricks job for cluster to cluster communication, is also a valid approach. It may not be the most efficient or optimal, but it can work reliably and securely.

If you do want to explore other options or try to get the ODBC connection working between the Databricks clusters, I recommend reaching out to Databricks support for assistance.
