cancel
Showing results forย 
Search instead forย 
Did you mean:ย 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results forย 
Search instead forย 
Did you mean:ย 

Set up connection to on prem sql server

Silabs
New Contributor

Ive just set up our databricks environment. Hosted in AWS. 

We have an on prem SQL server and would like to connect . 

How can i do that?

1 ACCEPTED SOLUTION

Accepted Solutions

Yeshwanth
Honored Contributor
Honored Contributor

@Silabs good day!

To connect your Databricks environment (hosted on AWS) to your on-premise SQL server, follow these steps:

1. Network Setup: Establish a connection between your SQL server and the Databricks virtual private cloud (VPC) using VPN or AWS Direct Connect. Test the network connectivity by running the command inside a Databricks notebook: %sh nc -vz <SQL Server IP or hostname> <port>. Replace <SQL Server IP or hostname> and <port> with your SQL Server's details.

2. Install Databricks SQL Connector: On your development machine, install the Databricks SQL Connector for Python library using either pip install databricks-sql-connector or python -m pip install databricks-sql-connector.

3. Gather Connection Information: Collect the following information for the Databricks cluster or SQL warehouse: the server hostname of the cluster (available in the Advanced Options > JDBC/ODBC tab for your cluster) and the HTTP path of the cluster (also available in the Advanced Options > JDBC/ODBC tab).

4. Configure a Connection to SQL Server: Use the Databricks SQL Connector and the previously gathered connection information to configure a connection to your SQL Server. Ensure the user or service principal connecting to the Databricks cluster or SQL warehouse has the necessary permissions (CAN ATTACH TO, CAN RESTART, and CAN USE).

I hope this helps. Please let us know if you have any questions or concerns.

Kind regards,

Yesh

View solution in original post

3 REPLIES 3

-werners-
Esteemed Contributor III

The connection itself can be done using ODBC/JDBC, but that is not the issue.
What you need to achieve is allowing AWS to connect to the on-prem server.
On AWS you can do that using PrivateLink and a VPC Endpoint, or use a VPN.
There might be more though (haven't worked in AWS for a while).

Cary
New Contributor III
New Contributor III

Two other things we would generally recommend are 1.) Direct Connect and 2.) Using AWS DMS, or any other CDC tool set.  There are lots.  We have Query Federation now in the Unity Catalog, so you could set up the connection that way, but JDBC/ODBC connections are prone to errors / dropped connections, so for production data engineering tasks its better to load into S3 using a CDC tool first and use Databricks to load the data after.  If that's too heavy of a lift, then its pretty easy to use pyodbc and the sql server driver on a cluster once you setup the network access.

Yeshwanth
Honored Contributor
Honored Contributor

@Silabs good day!

To connect your Databricks environment (hosted on AWS) to your on-premise SQL server, follow these steps:

1. Network Setup: Establish a connection between your SQL server and the Databricks virtual private cloud (VPC) using VPN or AWS Direct Connect. Test the network connectivity by running the command inside a Databricks notebook: %sh nc -vz <SQL Server IP or hostname> <port>. Replace <SQL Server IP or hostname> and <port> with your SQL Server's details.

2. Install Databricks SQL Connector: On your development machine, install the Databricks SQL Connector for Python library using either pip install databricks-sql-connector or python -m pip install databricks-sql-connector.

3. Gather Connection Information: Collect the following information for the Databricks cluster or SQL warehouse: the server hostname of the cluster (available in the Advanced Options > JDBC/ODBC tab for your cluster) and the HTTP path of the cluster (also available in the Advanced Options > JDBC/ODBC tab).

4. Configure a Connection to SQL Server: Use the Databricks SQL Connector and the previously gathered connection information to configure a connection to your SQL Server. Ensure the user or service principal connecting to the Databricks cluster or SQL warehouse has the necessary permissions (CAN ATTACH TO, CAN RESTART, and CAN USE).

I hope this helps. Please let us know if you have any questions or concerns.

Kind regards,

Yesh

Join 100K+ Data Experts: Register Now & Grow with Us!

Excited to expand your horizons with us? Click here to Register and begin your journey to success!

Already a member? Login and join your local regional user group! If there isn’t one near you, fill out this form and we’ll create one for you to join!