
Could not connect Self Hosted MySQL Database in Azure Databricks

saadi
New Contributor

Hi,

I am trying to connect to a self-hosted MySQL database from Databricks but keep encountering errors.

Database Setup:

  • The MySQL database is hosted on a VM.
  • We use DBeaver or Navicat to query it.
  • Connection to the database requires an active Azure VPN Client session; without the VPN, the connection fails in DBeaver/Navicat.

If anyone has faced a similar issue before and successfully connected MySQL with Databricks, please share the process you followed.

Accepted Solutions

mark_ott
Databricks Employee

To connect a self-hosted MySQL database (on a VM, Azure VPN required) to Databricks, you need several components to align: network access from Databricks to MySQL, proper JDBC connector configuration, and correct authentication. This setup is common in hybrid or secure cloud architectures, but brings specific challenges with VPN and cloud-hosted compute.

Key Issues and Solutions

  • VPN Access Limitation:
    Azure VPN Client provides local access on your machine, but Databricks clusters run in the Azure cloud, not on your local device. They will not inherit your personal VPN tunnel by default, so network access is likely blocked from Databricks to your VM across the VPN.

  • DBeaver/Navicat vs. Databricks:
    GUI tools work when the VPN is active on your machine; they use your PC's network interfaces. Databricks clusters, however, have their own networking which is managed by Azure and do not use your desktop's VPN.

Solution Approaches

1. Allow Databricks VNet to Access MySQL

  • Deploy Databricks in Same VNet (or Peered):
    Place your Databricks workspace in the same Virtual Network (VNet) as your MySQL VM, or use VNet peering between the Databricks VNet and the VM's VNet for private communication.

  • Configure NSG and Firewalls:
    Make sure that the VM's firewall and any Network Security Group (NSG) allow inbound connections on port 3306 (MySQL) from Databricks' subnet.

  • No Personal VPN Required:
    If Databricks and MySQL are in connected VNets, the VPN is not required for Databricks, as all communication is over Azure's private backbone.
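
Before requesting peering or an NSG rule, it can help to sanity-check the address ranges involved. The sketch below is only an illustration using Python's standard ipaddress module: it checks that two VNet address spaces do not overlap (peering generally requires this) and that an NSG rule's source prefix actually covers the Databricks subnet. The function names and all CIDR values are hypothetical placeholders, not values from this thread.

```python
import ipaddress


def vnets_can_peer(cidr_a: str, cidr_b: str) -> bool:
    """Return True when the two VNet address spaces do not overlap."""
    net_a = ipaddress.ip_network(cidr_a)
    net_b = ipaddress.ip_network(cidr_b)
    return not net_a.overlaps(net_b)


def rule_covers_subnet(rule_source_cidr: str, subnet_cidr: str) -> bool:
    """Return True when the NSG rule's source prefix contains the whole subnet."""
    rule = ipaddress.ip_network(rule_source_cidr)
    subnet = ipaddress.ip_network(subnet_cidr)
    return subnet.subnet_of(rule)


# Hypothetical ranges: replace with your own VNet and subnet CIDRs.
databricks_vnet = "10.10.0.0/16"
databricks_subnet = "10.10.1.0/24"
mysql_vnet = "10.20.0.0/16"

print(vnets_can_peer(databricks_vnet, mysql_vnet))
print(rule_covers_subnet("10.10.0.0/16", databricks_subnet))
```

If the first check fails, peering is unlikely to work without re-addressing one VNet; if the second fails, the port 3306 rule would not match traffic from the cluster nodes.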

2. Databricks Cluster Library and JDBC

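  • Install MySQL JDBC Driver:
    Attach the MySQL JDBC driver (MySQL Connector/J) to your cluster as a Maven library, unless your Databricks Runtime already bundles a suitable driver. Note that the PyPI package mysql-connector-python is a Python client, not a JDBC driver, so installing it does not provide the com.mysql.cj.jdbc.Driver class used below. Example Maven coordinates:

    text
    com.mysql:mysql-connector-j:<version>
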
  • Connect with JDBC URL:
    Use a connection string referencing your private VM's IP or hostname, not 'localhost' or a public IP.
    Example:

    python
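    jdbcUrl = "jdbc:mysql://<VM_PRIVATE_IP>:3306/<database>"
    df = (
        spark.read.format("jdbc")
        .option("url", jdbcUrl)
        .option("driver", "com.mysql.cj.jdbc.Driver")
        .option("dbtable", "<table>")  # the JDBC source needs a table or a query
        .option("user", "<username>")
        .option("password", "<password>")  # better read from a secret scope
        .load()
    )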
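
One way to keep credentials out of the URL is to assemble the reader options in a small helper. This is a minimal sketch, not an official pattern: build_mysql_jdbc_options is a hypothetical name, and the secret scope and key shown in the comments are placeholders you would need to create yourself.

```python
def build_mysql_jdbc_options(host, database, table, user, password, port=3306):
    """Build the option dict for Spark's JDBC reader against a MySQL host."""
    # From a cluster, 'localhost' would point at the driver node, not the MySQL VM.
    if host in ("localhost", "127.0.0.1"):
        raise ValueError("Use the VM's private IP or hostname, not localhost")
    return {
        "url": f"jdbc:mysql://{host}:{port}/{database}",
        "driver": "com.mysql.cj.jdbc.Driver",
        "dbtable": table,
        "user": user,
        "password": password,
    }


# In a Databricks notebook (scope and key names are assumptions):
# password = dbutils.secrets.get(scope="<scope>", key="<key>")
# opts = build_mysql_jdbc_options("<VM_PRIVATE_IP>", "<database>", "<table>",
#                                 "<username>", password)
# df = spark.read.format("jdbc").options(**opts).load()
```

Passing user and password as separate options avoids embedding them in a URL string that may end up in logs or notebook output.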

3. If You Must Use the VPN

  • VPN Gateway for Azure:
    If you must tunnel traffic through a VPN, set up an Azure VPN Gateway or Azure ExpressRoute and configure the Databricks VNet to route traffic via the gateway. This is a network admin task and often requires subnet and routing tweaks.

  • Jump Host Alternative:
    If direct connection is impossible, create a lightweight bastion/jump host inside the same VNet as the MySQL VM, and allow Databricks to SSH-tunnel through this host (SSH tunneling in Databricks is more advanced and not always recommended for production).

4. Verify Connection

  • Test network connectivity from Databricks using shell commands:

    text
    %sh nc -vz <VM_PRIVATE_IP> 3306

    If this fails, the problem lies in networking, routing, or firewall rules (or MySQL is not listening on that address), not in the JDBC configuration.
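
If nc is not available on the cluster image, roughly the same TCP check can be done from a Python notebook cell with the standard socket module. This is a sketch under that assumption; can_reach is a hypothetical helper name and the host is a placeholder.

```python
import socket


def can_reach(host: str, port: int = 3306, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolved hostnames.
        return False


# Example call from a notebook cell (placeholder address):
# print(can_reach("<VM_PRIVATE_IP>", 3306))
```

A True result only shows that the port is reachable from the driver node; authentication and driver issues would still surface later in the JDBC read.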

Summary Table

Method                              | Simpler for Testing | Production-Ready | Needs Network Admin Support
VNet Peering or Same VNet           | Yes                 | Yes              | Yes
Dedicated VPN Gateway for Workspace | No                  | Yes              | Yes
Jump Host/SSH Tunnel (Advanced)     | With effort         | Risky            | Yes
 
 

References

  • Your exact approach depends on your organization's Azure permissions and policies, but VNets and proper firewall/NAT routing are essential to allow Databricks to reach your MySQL VM if connectivity must remain private and secure.
