Hi everybody,
I'm trying to set up a connection between Databricks notebooks and an external PostgreSQL database through an SSH tunnel. On a single-node cluster this works perfectly fine. However, when the same code is run on a multi-node cluster, it fails. I assume this is because the localhost reference in the JDBC URL is resolved by the executors on their own nodes, while the tunnel only exists on the driver. Substituting the driver's hostname into that part of the URL with something like socket.gethostname() does not work either, as was explained here: https://stackoverflow.com/questions/68278369/how-to-use-pyspark-to-read-a-mysql-database-using-a-ssh...
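For reference, the hostname-based variant I tried looks roughly like this (the database name is just a placeholder; the point is that the URL embeds the driver's hostname, which the executors apparently still can't reach the tunnel through):

```python
import socket

# Placeholder database name for illustration
database = "testdb"

# socket.gethostname() resolves to the driver node's hostname,
# but the tunnel's local bind is not reachable from the executors
driver_host = socket.gethostname()
jdbc_url = f"jdbc:postgresql://{driver_host}:5432/{database}"
print(jdbc_url)
```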
Does anyone know how to make this work on multi-node clusters as well?
import paramiko
from sshtunnel import SSHTunnelForwarder

# Set up SSH tunnel from the driver to the PostgreSQL host
with SSHTunnelForwarder(
    ("data.host", 22),
    ssh_username="databricks",
    ssh_pkey=paramiko.RSAKey.from_private_key(private_key),
    remote_bind_address=("127.0.0.1", 5432),
    local_bind_address=("localhost", 5432)
) as tunnel:
    print(tunnel.is_active)
    jdbc_url = f"jdbc:postgresql://localhost:{tunnel.local_bind_port}/{database}"
    print(jdbc_url)
    df = spark.read \
        .format("jdbc") \
        .option("url", jdbc_url) \
        .option("user", username) \
        .option("password", password) \
        .option("driver", driver) \
        .option("dbtable", "(SELECT * FROM schema_test.table_test) AS tmp") \
        .load()
    display(df)