JDBC / SSH-tunnel to connect to PostgreSQL not working on multi-node clusters

laurenskuiper97
New Contributor

Hi everybody,

I'm trying to set up a connection between Databricks notebooks and an external PostgreSQL database through an SSH tunnel. On a single-node cluster this works perfectly fine, but when the same code is run on a multi-node cluster it fails. I assume this is because of the localhost reference in the JDBC URL, which each executor on a multi-node cluster interprets as its own machine. However, replacing that part of the URL with the driver's hostname via something like socket.gethostname() does not work either, as explained here: https://stackoverflow.com/questions/68278369/how-to-use-pyspark-to-read-a-mysql-database-using-a-ssh... 

Does anyone know how to make this work on multi-node clusters as well?

 

import paramiko
from sshtunnel import SSHTunnelForwarder

# Set up SSH tunnel from the notebook to the PostgreSQL host
with SSHTunnelForwarder(
    ("data.host", 22),
    ssh_username="databricks",
    ssh_pkey=paramiko.RSAKey.from_private_key(private_key),
    remote_bind_address=("127.0.0.1", 5432),
    local_bind_address=("localhost", 5432)
) as tunnel:
    print(tunnel.is_active)
    # The JDBC URL points at the local end of the tunnel, i.e. the driver node
    jdbc_url = f"jdbc:postgresql://localhost:{tunnel.local_bind_port}/{database}"
    print(jdbc_url)
    df = spark.read \
        .format("jdbc") \
        .option("url", jdbc_url) \
        .option("user", username) \
        .option("password", password) \
        .option("driver", driver) \
        .option("dbtable", "(SELECT * FROM schema_test.table_test) AS tmp") \
        .load()
    display(df)

 

1 REPLY

-werners-
Esteemed Contributor III

I doubt it is possible.
The driver runs the program and sends tasks to the executors. But since the SSH tunnel is created by plain Python code on the driver, not by a Spark task, it will not be established on any executor, so the executors cannot reach localhost:5432 when they try to read over JDBC.
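One possible workaround that follows from this, assuming the table is small enough to fit in driver memory and that psycopg2 and sshtunnel are installed on the cluster, is to skip the distributed JDBC read entirely: query PostgreSQL through the tunnel on the driver and then build a Spark DataFrame from the result. This is only a sketch reusing the variable names (private_key, database, username, password) and the placeholder host/table from the original post, not a tested Databricks recipe.

import paramiko
import psycopg2
from sshtunnel import SSHTunnelForwarder

# Driver-only workaround: both the tunnel and the query run on the driver,
# so the executors never need to reach the PostgreSQL host.
with SSHTunnelForwarder(
    ("data.host", 22),
    ssh_username="databricks",
    ssh_pkey=paramiko.RSAKey.from_private_key(private_key),
    remote_bind_address=("127.0.0.1", 5432),
) as tunnel:
    conn = psycopg2.connect(
        host="127.0.0.1",
        port=tunnel.local_bind_port,  # random local port chosen by the tunnel
        dbname=database,
        user=username,
        password=password,
    )
    with conn, conn.cursor() as cur:
        cur.execute("SELECT * FROM schema_test.table_test")
        columns = [desc[0] for desc in cur.description]
        rows = cur.fetchall()
    conn.close()

# Parallelize the rows collected on the driver into a Spark DataFrame.
df = spark.createDataFrame(rows, schema=columns)
display(df)

This obviously gives up Spark's parallel JDBC read, so it only makes sense for modest result sets; for large tables you would need the database (or the tunnel endpoint) to be reachable from every worker node.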