Lu_Wang_ENB_DBX
Databricks Employee
Databricks Employee

You don't need an ODBC driver inside a Databricks job cluster to run stored procedures, and init scripts are fragile enough that they easily break cluster startup.

Options:


Option 1 (Recommended): Use built-in JDBC / connectors instead of ODBC

For SQL Server and most RDBMS, Databricks already ships JDBC-based connectors; you can call stored procedures from notebooks without installing any OS-level driver. For example, SQL Server:

driver = "com.microsoft.sqlserver.jdbc.SQLServerDriver"
host = "<server-host>"
port = "1433"
database = "<db-name>"
user = "<user>"
password = "<password>"

url = f"jdbc:sqlserver://{host}:{port};database={database}"

df = (spark.read
      .format("jdbc")
      .option("driver", driver)
      .option("url", url)
      .option("dbtable", "dbo.SomeTable")  # Or a view wrapping a stored proc
      .option("user", user)
      .option("password", password)
      .load())

On newer runtimes you can also use the sqlserver data source or Lakehouse Federation / remote_query to run native SQL (including EXEC MyProc …) directly against SQL Server, again with no init script.
Recommendation: If your goal is “run stored procedures from a notebook”, switch to these JDBC/federation options instead of ODBC; they’re supported, simpler, and far less brittle.


Option 2: Template cluster-scoped init script to install an ODBC driver

If you truly must use ODBC (for example, a tool library that only supports pyodbc), use a cluster-scoped init script stored in a Workspace file or UC Volume (DBFS init scripts are end-of-life and can prevent clusters from starting).

Example install_odbc.sh (Ubuntu-style driver, adjust URL & bits for your vendor):

#!/bin/bash
set -euxo pipefail

LOG_DIR=/tmp/odbc-init
mkdir -p "$LOG_DIR"

echo "[$(date)] Starting ODBC init script" >> "$LOG_DIR/odbc-init.log"

# 1. System prerequisites
apt-get update
DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
  unixodbc unixodbc-dev curl

# 2. Download & install your vendor ODBC driver (.deb example)
DRIVER_URL="https://<vendor>/path/to/driver_amd64.deb"
DRIVER_DEB="/tmp/driver.deb"

curl -fsSL "$DRIVER_URL" -o "$DRIVER_DEB"
dpkg -i "$DRIVER_DEB" || DEBIAN_FRONTEND=noninteractive apt-get install -f -y

# 3. (Optional) Register driver/DSN
# cat >> /etc/odbcinst.ini <<'EOF'
# [MyVendorDriver]
# Description=My DB ODBC Driver
# Driver=/opt/myvendor/lib/libmyodbc.so
# EOF
#
# cat >> /etc/odbc.ini <<'EOF'
# [MyVendorDSN]
# Driver=MyVendorDriver
# Server=<server>
# Database=<database>
# Port=<port>
# EOF

echo "[$(date)] Finished ODBC init script" >> "$LOG_DIR/odbc-init.log"
exit 0

Key points so the cluster keeps starting:

  • Store the script at a supported path (for example /Workspace/.../install_odbc.sh or /Volumes/<cat>/<sch>/<vol>/install_odbc.sh) and configure it as a cluster-scoped init script on the job’s compute.
  • Keep it small (<64 KB) and make sure it always exits 0 on success; any non-zero code causes the job cluster to fail to start.
  • On Standard access mode clusters, the script path must be allowlisted by a metastore admin, or it will be blocked.
  • If it “stops working after some time”, the usual culprits are:
    • Script stored on DBFS root (now disallowed)
    • Script path no longer on the allowlist after moving to Standard mode
    • Upstream package URL changed and curl/apt-get now fails (check init logs)

You can inspect failures in /databricks/init_scripts/...stdout.log or via cluster log delivery.


Option 3: Use Lakehouse Federation / remote_query to execute procedures

For SQL Server and several other databases, you can create a Unity Catalog connection and then use remote_query to run the database’s native SQL (including stored procedures) from Databricks SQL or notebooks, again without any OS-level driver:

SELECT *
FROM remote_query(
  'my_sqlserver_connection',
  database => 'MyDb',
  query    => 'EXEC dbo.MyStoredProc @Param1 = 42'
);

This uses JDBC under the hood and is the strategic path for cross-system querying.


My recommendation:
Use Option 1 or 3 (JDBC / Lakehouse Federation / remote_query) instead of installing an ODBC driver on the job cluster. Only use Option 2 if you’re blocked by a hard dependency on ODBC, and in that case, adapt the template script above and double-check script location, allowlist, and init-script logs to understand why it started failing.