- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
04-21-2026 09:12 AM
You don't need an ODBC driver inside a Databricks job cluster to run stored procedures, and init scripts are fragile enough that they easily break cluster startup.
Options:
Option 1 (Recommended): Use built-in JDBC / connectors instead of ODBC
For SQL Server and most RDBMS, Databricks already ships JDBC-based connectors; you can call stored procedures from notebooks without installing any OS-level driver. For example, SQL Server:
driver = "com.microsoft.sqlserver.jdbc.SQLServerDriver"
host = "<server-host>"
port = "1433"
database = "<db-name>"
user = "<user>"
password = "<password>"
url = f"jdbc:sqlserver://{host}:{port};database={database}"
df = (spark.read
.format("jdbc")
.option("driver", driver)
.option("url", url)
.option("dbtable", "dbo.SomeTable") # Or a view wrapping a stored proc
.option("user", user)
.option("password", password)
.load())
On newer runtimes you can also use the sqlserver data source or Lakehouse Federation / remote_query to run native SQL (including EXEC MyProc …) directly against SQL Server, again with no init script.
Recommendation: If your goal is “run stored procedures from a notebook”, switch to these JDBC/federation options instead of ODBC; they’re supported, simpler, and far less brittle.
Option 2: Template cluster-scoped init script to install an ODBC driver
If you truly must use ODBC (for example, a tool library that only supports pyodbc), use a cluster-scoped init script stored in a Workspace file or UC Volume (DBFS init scripts are end-of-life and can prevent clusters from starting).
Example install_odbc.sh (Ubuntu-style driver, adjust URL & bits for your vendor):
#!/bin/bash
set -euxo pipefail
LOG_DIR=/tmp/odbc-init
mkdir -p "$LOG_DIR"
echo "[$(date)] Starting ODBC init script" >> "$LOG_DIR/odbc-init.log"
# 1. System prerequisites
apt-get update
DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
unixodbc unixodbc-dev curl
# 2. Download & install your vendor ODBC driver (.deb example)
DRIVER_URL="https://<vendor>/path/to/driver_amd64.deb"
DRIVER_DEB="/tmp/driver.deb"
curl -fsSL "$DRIVER_URL" -o "$DRIVER_DEB"
dpkg -i "$DRIVER_DEB" || DEBIAN_FRONTEND=noninteractive apt-get install -f -y
# 3. (Optional) Register driver/DSN
# cat >> /etc/odbcinst.ini <<'EOF'
# [MyVendorDriver]
# Description=My DB ODBC Driver
# Driver=/opt/myvendor/lib/libmyodbc.so
# EOF
#
# cat >> /etc/odbc.ini <<'EOF'
# [MyVendorDSN]
# Driver=MyVendorDriver
# Server=<server>
# Database=<database>
# Port=<port>
# EOF
echo "[$(date)] Finished ODBC init script" >> "$LOG_DIR/odbc-init.log"
exit 0
Key points so the cluster keeps starting:
- Store the script at a supported path (for example
/Workspace/.../install_odbc.shor/Volumes/<cat>/<sch>/<vol>/install_odbc.sh) and configure it as a cluster-scoped init script on the job’s compute. - Keep it small (<64 KB) and make sure it always exits 0 on success; any non-zero code causes the job cluster to fail to start.
- On Standard access mode clusters, the script path must be allowlisted by a metastore admin, or it will be blocked.
- If it “stops working after some time”, the usual culprits are:
- Script stored on DBFS root (now disallowed)
- Script path no longer on the allowlist after moving to Standard mode
- Upstream package URL changed and
curl/apt-getnow fails (check init logs)
You can inspect failures in /databricks/init_scripts/...stdout.log or via cluster log delivery.
Option 3: Use Lakehouse Federation / remote_query to execute procedures
For SQL Server and several other databases, you can create a Unity Catalog connection and then use remote_query to run the database’s native SQL (including stored procedures) from Databricks SQL or notebooks, again without any OS-level driver:
SELECT *
FROM remote_query(
'my_sqlserver_connection',
database => 'MyDb',
query => 'EXEC dbo.MyStoredProc @Param1 = 42'
);
This uses JDBC under the hood and is the strategic path for cross-system querying.
My recommendation:
Use Option 1 or 3 (JDBC / Lakehouse Federation / remote_query) instead of installing an ODBC driver on the job cluster. Only use Option 2 if you’re blocked by a hard dependency on ODBC, and in that case, adapt the template script above and double-check script location, allowlist, and init-script logs to understand why it started failing.