Tuesday
I need to execute stored procedures from my notebook, so I created an init script that installs the ODBC driver on my job cluster. But the script stopped working after some time and I can't figure out why, so the cluster can't start. Can someone send me a valid script that installs the ODBC driver on a job cluster? I can provide more information if needed.
Tuesday
Hello!
I think in your case you need to use a cluster-scoped init script and add the MS repo explicitly before installing msodbcsql18. For standard or dedicated access mode, the script should be stored in a UC volume. Don't forget that Databricks runtimes use Ubuntu 22.04, so scripts hardcoded for older Ubuntu versions often become flaky over time.
Tuesday
The job cluster I am using is in standard access mode, and I store the .sh script in a UC volume and reference it in the job YAML. I tried the bash script below, but I get the error "failed: Script exit status is non-zero". I can't find anything else explicitly stating the cause.
#!/bin/bash
# Download the Microsoft signing key
echo "Downloading Microsoft signing key..."
curl https://packages.microsoft.com/keys/microsoft.asc | sudo apt-key add -
# Add the Microsoft repository for ODBC Driver 18
echo "Adding Microsoft repository..."
curl https://packages.microsoft.com/config/ubuntu/$(lsb_release -rs)/prod.list | sudo tee /etc/apt/sources.list.d/mssql-release.list
# Update the package index
echo "Updating package index again..."
sudo apt-get update
# Install the ODBC Driver 18
echo "Installing ODBC Driver 18..."
sudo ACCEPT_EULA=Y apt-get install -y msodbcsql18
# Install ODBC Driver Manager
echo "Installing ODBC Driver Manager..."
sudo apt-get -y install unixodbc-dev
# Check if the driver is installed
echo "Checking installed driver..."
dpkg -L msodbcsql18
echo "ODBC Driver installation completed successfully."
Tuesday
What you did is very close. The only things I'd fix first are the apt-key flow, because it is deprecated, and adding set -euo pipefail, because right now the script can fail in the middle without giving you a useful stop point.
Also, init scripts run as root, so sudo is not needed.
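A minimal sketch of those fixes together, assuming Ubuntu 22.04 (jammy) and the current Microsoft package repo layout — verify the repo URL and keyring path against Microsoft's install docs before relying on them:

```shell
#!/bin/bash
set -euo pipefail

# Modern replacement for the deprecated apt-key flow: dearmor the key
# into a dedicated keyring and reference it with signed-by.
curl -fsSL https://packages.microsoft.com/keys/microsoft.asc \
  | gpg --dearmor -o /usr/share/keyrings/microsoft-prod.gpg

# Pin the repo to Ubuntu 22.04 explicitly instead of lsb_release output.
echo "deb [signed-by=/usr/share/keyrings/microsoft-prod.gpg] https://packages.microsoft.com/ubuntu/22.04/prod jammy main" \
  > /etc/apt/sources.list.d/mssql-release.list

# No sudo needed: init scripts already run as root.
apt-get update
ACCEPT_EULA=Y apt-get install -y msodbcsql18 unixodbc-dev
```

The -fsSL flags make curl fail loudly on HTTP errors, which combined with set -euo pipefail gives you a clear stop point in the init logs instead of a silent half-finished install.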
Tuesday
You don't need an ODBC driver inside a Databricks job cluster to run stored procedures, and init scripts are fragile enough that they easily break cluster startup.
Options:
Option 1: JDBC connectors. For SQL Server and most RDBMS, Databricks already ships JDBC-based connectors; you can call stored procedures from notebooks without installing any OS-level driver. For example, SQL Server:
driver = "com.microsoft.sqlserver.jdbc.SQLServerDriver"
host = "<server-host>"
port = "1433"
database = "<db-name>"
user = "<user>"
password = "<password>"
url = f"jdbc:sqlserver://{host}:{port};database={database}"
df = (spark.read
      .format("jdbc")
      .option("driver", driver)
      .option("url", url)
      .option("dbtable", "dbo.SomeTable")  # Or a view wrapping a stored proc
      .option("user", user)
      .option("password", password)
      .load())
On newer runtimes you can also use the sqlserver data source or Lakehouse Federation / remote_query to run native SQL (including EXEC MyProc …) directly against SQL Server, again with no init script.
Recommendation: If your goal is “run stored procedures from a notebook”, switch to these JDBC/federation options instead of ODBC; they’re supported, simpler, and far less brittle.
Option 2: ODBC via an init script. If you truly must use ODBC (for example, a library that only supports pyodbc), use a cluster-scoped init script stored in a Workspace file or UC Volume (DBFS init scripts are end-of-life and can prevent clusters from starting).
Example install_odbc.sh (Ubuntu-style driver, adjust URL & bits for your vendor):
#!/bin/bash
set -euxo pipefail
LOG_DIR=/tmp/odbc-init
mkdir -p "$LOG_DIR"
echo "[$(date)] Starting ODBC init script" >> "$LOG_DIR/odbc-init.log"
# 1. System prerequisites
apt-get update
DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
unixodbc unixodbc-dev curl
# 2. Download & install your vendor ODBC driver (.deb example)
DRIVER_URL="https://<vendor>/path/to/driver_amd64.deb"
DRIVER_DEB="/tmp/driver.deb"
curl -fsSL "$DRIVER_URL" -o "$DRIVER_DEB"
dpkg -i "$DRIVER_DEB" || DEBIAN_FRONTEND=noninteractive apt-get install -f -y
# 3. (Optional) Register driver/DSN
# cat >> /etc/odbcinst.ini <<'EOF'
# [MyVendorDriver]
# Description=My DB ODBC Driver
# Driver=/opt/myvendor/lib/libmyodbc.so
# EOF
#
# cat >> /etc/odbc.ini <<'EOF'
# [MyVendorDSN]
# Driver=MyVendorDriver
# Server=<server>
# Database=<database>
# Port=<port>
# EOF
echo "[$(date)] Finished ODBC init script" >> "$LOG_DIR/odbc-init.log"
exit 0
Key points so the cluster keeps starting:
Store the script in UC (/Workspace/.../install_odbc.sh or /Volumes/<cat>/<sch>/<vol>/install_odbc.sh) and configure it as a cluster-scoped init script on the job’s compute.
If curl/apt-get now fails, check the init logs: you can inspect failures in /databricks/init_scripts/...stdout.log or via cluster log delivery.
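Once the driver is in place, calling a procedure from a notebook with pyodbc might look like this sketch — the server/database placeholders and dbo.MyStoredProc are illustrative, and the helper functions are my own, not a Databricks API:

```python
def build_conn_str(server, database, user, password, port=1433):
    """Assemble an ODBC Driver 18 connection string from placeholder values."""
    return (
        "DRIVER={ODBC Driver 18 for SQL Server};"
        f"SERVER={server},{port};"
        f"DATABASE={database};"
        f"UID={user};PWD={password};"
        "Encrypt=yes;TrustServerCertificate=no;"
    )

def exec_stored_proc(conn_str, proc_sql, params=()):
    """Run a stored procedure; needs pyodbc plus msodbcsql18 at runtime."""
    import pyodbc  # resolves only after the init script has installed the driver
    conn = pyodbc.connect(conn_str, autocommit=True)  # autocommit so proc side effects persist
    try:
        conn.cursor().execute(proc_sql, params)
    finally:
        conn.close()

# Usage (placeholders):
# exec_stored_proc(build_conn_str("<server-host>", "<db-name>", "<user>", "<password>"),
#                  "EXEC dbo.MyStoredProc @Param1 = ?", (42,))
```

Note Encrypt=yes is the Driver 18 default; set TrustServerCertificate=yes only for servers whose certificate you cannot validate.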
Option 3: Lakehouse Federation with remote_query to execute procedures. For SQL Server and several other databases, you can create a Unity Catalog connection and then use remote_query to run the database’s native SQL (including stored procedures) from Databricks SQL or notebooks, again without any OS-level driver:
SELECT *
FROM remote_query(
'my_sqlserver_connection',
database => 'MyDb',
query => 'EXEC dbo.MyStoredProc @Param1 = 42'
);
This uses JDBC under the hood and is the recommended long-term path for cross-system querying.
My recommendation:
Use Option 1 or 3 (JDBC / Lakehouse Federation / remote_query) instead of installing an ODBC driver on the job cluster. Only use Option 2 if you’re blocked by a hard dependency on ODBC, and in that case, adapt the template script above and double-check script location, allowlist, and init-script logs to understand why it started failing.