Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Valid init script for installing ODBC Driver 18 for SQL Server to a job cluster

bi_123
New Contributor

I need to execute stored procedures in my notebook, to do that I created an init script that installs ODBC driver to my job cluster. But the script stops working after some time and I can't figure out why, so the cluster can't start. Can someone send me a valid script that installs ODBC driver to a job cluster? I can provide more information if needed.

4 REPLIES

amirabedhiafi
New Contributor II

Hello !

I think in your case you need to use a cluster-scoped init script and add the Microsoft repository explicitly before installing msodbcsql18. For standard or dedicated access mode, the script should be stored in a UC volume. Don't forget that Databricks runtimes use Ubuntu 22.04, so scripts hardcoded for older Ubuntu versions often become flaky over time.

If this answer resolves your question, could you please mark it as “Accept as Solution”? It will help other users quickly find the correct fix.

Senior BI/Data Engineer | Microsoft MVP Data Platform | Microsoft MVP Power BI | Power BI Super User | C# Corner MVP

The job cluster I am using is standard access mode, I store the .sh script in a UC volume, and I reference it in the job YAML. I tried using this bash script but I get the error "Script exit status is non-zero". I can't find anything else explicitly stating the cause.
#!/bin/bash
# Download the Microsoft signing key
echo "Downloading Microsoft signing key..."
curl https://packages.microsoft.com/keys/microsoft.asc | sudo apt-key add -
# Add the Microsoft repository for ODBC Driver 18
echo "Adding Microsoft repository..."
curl https://packages.microsoft.com/config/ubuntu/$(lsb_release -rs)/prod.list | sudo tee /etc/apt/sources.list.d/mssql-release.list
# Update the package index
echo "Updating package index again..."
sudo apt-get update
# Install the ODBC Driver 18
echo "Installing ODBC Driver 18..."
sudo ACCEPT_EULA=Y apt-get install -y msodbcsql18
# Install ODBC Driver Manager
echo "Installing ODBC Driver Manager..."
sudo apt-get -y install unixodbc-dev
# Check if the driver is installed
echo "Checking installed driver..."
dpkg -L msodbcsql18
echo "ODBC Driver installation completed successfully."

What you did is very close. The only things I'd fix first are the apt-key flow, because it is deprecated, and adding set -euo pipefail, because right now the script can fail in the middle without giving you a useful stop point.

Also, init scripts run as root, so sudo is not needed.
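A sketch of the same script with those fixes applied: keyring-based repo setup instead of the deprecated apt-key, no sudo, and fail-fast flags. The repository URL assumes Ubuntu 22.04 (jammy), the current Databricks runtime base; double-check it against your runtime version before relying on it.

```shell
#!/bin/bash
set -euo pipefail

# Add the Microsoft signing key to a dedicated keyring (apt-key is deprecated)
curl -fsSL https://packages.microsoft.com/keys/microsoft.asc \
  | gpg --dearmor -o /usr/share/keyrings/microsoft-prod.gpg

# Register the repo, pinned to that keyring (Ubuntu 22.04 / jammy assumed)
echo "deb [arch=amd64 signed-by=/usr/share/keyrings/microsoft-prod.gpg] https://packages.microsoft.com/ubuntu/22.04/prod jammy main" \
  > /etc/apt/sources.list.d/mssql-release.list

# Install the driver and the ODBC driver manager headers
apt-get update
ACCEPT_EULA=Y DEBIAN_FRONTEND=noninteractive apt-get install -y msodbcsql18 unixodbc-dev

echo "ODBC Driver installation completed."
```

With set -euo pipefail, the first failing command (for example a curl hitting a moved URL) stops the script immediately, so the non-zero exit status points at the actual failing step in the init logs.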


Lu_Wang_ENB_DBX
Databricks Employee

You don't need an ODBC driver on a Databricks job cluster to run stored procedures, and init scripts are fragile enough that they can easily break cluster startup.

Options:


Option 1 (Recommended): Use built-in JDBC / connectors instead of ODBC

For SQL Server and most RDBMS, Databricks already ships JDBC-based connectors; you can call stored procedures from notebooks without installing any OS-level driver. For example, SQL Server:

driver = "com.microsoft.sqlserver.jdbc.SQLServerDriver"
host = "<server-host>"
port = "1433"
database = "<db-name>"
user = "<user>"
password = "<password>"

url = f"jdbc:sqlserver://{host}:{port};database={database}"

df = (spark.read
      .format("jdbc")
      .option("driver", driver)
      .option("url", url)
      .option("dbtable", "dbo.SomeTable")  # Or a view wrapping a stored proc
      .option("user", user)
      .option("password", password)
      .load())
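To actually execute a procedure over that same connection (the read above only pulls a table), one option is to reach the bundled Microsoft JDBC driver through py4j. This is a hedged sketch, not an official API: `spark._sc._jvm` is a Spark internal, and `dbo.MyProc` is a placeholder name.

```python
def sqlserver_jdbc_url(host: str, database: str, port: int = 1433) -> str:
    """Build a SQL Server JDBC URL matching the format used in the read example."""
    return f"jdbc:sqlserver://{host}:{port};database={database}"


def exec_stored_proc(spark, url: str, user: str, password: str, call: str) -> None:
    """Run a statement such as 'EXEC dbo.MyProc' via the JVM's java.sql.DriverManager.

    Relies on the SQL Server JDBC driver shipped with Databricks runtimes;
    spark._sc._jvm is internal, so treat this as a sketch, not a contract.
    """
    jvm = spark._sc._jvm
    conn = jvm.java.sql.DriverManager.getConnection(url, user, password)
    try:
        stmt = conn.createStatement()
        stmt.execute(call)
        stmt.close()
    finally:
        conn.close()


# Usage (inside a notebook, where `spark` already exists):
# exec_stored_proc(spark, sqlserver_jdbc_url("<server-host>", "<db-name>"),
#                  "<user>", "<password>", "EXEC dbo.MyProc @Param1 = 42")
```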

On newer runtimes you can also use the sqlserver data source or Lakehouse Federation / remote_query to run native SQL (including EXEC MyProc …) directly against SQL Server, again with no init script.
Recommendation: If your goal is “run stored procedures from a notebook”, switch to these JDBC/federation options instead of ODBC; they’re supported, simpler, and far less brittle.


Option 2: Template cluster-scoped init script to install an ODBC driver

If you truly must use ODBC (for example, a tool library that only supports pyodbc), use a cluster-scoped init script stored in a Workspace file or UC Volume (DBFS init scripts are end-of-life and can prevent clusters from starting).

Example install_odbc.sh (Ubuntu-style driver, adjust URL & bits for your vendor):

#!/bin/bash
set -euxo pipefail

LOG_DIR=/tmp/odbc-init
mkdir -p "$LOG_DIR"

echo "[$(date)] Starting ODBC init script" >> "$LOG_DIR/odbc-init.log"

# 1. System prerequisites
apt-get update
DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
  unixodbc unixodbc-dev curl

# 2. Download & install your vendor ODBC driver (.deb example)
DRIVER_URL="https://<vendor>/path/to/driver_amd64.deb"
DRIVER_DEB="/tmp/driver.deb"

curl -fsSL "$DRIVER_URL" -o "$DRIVER_DEB"
dpkg -i "$DRIVER_DEB" || DEBIAN_FRONTEND=noninteractive apt-get install -f -y

# 3. (Optional) Register driver/DSN
# cat >> /etc/odbcinst.ini <<'EOF'
# [MyVendorDriver]
# Description=My DB ODBC Driver
# Driver=/opt/myvendor/lib/libmyodbc.so
# EOF
#
# cat >> /etc/odbc.ini <<'EOF'
# [MyVendorDSN]
# Driver=MyVendorDriver
# Server=<server>
# Database=<database>
# Port=<port>
# EOF

echo "[$(date)] Finished ODBC init script" >> "$LOG_DIR/odbc-init.log"
exit 0

Key points so the cluster keeps starting:

  • Store the script at a supported path (for example /Workspace/.../install_odbc.sh or /Volumes/<cat>/<sch>/<vol>/install_odbc.sh) and configure it as a cluster-scoped init script on the job’s compute.
  • Keep it small (<64 KB) and make sure it always exits 0 on success; any non-zero code causes the job cluster to fail to start.
  • On Standard access mode clusters, the script path must be allowlisted by a metastore admin, or it will be blocked.
  • If it “stops working after some time”, the usual culprits are:
    • Script stored on DBFS root (now disallowed)
    • Script path no longer on the allowlist after moving to Standard mode
    • Upstream package URL changed and curl/apt-get now fails (check init logs)

You can inspect failures in /databricks/init_scripts/...stdout.log or via cluster log delivery.
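Once the driver from Option 2 is in place, calling the procedure from a notebook usually goes through pyodbc. A minimal sketch, assuming pyodbc is installed on the cluster and using placeholder server/procedure names:

```python
def mssql_conn_str(server: str, database: str, user: str, password: str) -> str:
    """Connection string targeting the driver name registered by msodbcsql18."""
    return (
        "DRIVER={ODBC Driver 18 for SQL Server};"
        f"SERVER={server};DATABASE={database};"
        f"UID={user};PWD={password};"
        "TrustServerCertificate=yes;"  # or configure a proper CA chain instead
    )


def run_proc(conn_str: str, call: str) -> None:
    """Execute a stored procedure; requires pyodbc plus the installed OS driver."""
    import pyodbc  # imported lazily so the helper above works without the driver
    with pyodbc.connect(conn_str, autocommit=True) as conn:
        conn.execute(call)


# Usage:
# run_proc(mssql_conn_str("<server-host>", "<db-name>", "<user>", "<password>"),
#          "EXEC dbo.MyProc @Param1 = 42")
```

Note that Driver 18 encrypts by default, which is why the sketch sets TrustServerCertificate; drop that flag if your server presents a certificate the cluster trusts.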


Option 3: Use Lakehouse Federation / remote_query to execute procedures

For SQL Server and several other databases, you can create a Unity Catalog connection and then use remote_query to run the database’s native SQL (including stored procedures) from Databricks SQL or notebooks, again without any OS-level driver:

SELECT *
FROM remote_query(
  'my_sqlserver_connection',
  database => 'MyDb',
  query    => 'EXEC dbo.MyStoredProc @Param1 = 42'
);

This uses JDBC under the hood and is the strategic path for cross-system querying.


My recommendation:
Use Option 1 or 3 (JDBC / Lakehouse Federation / remote_query) instead of installing an ODBC driver on the job cluster. Only use Option 2 if you’re blocked by a hard dependency on ODBC, and in that case, adapt the template script above and double-check script location, allowlist, and init-script logs to understand why it started failing.