Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

ODBC driver installation - help needed

DylanStout
Contributor

Hello, 

I’m trying to use pyodbc inside Databricks to connect to a SQL Server database, but I’m working in a restricted, offline Databricks workspace (no outbound internet).

What I’ve learned so far:

  • Databricks clusters do not include Microsoft’s ODBC Driver 17 or 18 by default.

  • I downloaded the .deb package manually:

    msodbcsql17_17.10.6.1-1_amd64.deb

    and uploaded it to:

    /Workspace/Users/<user>/odbc/

  • When I try to install it from a .py script using dpkg -i, it fails because:

    • Python jobs run as non-root, so dpkg → “requires superuser privilege”
    • Python jobs run on the driver only, not on executors
    • Installation would not persist across cluster restarts anyway

So my real goal is:

Install ODBC Driver 17 on all cluster nodes, offline, with no internet, and enable pyodbc to connect to SQL Server from Databricks.

From what I understand, the correct approach is:

  • Use an init script that installs the .deb file at cluster startup (it runs as root).
  • Possibly install additional dependency .deb packages (libodbc1, unixodbc, odbcinst1debian2, etc.).

I’m looking for guidance from anyone who has successfully done an offline ODBC driver installation in Databricks.

Currently I am running this shell as init script:

#!/bin/bash
# install_msodbcsql17_offline.sh

# Where you uploaded the packages
PKG_DIR="/Workspace/Users/<me>/odbc"

# Install msodbcsql17 from local .deb
dpkg -i "${PKG_DIR}/msodbcsql17_17.10.6.1-1_amd64.deb" || true

 

However, this script never completes while the cluster is starting; it gets stuck on "Running Init Scripts".

1 REPLY

SteveOstrowski
Databricks Employee

Hi @DylanStout,

You have already done solid research on the constraints. Let me walk you through the most likely reason your init script is hanging and then lay out a complete approach for offline ODBC driver installation.


WHY YOUR INIT SCRIPT IS HANGING

The most common reason an init script gets stuck on "Running Init Scripts" when installing the Microsoft ODBC driver .deb package is the EULA (End User License Agreement) prompt. When you run dpkg -i on the msodbcsql17 package, it triggers a debconf prompt asking you to accept the Microsoft EULA. Since init scripts run non-interactively, that prompt blocks forever waiting for input, which is exactly the "stuck" behavior you are seeing.

The "|| true" at the end of your dpkg command prevents it from returning a non-zero exit code (which would fail the cluster), but it does not prevent the interactive prompt from hanging the process.


THE FIX: ACCEPT THE EULA NON-INTERACTIVELY

You need to pre-accept the EULA before running dpkg. There are two ways to do this:

Option A: Use the ACCEPT_EULA environment variable

export ACCEPT_EULA=Y
dpkg -i "${PKG_DIR}/msodbcsql17_17.10.6.1-1_amd64.deb"

Option B: Use debconf-set-selections to pre-seed the answer

echo "msodbcsql17 msodbcsql/ACCEPT_EULA boolean true" | debconf-set-selections
dpkg -i "${PKG_DIR}/msodbcsql17_17.10.6.1-1_amd64.deb"


COMPLETE WORKING INIT SCRIPT FOR OFFLINE INSTALLATION

Here is a complete init script that handles the EULA, dependencies, and error cases. Upload all required .deb files to a Unity Catalog Volume (recommended) or Workspace Files location, then reference them in the script.

Step 1: Gather the required .deb packages

The msodbcsql17 package depends on unixodbc (which itself depends on libodbc2, libodbcinst2, and odbcinst). On an Ubuntu-based Databricks Runtime, the unixodbc package and its sub-dependencies may already be present on the image, but in a fully offline environment you should have them available just in case. The packages you need are:

- msodbcsql17_17.10.6.1-1_amd64.deb (the driver itself)
- unixodbc_*.deb (if not already installed)
- libodbc2_*.deb (dependency of unixodbc, if needed)
- libodbcinst2_*.deb (dependency of unixodbc, if needed)

You can check which are already present by running "dpkg -l | grep unixodbc" in a notebook on a running cluster to see what is pre-installed.
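To go one step further than eyeballing the "dpkg -l" output, here is a small plain-Python sketch (no Databricks-specific APIs) that compares that output against a required-package list and reports what is missing. The package names are taken from the list above and may differ on your runtime image; adjust them for your environment.

```python
# Sketch: decide which dependency .deb files you still need, based on the
# output of `dpkg -l`. Package names below are assumptions from the list
# above; adjust them for your Databricks Runtime image.

def missing_packages(dpkg_listing: str, required: list[str]) -> list[str]:
    """Return required package names not marked installed ('ii') in dpkg -l output."""
    installed = set()
    for line in dpkg_listing.splitlines():
        parts = line.split()
        # Installed packages are listed with status 'ii'; names may carry
        # an architecture suffix such as ':amd64', which we strip.
        if len(parts) >= 2 and parts[0] == "ii":
            installed.add(parts[1].split(":")[0])
    return [pkg for pkg in required if pkg not in installed]

# In a notebook you would feed it real output, e.g.:
# import subprocess
# listing = subprocess.run(["dpkg", "-l"], capture_output=True, text=True).stdout
listing = "ii  unixodbc  2.3.9-5  amd64  Basic ODBC tools\n"
print(missing_packages(listing, ["unixodbc", "libodbc2", "libodbcinst2", "msodbcsql17"]))
# → ['libodbc2', 'libodbcinst2', 'msodbcsql17']
```

Anything the function returns is a .deb you should download and upload alongside the driver package.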

Step 2: Upload the packages

Upload them to a Unity Catalog Volume, for example:

/Volumes/my_catalog/my_schema/my_volume/odbc/

Or to Workspace Files:

/Workspace/Users/<your-user>/odbc/

Step 3: Create the init script

#!/bin/bash
set -e

# Path where you uploaded the .deb packages
PKG_DIR="/Volumes/my_catalog/my_schema/my_volume/odbc"
# If using Workspace Files instead, use:
# PKG_DIR="/Workspace/Users/<your-user>/odbc"

# Pre-accept the Microsoft EULA (this prevents the interactive hang)
export ACCEPT_EULA=Y

# Install dependencies first if they are not already present
if ! dpkg -s unixodbc > /dev/null 2>&1; then
  echo "Installing unixODBC dependencies..."
  dpkg -i "${PKG_DIR}"/libodbc2_*.deb || true
  dpkg -i "${PKG_DIR}"/libodbcinst2_*.deb || true
  dpkg -i "${PKG_DIR}"/unixodbc_*.deb || true
fi

# Install the Microsoft ODBC Driver 17
echo "Installing msodbcsql17..."
dpkg -i "${PKG_DIR}/msodbcsql17_17.10.6.1-1_amd64.deb"

# Verify the installation
odbcinst -q -d -n "ODBC Driver 17 for SQL Server"
echo "ODBC Driver 17 installed successfully."

Step 4: Configure the init script on your cluster

1. Go to your cluster configuration
2. Click Advanced Options
3. Go to the Init Scripts tab
4. Select your source (Volumes or Workspace) and enter the path to the script
5. Click Add, then Confirm and Restart
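If you manage clusters through the REST API or Terraform rather than the UI, the equivalent setting is the init_scripts field of the cluster spec. Below is a minimal sketch of just that fragment; the volume path is the placeholder from Step 2, and while the field names follow the Clusters API, please verify them against the API version your workspace exposes.

```python
# Sketch of the cluster-spec fragment that attaches the init script via the
# Clusters API (e.g. in a clusters/create or clusters/edit request body).
# The destination path is the placeholder volume path from Step 2.
import json

cluster_spec_fragment = {
    "init_scripts": [
        {
            # Use a "workspace" destination instead if the script lives in Workspace Files.
            "volumes": {
                "destination": "/Volumes/my_catalog/my_schema/my_volume/odbc/install_msodbcsql17_offline.sh"
            }
        }
    ]
}

print(json.dumps(cluster_spec_fragment, indent=2))
```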


IMPORTANT NOTES

1. Init scripts run as root, so you do not need sudo. The "requires superuser privilege" error you saw was from trying to run dpkg inside a Python notebook cell, which runs as a non-root user. Init scripts do not have this problem.

2. Init scripts run on ALL nodes (both driver and workers), so pyodbc will be available everywhere.

3. The installation does not persist across cluster restarts, which is expected. The init script runs every time the cluster starts, so this is handled automatically.

4. Make sure the path to your packages is accessible from all nodes. Unity Catalog Volumes (recommended for Databricks Runtime 13.3 LTS and above) and Workspace Files are both accessible from all nodes during init script execution.


ALTERNATIVE: CONSIDER USING JDBC INSTEAD

If your goal is simply to read data from SQL Server into Databricks DataFrames, you may not need ODBC at all. Databricks has built-in JDBC support that does not require any additional driver installation. The Microsoft SQL Server JDBC driver is included in the Databricks Runtime by default. Here is an example:

jdbc_url = "jdbc:sqlserver://<server>:<port>;databaseName=<database>"

df = (spark.read
    .format("jdbc")
    .option("url", jdbc_url)
    .option("dbtable", "<schema.table>")
    .option("user", dbutils.secrets.get(scope="my_scope", key="sql_user"))
    .option("password", dbutils.secrets.get(scope="my_scope", key="sql_password"))
    .load()
)

df.display()

This approach works out of the box with no init scripts, no driver installation, and no internet access required. It also uses Databricks Secrets for secure credential management.
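One refinement worth knowing: the Spark JDBC source also accepts a "query" option in place of "dbtable", which pushes the query down so SQL Server only returns the rows and columns you need. A sketch of the option set as a plain dict (URL, credentials, and query text are placeholders; "query" and "dbtable" are mutually exclusive):

```python
# Sketch: Spark JDBC options using a pushdown "query" instead of "dbtable".
# Collected as a plain dict so they can be applied in one call with
# spark.read.format("jdbc").options(**jdbc_opts).load(). All values are placeholders.
jdbc_opts = {
    "url": "jdbc:sqlserver://<server>:<port>;databaseName=<database>",
    "query": "SELECT id, name FROM dbo.customers WHERE active = 1",
    "user": "<user>",          # in practice: dbutils.secrets.get(...)
    "password": "<password>",  # in practice: dbutils.secrets.get(...)
}

# On a cluster:
# df = spark.read.format("jdbc").options(**jdbc_opts).load()

# "query" and "dbtable" must not both be set.
assert "dbtable" not in jdbc_opts
```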

If you specifically need pyodbc (for example, for executing stored procedures or DDL commands that are not supported via Spark JDBC), then the init script approach above is the way to go.
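For the pyodbc route, a common stumbling block is assembling the connection string and calling a procedure cleanly. Here is a small sketch; both helpers are hypothetical (not pyodbc APIs), the server, database, and procedure names are placeholders, and the deferred pyodbc import keeps the helpers importable on machines where the driver is not installed.

```python
# Sketch: build a pyodbc connection string for ODBC Driver 17 and call a
# stored procedure. build_conn_str and call_stored_procedure are hypothetical
# helpers; server/database/procedure names are placeholders.

def build_conn_str(server: str, database: str, user: str, password: str,
                   driver: str = "ODBC Driver 17 for SQL Server") -> str:
    """Assemble a key=value ODBC connection string; braces protect the driver name."""
    return (
        f"DRIVER={{{driver}}};"
        f"SERVER={server};"
        f"DATABASE={database};"
        f"UID={user};"
        f"PWD={password}"
    )

def call_stored_procedure(conn_str: str, proc: str, *params):
    """Execute a stored procedure with '?' parameter markers and return all rows."""
    import pyodbc  # deferred so this code imports even where the driver is absent
    with pyodbc.connect(conn_str) as conn:
        cursor = conn.cursor()
        placeholders = ", ".join("?" for _ in params)
        cursor.execute(f"EXEC {proc} {placeholders}", params)
        return cursor.fetchall()

# Example (placeholders; credentials would normally come from dbutils.secrets):
conn_str = build_conn_str("myserver.example.com", "mydb", "app_user", "secret")
# rows = call_stored_procedure(conn_str, "dbo.usp_refresh_stats", "2024-01-01")
```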


DEBUGGING TIPS

If you still have issues after updating the init script:

1. Enable cluster log delivery in your cluster configuration so init script logs are persisted.

2. Check the logs at: <cluster-log-path>/<cluster-id>/init_scripts/

3. You can also check logs on a running cluster at: /databricks/init_scripts/

4. To verify the driver is installed correctly from a notebook cell, run:

import subprocess
result = subprocess.run(["odbcinst", "-q", "-d"], capture_output=True, text=True)
print(result.stdout)

5. Test pyodbc connectivity:

import pyodbc
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=<your-server>;"
    "DATABASE=<your-database>;"
    "UID=<username>;"
    "PWD=<password>"
)
cursor = conn.cursor()
cursor.execute("SELECT 1")
print(cursor.fetchone())
conn.close()


REFERENCES

- Databricks init scripts overview: https://docs.databricks.com/en/init-scripts/index.html
- Cluster-scoped init scripts: https://docs.databricks.com/en/init-scripts/cluster-scoped.html
- Init script logging: https://docs.databricks.com/en/init-scripts/logs.html
- JDBC connectivity for external databases: https://docs.databricks.com/en/connect/external-systems/jdbc.html
- Microsoft ODBC Driver for SQL Server on Linux (includes offline install guidance): https://learn.microsoft.com/en-us/sql/connect/odbc/linux-mac/installing-the-microsoft-odbc-driver-fo...

Hope this gets you unblocked. The EULA acceptance is almost certainly the fix for the hang you are seeing.

* This reply was drafted with an agent system I built, which researches and drafts responses from the documentation and prior context available to me. I personally review each draft for obvious issues and to monitor system reliability, and I update it when I detect drift, but there is still a small chance something is inaccurate, especially if you are experimenting with brand-new features.