The error occurs because the Microsoft ODBC Driver 17 for SQL Server is missing on your Azure Databricks cluster. Here's how to fix it:
Steps to Resolve
Step 1: Create an Init Script to Install ODBC Driver
1. Create a file named `odbc-install.sh` with the following content:
```bash
#!/bin/bash
curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add -
curl https://packages.microsoft.com/config/ubuntu/22.04/prod.list > /etc/apt/sources.list.d/mssql-release.list
apt-get update
ACCEPT_EULA=Y apt-get -y install msodbcsql17
apt-get -y install unixodbc-dev
```
2. Save this script in your Databricks workspace.
Step 2: Attach Init Script to Your Cluster
1. Go to your cluster configuration in Databricks.
2. Under "Advanced" settings, find the "Init Scripts" section.
3. Add the script by selecting "Workspace" as the source and providing the path (e.g., `/Workspace/odbc-install.sh`).
4. Restart your cluster.
---
### Step 3: Update Connection String in Your Code
Modify the connection string in your `connect_to_database` function to explicitly use the installed driver:
```python
conn = pyodbc.connect(
f'DRIVER={{ODBC Driver 17 for SQL Server}};SERVER={server};DATABASE={database};UID={username};PWD={password}'
)
```
---
Step 4: Test the Setup
Run this snippet in a Databricks notebook to confirm the driver is installed:
```python
import pyodbc
# List available drivers
print(pyodbc.drivers())
```
Ensure `ODBC Driver 17 for SQL Server` appears in the output.
---
Alternative Approach
If issues persist, consider using JDBC instead of ODBC for database connections:
```python
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("Certificate Data").getOrCreate()
df = spark.read.format("jdbc").option("url", f"jdbc:sqlserver://{server}:1433;databaseName={database}") \
.option("dbtable", "SSLCertificateData") \
.option("user", username) \
.option("password", password).load()
```
Summary
- Install ODBC Driver 17 using an init script.
- Update your connection string to use `ODBC Driver 17 for SQL Server`.
- Test installation with `pyodbc.drivers()` or switch to JDBC for simpler integration.