Databricks Community

berserkersap · ‎06-29-2023

I need to update a SQL Server Table from Databricks notebook. Right now, I am trying to do this using JDBC. However, it seems we can only append or overwrite the table using the JDBC Connection.

Query databases using JDBC - Azure Databricks | Microsoft Learn

I wanted to try pyodbc and used "pip install pyodbc"

but when I tried to connect

pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};SERVER='+server+';DATABASE='+database+';ENCRYPT=yes;UID='+username+';PWD='+ password)

I got the error message

('01000', "[01000] [unixODBC][Driver Manager]Can't open lib 'ODBC Driver 17 for SQL Server' : file not found (0) (SQLDriverConnect)")

Please suggest me any way to do this.

Please note that I do not want to use Scala. Also, I also do not want to install a driver or something of that sort on the cluster.

If installing is inevitable, please list down the steps I need to do for that along with the necessary precautions.

diego_poggioli · ‎10-13-2023

Hi @berserkersap thanks for your answer. I was able to solve the problem in 2 ways:

1) downgrading the Runtime version to 12.2 and then the installer of msodbcsql17 no longer failed (with the error Can't open lib 'ODBC Driver 17 for SQL Server' : file not found)

2) use a Java wrapper in this way:

url=f"jdbc:sqlserver://{server};encrypt=true;user={username};password={password};databaseName={database};"

sql_driver_manager=spark._sc._gateway.jvm.java.sql.DriverManager
con = sql_driver_manager.getConnection(url)

con.prepareCall("INSERT INTO... ").execute()
con.close()

View solution in original post

diego_poggioli · ‎10-12-2023

Hi @berserkersap did you manage to find a solution for this? I'm facing the same problem.

Thanks

Diego

berserkersap · ‎10-12-2023

Hello @diego_poggioli , I wasn't able to find a clean solution.

One thing for sure is that we cannot use UPDATE, MERGE, stored procedures or any such statements using JDBC. We can only query, write (or overwrite) using it.

Since this is the case, I could only use ODBC. We can use pyodbc to make a connection and execute commands. However, we need ODBC driver installed for the library to work. Since, I cannot install directly on cluster or even keep init scripts (no permission for this was given 😢), I could only install inside a notebook (session) using the following code.

%sh
curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add -
curl https://packages.microsoft.com/config/ubuntu/16.04/prod.list > /etc/apt/sources.list.d/mssql-release.list
sudo apt-get update
sudo ACCEPT_EULA=Y apt-get -q -y install msodbcsql17

Replace the required versions of linux and msodbc in the code. However, remember that you need to execute this command every time your cluster gets restarted or the state gets cleared.

After using this code you can use pyodbc to execute the statements. (Remember to commit() after transactions like UPDATE and close the connection after using to reduce the errors)

Hope this helps 🙂

Refer:

https://stackoverflow.com/questions/61022848/do-you-know-how-to-install-the-odbc-driver-17-for-sql-s...

https://learn.microsoft.com/en-us/sql/connect/python/pyodbc/step-3-proof-of-concept-connecting-to-sq...

https://docs.databricks.com/en/integrations/jdbc-odbc-bi.html

sc_great · ‎10-20-2023

When you write your PySpark script make sure to set the option of 'truncate' to 'true'

diego_poggioli · ‎10-13-2023

Hi @berserkersap thanks for your answer. I was able to solve the problem in 2 ways:

1) downgrading the Runtime version to 12.2 and then the installer of msodbcsql17 no longer failed (with the error Can't open lib 'ODBC Driver 17 for SQL Server' : file not found)

2) use a Java wrapper in this way:

url=f"jdbc:sqlserver://{server};encrypt=true;user={username};password={password};databaseName={database};"

sql_driver_manager=spark._sc._gateway.jvm.java.sql.DriverManager
con = sql_driver_manager.getConnection(url)

con.prepareCall("INSERT INTO... ").execute()
con.close()