cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
cancel
Showing results for 
Search instead for 
Did you mean: 

How to update a SQL Server Table using JDBC or something else in Python/Pyspark ?

berserkersap
Contributor

I need to update a SQL Server Table from Databricks notebook. Right now, I am trying to do this using JDBC. However, it seems we can only append or overwrite the table using the JDBC Connection.

Query databases using JDBC - Azure Databricks | Microsoft Learn

I wanted to try pyodbc and used "pip install pyodbc"

but when I tried to connect

 

pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};SERVER='+server+';DATABASE='+database+';ENCRYPT=yes;UID='+username+';PWD='+ password)

 

I got the error message

berserkersap_0-1688032497010.png

('01000', "[01000] [unixODBC][Driver Manager]Can't open lib 'ODBC Driver 17 for SQL Server' : file not found (0) (SQLDriverConnect)")

Please suggest me any way to do this.

Please note that I do not want to use Scala. Also, I also do not want to install a driver or something of that sort on the cluster.

If installing is inevitable, please list down the steps I need to do for that along with the necessary precautions.

1 ACCEPTED SOLUTION

Accepted Solutions

diego_poggioli
Contributor

Hi @berserkersap thanks for your answer. I was able to solve the problem in 2 ways:

1) downgrading the Runtime version to 12.2 and then the installer of msodbcsql17 no longer failed (with the error Can't open lib 'ODBC Driver 17 for SQL Server' : file not found)

2) use a Java wrapper in this way:

url=f"jdbc:sqlserver://{server};encrypt=true;user={username};password={password};databaseName={database};"

sql_driver_manager=spark._sc._gateway.jvm.java.sql.DriverManager
con = sql_driver_manager.getConnection(url)

con.prepareCall("INSERT INTO... ").execute()
con.close()

 

View solution in original post

4 REPLIES 4

diego_poggioli
Contributor

Hi @berserkersap did you manage to find a solution for this? I'm facing the same problem.

Thanks

Diego

Hello @diego_poggioli , I wasn't able to find a clean solution.

One thing for sure is that we cannot use UPDATE, MERGE, stored procedures or any such statements using JDBC. We can only query, write (or overwrite) using it.

Since this is the case, I could only use ODBC. We can use pyodbc to make a connection and execute commands. However, we need ODBC driver installed for the library to work. Since, I cannot install directly on cluster or even keep init scripts (no permission for this was given 😢), I could only install inside a notebook (session) using the following code.

%sh
curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add -
curl https://packages.microsoft.com/config/ubuntu/16.04/prod.list > /etc/apt/sources.list.d/mssql-release.list
sudo apt-get update
sudo ACCEPT_EULA=Y apt-get -q -y install msodbcsql17

Replace the required versions of linux and msodbc in the code. However, remember that you need to execute this command every time your cluster gets restarted or the state gets cleared.

After using this code you can use pyodbc to execute the statements. (Remember to commit() after transactions like UPDATE and close the connection after using to reduce the errors)

Hope this helps 🙂

Refer:

https://stackoverflow.com/questions/61022848/do-you-know-how-to-install-the-odbc-driver-17-for-sql-s...

https://learn.microsoft.com/en-us/sql/connect/python/pyodbc/step-3-proof-of-concept-connecting-to-sq...

https://docs.databricks.com/en/integrations/jdbc-odbc-bi.html

When you write your PySpark script make sure to set the option of 'truncate' to 'true' 

diego_poggioli
Contributor

Hi @berserkersap thanks for your answer. I was able to solve the problem in 2 ways:

1) downgrading the Runtime version to 12.2 and then the installer of msodbcsql17 no longer failed (with the error Can't open lib 'ODBC Driver 17 for SQL Server' : file not found)

2) use a Java wrapper in this way:

url=f"jdbc:sqlserver://{server};encrypt=true;user={username};password={password};databaseName={database};"

sql_driver_manager=spark._sc._gateway.jvm.java.sql.DriverManager
con = sql_driver_manager.getConnection(url)

con.prepareCall("INSERT INTO... ").execute()
con.close()

 

Welcome to Databricks Community: Lets learn, network and celebrate together

Join our fast-growing data practitioner and expert community of 80K+ members, ready to discover, help and collaborate together while making meaningful connections. 

Click here to register and join today! 

Engage in exciting technical discussions, join a group with your peers and meet our Featured Members.