06-29-2023 02:58 AM
I need to update a SQL Server table from a Databricks notebook. Right now I am trying to do this using JDBC, but it seems we can only append to or overwrite the table through the JDBC connection.
Query databases using JDBC - Azure Databricks | Microsoft Learn
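For context, the append/overwrite limitation comes from the Spark DataFrameWriter JDBC path, which has no row-level UPDATE or MERGE. A minimal sketch of that path (the server, database, table, and credential names here are placeholders, not values from the thread):

```python
def build_sqlserver_jdbc_url(server: str, database: str) -> str:
    """Build a SQL Server JDBC URL with encryption on, as in the question."""
    return f"jdbc:sqlserver://{server};databaseName={database};encrypt=true"


def append_to_table(df, server, database, table, username, password):
    """Write a Spark DataFrame to SQL Server via JDBC.

    The only write modes the JDBC writer offers are "append",
    "overwrite", "ignore", and "error" -- there is no UPDATE.
    """
    (df.write.format("jdbc")
       .option("url", build_sqlserver_jdbc_url(server, database))
       .option("dbtable", table)
       .option("user", username)
       .option("password", password)
       .mode("append")
       .save())
```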
I wanted to try pyodbc, so I installed it with "pip install pyodbc".
But when I tried to connect with
pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};SERVER='+server+';DATABASE='+database+';ENCRYPT=yes;UID='+username+';PWD='+ password)
I got the error message
('01000', "[01000] [unixODBC][Driver Manager]Can't open lib 'ODBC Driver 17 for SQL Server' : file not found (0) (SQLDriverConnect)")
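This "file not found" error means the unixODBC driver manager cannot locate the driver's shared library, i.e. the driver itself is not installed even though pyodbc is. A quick diagnostic (a sketch; it assumes pyodbc itself imported successfully) is to list what drivers the driver manager actually sees:

```python
def list_odbc_drivers():
    """Return the ODBC drivers registered with the driver manager.

    If 'ODBC Driver 17 for SQL Server' is missing from this list,
    pyodbc.connect() will fail with the "Can't open lib ... file not
    found" error above, and the driver still needs to be installed.
    """
    import pyodbc  # imported inside so the helper is safe to define anywhere
    return pyodbc.drivers()
```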
Please suggest a way to do this.
Please note that I do not want to use Scala. I also do not want to install a driver or anything of that sort on the cluster.
If installing is unavoidable, please list the steps I need to follow, along with any necessary precautions.
10-13-2023 05:03 AM
Hi @berserkersap, thanks for your answer. I was able to solve the problem in two ways:
1) downgrading the Runtime version to 12.2, after which the msodbcsql17 installer no longer failed (with the error "Can't open lib 'ODBC Driver 17 for SQL Server' : file not found")
2) using a Java wrapper, like this:
url = f"jdbc:sqlserver://{server};encrypt=true;user={username};password={password};databaseName={database};"
# Reach into the driver JVM through py4j to get java.sql.DriverManager
sql_driver_manager = spark._sc._gateway.jvm.java.sql.DriverManager
con = sql_driver_manager.getConnection(url)
con.prepareCall("INSERT INTO... ").execute()
con.close()
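The wrapper above can be made a bit more robust. A sketch along the same lines (my assumptions: a live `spark` session on the driver and a reachable database; the function name is mine) uses a PreparedStatement with bound parameters instead of interpolating values into the SQL string, and closes the connection even if the statement fails:

```python
def run_update_via_jvm(spark, url: str, sql: str, params: tuple = ()):
    """Run UPDATE/MERGE/INSERT through the driver JVM's java.sql.DriverManager.

    Spark's JDBC writer cannot do this, but the plain JDBC API can.
    Returns the number of affected rows.
    """
    jvm = spark._sc._gateway.jvm
    con = jvm.java.sql.DriverManager.getConnection(url)
    try:
        stmt = con.prepareStatement(sql)
        # Bind values positionally (JDBC parameter indexes start at 1)
        for i, p in enumerate(params, start=1):
            stmt.setObject(i, p)
        return stmt.executeUpdate()
    finally:
        con.close()  # always release the JVM-side connection
```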
10-12-2023 12:18 AM
Hi @berserkersap, did you manage to find a solution for this? I'm facing the same problem.
Thanks
Diego
10-12-2023 08:49 AM
Hello @diego_poggioli, I wasn't able to find a clean solution.
One thing for sure is that we cannot use UPDATE, MERGE, stored procedures, or similar statements through JDBC. We can only query and write (append or overwrite) with it.
Since this is the case, I could only use ODBC. We can use pyodbc to make a connection and execute commands, but the library needs an ODBC driver installed in order to work. Since I cannot install directly on the cluster or even keep init scripts (no permission for this was given 😢), I could only install inside a notebook (session) using the following code.
%sh
curl https://packages.microsoft.com/keys/microsoft.asc | sudo apt-key add -
curl https://packages.microsoft.com/config/ubuntu/16.04/prod.list | sudo tee /etc/apt/sources.list.d/mssql-release.list
sudo apt-get update
sudo ACCEPT_EULA=Y apt-get -q -y install msodbcsql17
Replace the Ubuntu and msodbcsql versions in the code as required. However, remember that you need to re-run these commands every time your cluster restarts or its state gets cleared.
After running this, you can use pyodbc to execute the statements. (Remember to commit() after transactions like UPDATE, and close the connection when you are done, to reduce errors.)
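Putting those reminders together, a sketch of the pyodbc path (the helper names are mine; the connection-string format is the one from the original question, and the actual connect only works once the driver install above has succeeded):

```python
def build_odbc_conn_str(server, database, username, password,
                        driver="ODBC Driver 17 for SQL Server"):
    """Assemble the pyodbc connection string used earlier in the thread."""
    return (f"DRIVER={{{driver}}};SERVER={server};DATABASE={database};"
            f"ENCRYPT=yes;UID={username};PWD={password}")


def update_rows(conn_str, sql, params=()):
    """Execute DML (e.g. UPDATE) via pyodbc; commit and always close.

    Returns the number of affected rows.
    """
    import pyodbc  # needs the msodbcsql17 driver installed as shown above
    con = pyodbc.connect(conn_str)
    try:
        cur = con.cursor()
        cur.execute(sql, params)
        con.commit()  # autocommit is off by default in pyodbc
        return cur.rowcount
    finally:
        con.close()
```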
Hope this helps 🙂
Refer:
https://docs.databricks.com/en/integrations/jdbc-odbc-bi.html