06-07-2022 08:33 AM
I know how to do it with spark, and read/write tables (like https://docs.microsoft.com/en-gb/azure/databricks/data/data-sources/sql-databases#python-example )
But this time, I need to only update a field of a specific row in a table. I do not think I can do that with spark jdbc (mode for append or overwrite)
Then, I tried to use jaydebeapi and pymssql, but no success.
jaydebeapi would: Class com.microsoft.sqlserver.jdbc.SQLServerDriver is not found
Also tried to use pyodbc, but failed to install mssql odbc driver.
what is the best practice for that usecase?
06-08-2022 05:20 AM
Hi @Philippe CRAVE, Databricks Runtime contains JDBC drivers for Microsoft SQL Server and Azure SQL Database. See the Databricks runtime release notes for the complete list of JDBC libraries included in Databricks Runtime.
This article covers how to use the DataFrame API to connect to SQL databases using JDBC and how to control the parallelism of reads through the JDBC interface. This article provides detailed examples using the Scala API, with abbreviated Python and Spark SQL examples at the end. For all of the supported arguments for connecting to SQL databases using JDBC, see JDBC To Other Databases.
06-09-2022 08:26 AM
thanks for the link.
I am maybe wrong, but they describe how to connect with spark. They do not provide a connection engine that we could use directly (like with pyodbc) or an engine that we could use in pandas, for example.
06-10-2022 05:24 AM
Hi @Philippe CRAVE, This tutorial will guide how to read and write data to/from Azure SQL Database using pandas in Databricks. Please let me know if this helps.
06-10-2022 07:36 AM
thanks!
but here again, it goes through the spark interface.
Therefore, comes with spark limits. For example, when write, we can "overwrite" or "append".
What if we want to, only, update an existing row?
Could load all the table, update it in spark, and write it back. But if the table is large, and we need to do that in a very short time?
06-14-2022 03:01 AM
Hi @Philippe CRAVE, Can you please share the script you used for jaydebeapi and pymssql?
06-22-2022 08:28 AM
sure!
pip install jaydebeapi pymssql
import jaydebeapi
import pymssql
pymssql.connect(server='xxxxxxxxxxxxxxxxx.database.windows.net',
user='sqladmin@xxxxxxxxxxxxxxxxx',
password='1234',
database="yyyyyyyyyyyyyyyyy",
port='1433',
)
pymssql.connect(server='xxxxxxxxxxxxxxxxx.database.windows.net',
user='sqladmin',
password='1234',
database="yyyyyyyyyyyyyyyyy",
port='1433',
)
jaydebeapi.connect('com.microsoft.sqlserver.jdbc.SQLServerDriver',
'jdbc:sqlserver://xxxxxxxxxxxxxxxxx.database.windows.net:1433;database=yyyyyyyyyyyyyyyyy',
['sqladmin', '1234'])
bellow, outputs:
OperationalError: (20009, b'DB-Lib error message 20009, severity 9:\nUnable to connect: Adaptive Server is unavailable or does not exist (xxxxxxxxxxx.database.windows.net)\nNet-Lib error during Connection refused (111)\nDB-Lib error message 20009, severity 9:\nUnable to connect: Adaptive Server is unavailable or does not exist (xxxxxxxxxxxxx.database.windows.net)\nNet-Lib error during Connection refused (111)\n')
same for the second call.
and last, for jaydebeapi
TypeError: Class com.microsoft.sqlserver.jdbc.SQLServerDriver is not found
Join our fast-growing data practitioner and expert community of 80K+ members, ready to discover, help and collaborate together while making meaningful connections.
Click here to register and join today!
Engage in exciting technical discussions, join a group with your peers and meet our Featured Members.