
SQL Server Connection on LTS 14.3

fk5
New Contributor

Hi, what is the best way to connect to a SQL Server on LTS 14.3?

I've been trying to set up a connection to a SQL Server as referenced here. However, using both "sqlserver" and "jdbc" as the format results in an exception when calling display, because Spark sends a LIMIT clause to SQL Server. Reading the data and counting works fine.

df_jdbc = (spark.read.format('jdbc')
           .option('driver', 'com.microsoft.sqlserver.jdbc.SQLServerDriver')
           .option('url', jdbc_url)
           .option('user', sql_server_user)
           .option('password', sql_server_password)
           .option('dbtable', 'dbo.TestTable')
           .load())

df_jdbc.display()
# com.microsoft.sqlserver.jdbc.SQLServerException: Incorrect syntax near '10001'.

df = (spark.read.format('sqlserver')
      .option('host', sql_server_host)
      .option('port', sql_server_port)
      .option('database', sql_server_database)
      .option('user', sql_server_user)
      .option('password', sql_server_password)
      .option('dbtable', 'dbo.TestTable')
      .load())

df.display()
# com.microsoft.sqlserver.jdbc.SQLServerException: Incorrect syntax near '10001'.

# Query sent by Spark in both cases: SELECT TestColumn FROM dbo.TestTable     LIMIT 10001

 

3 REPLIES

Slash
New Contributor

A bit weird; I've tested the format('sqlserver') option and didn't notice any problems on LTS 14.3. Did you check what happens if you use the Databricks-specific display command, like so:

display(df)




(attached screenshot: Slash_0-1720089847300.png)

 

fk5
New Contributor

In this instance neither df.display() nor display(df) works, because in the end both send the same SQL to SQL Server. However, I noticed that it seems to be an error in our cluster's configuration, as it works fine on a new compute.

It probably has to do with the custom dialect we are using to map TimestampType to datetime2.
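
For readers hitting the same error: custom JDBC dialects live on the JVM side, so the sketch below is Scala rather than PySpark, and it is only a guess at what such a dialect might look like; the object name and the exact type mapping are illustrative, not the poster's actual code. The likely relevance is that once a dialect whose canHandle matches jdbc:sqlserver URLs is registered, it is consulted ahead of (or aggregated with) the built-in SQL Server dialect, and SQL Server-specific behaviour such as rendering row limits with TOP can be lost, leaving Spark to emit the generic LIMIT clause that T-SQL rejects.

import java.sql.Types

import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects, JdbcType}
import org.apache.spark.sql.types.{DataType, TimestampType}

// Hypothetical reconstruction of a dialect that writes Spark TimestampType
// columns to SQL Server as DATETIME2 instead of the default mapping.
object SqlServerDatetime2Dialect extends JdbcDialect {

  // Claims every jdbc:sqlserver URL, i.e. the same URLs the built-in dialect handles.
  override def canHandle(url: String): Boolean =
    url.toLowerCase.startsWith("jdbc:sqlserver")

  // Write-side type mapping: TimestampType -> DATETIME2.
  override def getJDBCType(dt: DataType): Option[JdbcType] = dt match {
    case TimestampType => Some(JdbcType("DATETIME2", Types.TIMESTAMP))
    case _             => None // fall back to the default mapping for other types
  }
}

// Typically run from an init script or a startup notebook on the cluster.
JdbcDialects.registerDialect(SqlServerDatetime2Dialect)

Calling JdbcDialects.unregisterDialect with the same object, or simply testing on a compute where it was never registered (as the poster did), is a quick way to confirm whether the dialect is what changes the generated SQL.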

Slash
New Contributor

Cool, I'm glad that you were able to pinpoint the issue.
