We run a Databricks job that uses JDBC connections to three on-prem SQL Server databases. Very occasionally a connection goes to the wrong server.
In the job log we see the error "com.microsoft.sqlserver.jdbc.SQLServerException: Login failed for user 'xxxxxx'". We also log the username and IP address used (we use IP addresses in the connection string to rule out a DNS problem), so we are certain about what the connection is trying to do.
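As an aside, one cheap way to guarantee the connection string really contains a literal IP (and that no hostname can sneak back in and reintroduce DNS as a suspect) is a small validation helper. This is just a sketch; `build_jdbc_url` and its parameters are hypothetical and not part of our actual job:

```python
import ipaddress

def assert_literal_ip(host: str) -> str:
    """Raise ValueError unless host is a literal IPv4/IPv6 address.

    Guards against a hostname sneaking into the connection string,
    which would reintroduce DNS resolution as a suspect.
    """
    ipaddress.ip_address(host)  # raises ValueError for anything but a literal IP
    return host

def build_jdbc_url(host: str, database: str) -> str:
    # Hypothetical helper: the real job builds its URL elsewhere.
    return f"jdbc:sqlserver://{assert_literal_ip(host)};databaseName={database}"
```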
On the SQL Server side we see the matching error "Login failed for user 'xxxxxx'." But it is logged on a different server than the one we targeted.
The code we have to execute queries is:
df = (spark.read
    .format('jdbc')
    .option('url', f'jdbc:sqlserver://{ip};databaseName={connection.database}')
    .option('user', connection.user)
    .option('password', pw)
    .option(query_or_table, statement)
    .load()
)
ICT has checked the ip traffic logs and verified what happens (though cannot explain it either).
Lastly, to give the word "occasional" more meaning... Our job runs every hour and performs 50+ queries per run. We have this problem once every 5-ish runs (but unpredictable). We don't see any pattern in the timing.
@Michel Smits :
It's hard to say for certain what is causing the issue, but it's possible there is a race condition or some kind of timing issue making the connection go to the wrong server. Here are a few things you could try to troubleshoot and resolve it:
Thanks for the lengthy reply. We also think it's some kind of race condition.
As to your points...
We're not sure how to influence points 1 and 2, since they are part of the Databricks environment.
Also, we're not convinced that points 4, 5 and 6 will help, since the problem is that the request goes to the wrong server.
That leaves point 3. We found there is an alternative driver and we are trying it to see if it helps:
df = (spark.read
    .format('sqlserver')
    .option('host', ip)
    .option('user', connection.user)
    .option('password', pw)
    .option('database', connection.database)
    .option('sslProtocol', 'TLSv1.2')
    .option('trustServerCertificate', 'true')
    .option('encrypt', 'true')
    .option(query_or_table, statement)
    .load()
)
We need to wait to see if this resolves the problem.
Unfortunately this change did not solve the problem. Looking at the exception and stack trace, it appears to be the exact same code path, so the two drivers are effectively the same under the hood.
The "solution" we have now is simply to connect again when we hit this exception. We have been doing this for a few weeks and can confirm that whenever the error happens, the second connection attempt always succeeds.
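The reconnect-on-failure workaround can be sketched as a small retry wrapper. This is a minimal sketch, not our production code: `read_df` stands in for the actual spark.read call, and the attempt count and delay are assumptions:

```python
import time

def with_retry(read_df, attempts=2, delay_s=1.0):
    """Call read_df, retrying on failure.

    In the real job the retried call is the spark.read(...).load() query;
    in practice the second attempt has always succeeded so far.
    """
    last_exc = None
    for attempt in range(attempts):
        try:
            return read_df()
        except Exception as exc:  # narrow to SQLServerException in real code
            last_exc = exc
            if attempt < attempts - 1:
                time.sleep(delay_s)
    raise last_exc
```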
Hi @Michel Smits
Thank you for posting your question in our community! We are happy to assist you.
To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?
This will also help other community members who may have similar questions in the future. Thank you for your participation and let us know if you need any further assistance!