09-19-2023 12:52 PM
I have added a connection to a MySQL database in Azure, and I have created a foreign catalog in Databricks. But when I go to query the database I get the following error:
Connections using insecure transport are prohibited while --require_secure_transport=ON
I appreciate that I can turn off this setting on my database, but I would rather do it properly and configure it from the Databricks side. Is it possible to configure Databricks so it uses secure transport on external connections?
09-19-2023 01:38 PM
Hi @TimB, yes, it is possible to configure Databricks to use secure transport on external connections. You will need to include the SSL properties in your connection options (see the sketch after this list).
The SSL properties usually include:
- sslmode: The SSL mode to use when connecting to the target. To enforce SSL, set this to 'require'.
- sslcert: The path to the SSL client certificate file.
- sslkey: The path to the SSL private key file.
- sslrootcert: The path to the SSL root certificate file.
09-20-2023 01:35 AM
Hi @TimB,
Yes, it is possible to configure Databricks to use secure transport on external connections. You can set the trustServerCertificate option to true in the JDBC connection options. This option allows the driver to trust the server certificate for SSL connections, instead of validating it against the certificate authority.
Here's an example of how to configure a JDBC connection to use SSL in Databricks:
# Connection details (placeholders)
jdbcUsername = "myusername"
jdbcPassword = "mypassword"
jdbcHostname = "myhostname"
jdbcPort = "myport"
jdbcDatabase = "mydatabase"

# useSSL/requireSSL enforce an encrypted connection; verifyServerCertificate=false
# and trustServerCertificate=true skip CA validation of the server certificate.
jdbcUrl = "jdbc:mysql://{0}:{1}/{2}?useSSL=true&requireSSL=true&verifyServerCertificate=false&trustServerCertificate=true".format(jdbcHostname, jdbcPort, jdbcDatabase)

connectionProperties = {
    "user": jdbcUsername,
    "password": jdbcPassword,
    "driver": "com.mysql.cj.jdbc.Driver",  # "com.mysql.jdbc.Driver" on older Connector/J versions
}

df = spark.read.jdbc(url=jdbcUrl, table="mytable", properties=connectionProperties)
In the JDBC URL, useSSL=true enables SSL encryption, and requireSSL=true specifies that SSL is required. Setting verifyServerCertificate=false disables validation of the server certificate against a certificate authority, and trustServerCertificate=true tells the driver to trust the server certificate anyway and make the SSL connection; only use these when you cannot validate the certificate properly.
Note: If your database server uses a custom certificate authority (CA), you need to include the CA certificate in the Databricks cluster's Java truststore so the driver can trust the certificate. You can use the spark.executor.extraJavaOptions and/or spark.driver.extraJavaOptions configuration options to point the JVM at a truststore containing the CA certificate.
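As a sketch, assuming the CA certificate has already been imported into a JKS truststore uploaded to DBFS (the path and password below are hypothetical), the cluster's Spark config could contain:
spark.driver.extraJavaOptions -Djavax.net.ssl.trustStore=/dbfs/certs/mysql-truststore.jks -Djavax.net.ssl.trustStorePassword=changeit
spark.executor.extraJavaOptions -Djavax.net.ssl.trustStore=/dbfs/certs/mysql-truststore.jks -Djavax.net.ssl.trustStorePassword=changeit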
09-20-2023 01:39 AM
Hi Kaniz,
That is the way I work when I normally query data from the Azure SQL database in DBX. But I am trying to use the new Lakehouse Federation from within DBX itself, which should abstract away those connection properties.
https://learn.microsoft.com/en-us/azure/databricks/query-federation/mysql
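For reference, the setup in that doc is roughly the following (a sketch only; the connection name, catalog name, host, and credentials are placeholders, and the exact options should be checked against the linked page):
# Sketch of the Lakehouse Federation setup described in the linked doc.
spark.sql("""
  CREATE CONNECTION IF NOT EXISTS mysql_conn TYPE mysql
  OPTIONS (
    host 'myhostname.mysql.database.azure.com',
    port '3306',
    user 'myusername',
    password 'mypassword'  -- consider secret('<scope>', '<key>') instead of a literal
  )
""")

spark.sql("""
  CREATE FOREIGN CATALOG IF NOT EXISTS my_mysql_catalog
  USING CONNECTION mysql_conn
""")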
2 weeks ago
Hi,
Is there an update on this? I have the same issue: when defining a "connection" for Lakehouse Federation in the Databricks UI, I cannot input any additional connection information or manually edit the URL it creates.
a week ago
I was trying to connect to our database (Azure MySQL) from DBX, but we wanted require_secure_transport to stay ON and didn't want to turn it off. We ended up moving DBX into a VNet and setting up a Private Link to get around this.
a week ago
Thanks Tim. That seems like the way to go. You prompted me to find this section of the docs: Networking recommendations for Lakehouse Federation - Azure Databricks | Microsoft Learn, which mentions exactly what you just said.
On a side note: have you set up Databricks serverless compute to access your Azure MySQL? Did you do the same thing for the serverless compute?
a week ago
I was unable to use serverless compute with Azure MySQL. I did have a meeting with tech support at Databricks, and the conclusion at the time was that if your Azure MySQL is not publicly accessible, you cannot connect to it from serverless compute.
I'm mainly using serverless compute for dashboards, so as a workaround I run jobs in the background to populate Delta tables that the serverless compute can read instead.
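A minimal sketch of that workaround (the target table name is hypothetical, and jdbcUrl/connectionProperties are as defined earlier in this thread): a scheduled job on a classic, VNet-injected cluster reads from MySQL over JDBC and writes a Delta table that serverless compute can then query.
# Runs as a scheduled job on a classic cluster that can reach the MySQL server.
source_df = spark.read.jdbc(url=jdbcUrl, table="mytable", properties=connectionProperties)

# Persist as a Delta table; serverless compute (e.g. dashboards) reads this instead.
source_df.write.format("delta").mode("overwrite").saveAsTable("analytics.mytable_snapshot")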