Data Engineering

We are trying to connect to AWS RDS MySQL instance from DBX with PySpark using JDBC

Gaurav_Lokhande
New Contributor II
We are trying to connect to AWS RDS MySQL instance from DBX with PySpark using JDBC:
 
jdbc_df = (
    spark.read.format("jdbc")
    .options(
        url=f"jdbc:mysql://{creds['host']}:{creds['port']}/{creds['database']}",
        driver="com.mysql.cj.jdbc.Driver",
        dbtable="(SELECT * FROM table LIMIT 10) AS t",
        user=creds["user"],
        password=creds["password"],
    )
    .load()
)
 
Any pointers for resolving this issue would be really appreciated.
 
Note: All the fields in the creds dictionary are correct.
 
We are running into the following error:
 
Py4JJavaError: An error occurred while calling o436.load.
: com.mysql.cj.jdbc.exceptions.CommunicationsException: Communications link failure
 
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
at com.mysql.cj.jdbc.exceptions.SQLError.createCommunicationsException(SQLError.java:165)
at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:55)
at com.mysql.cj.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:861)
at com.mysql.cj.jdbc.ConnectionImpl.<init>(ConnectionImpl.java:449)
at com.mysql.cj.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:234)
at com.mysql.cj.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:180)
at org.apache.spark.sql.execution.datasources.jdbc.connection.BasicConnectionProvider.getConnection(BasicConnectionProvider.scala:50)
at org.apache.spark.sql.execution.datasources.jdbc.connection.ConnectionProviderBase.create(ConnectionProvider.scala:102)
at org.apache.spark.sql.jdbc.JdbcDialect.$anonfun$createConnectionFactory$1(JdbcDialects.scala:211)
at org.apache.spark.sql.jdbc.JdbcDialect.$anonfun$createConnectionFactory$1$adapted(JdbcDialects.scala:207)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.getQueryOutputSchema(JDBCRDD.scala:73)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:68)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:243)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:44)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:398)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:394)
at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:350)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:350)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:236)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:397)
at py4j.Gateway.invoke(Gateway.java:306)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:199)
at py4j.ClientServerConnection.run(ClientServerConnection.java:119)
at java.lang.Thread.run(Thread.java:750)
Caused by: com.mysql.cj.exceptions.CJCommunicationsException: Communications link failure
 
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at com.mysql.cj.exceptions.ExceptionFactory.createException(ExceptionFactory.java:52)
at com.mysql.cj.exceptions.ExceptionFactory.createException(ExceptionFactory.java:95)
at com.mysql.cj.exceptions.ExceptionFactory.createException(ExceptionFactory.java:140)
at com.mysql.cj.exceptions.ExceptionFactory.createCommunicationsException(ExceptionFactory.java:156)
at com.mysql.cj.protocol.a.NativeSocketConnection.connect(NativeSocketConnection.java:79)
at com.mysql.cj.NativeSession.connect(NativeSession.java:139)
at com.mysql.cj.jdbc.ConnectionImpl.connectOneTryOnly(ConnectionImpl.java:980)
at com.mysql.cj.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:851)
... 29 more
Caused by: java.net.SocketTimeoutException: connect timed out
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:613)
at com.mysql.cj.protocol.StandardSocketFactory.connect(StandardSocketFactory.java:144)
at com.mysql.cj.protocol.a.NativeSocketConnection.connect(NativeSocketConnection.java:53)

 

7 REPLIES

VZLA
Databricks Employee

Could you first verify, from a notebook, that network connectivity is properly working?

%sh nc -vz <jdbcHostname> <jdbcPort>

The com.mysql.cj.exceptions.CJCommunicationsException: Communications link failure typically indicates a network connectivity issue between your Databricks cluster and the MySQL database hosted on AWS RDS. The specific error java.net.SocketTimeoutException: connect timed out suggests that the connection attempt to the MySQL server is timing out. Other related aspects to check: RDS security groups, firewall rules and settings (inbound/outbound), VPC peering or PrivateLink, etc.
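
If nc isn't available on the cluster image, a plain Python socket check from the notebook gives the same signal. A minimal sketch, assuming the same creds dictionary from the original post (only host and port are used):

import socket

# Minimal TCP reachability check, run from the Databricks driver node.
# Uses only the host/port from the creds dict; no MySQL credentials are involved.
host, port = creds["host"], int(creds["port"])

try:
    with socket.create_connection((host, port), timeout=5):
        print(f"TCP connection to {host}:{port} succeeded")
except OSError as exc:
    # A timeout here points at security groups / routing, not at MySQL credentials.
    print(f"TCP connection to {host}:{port} failed: {exc}")

If this times out, the problem is purely at the network layer and no JDBC option will fix it.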

"The way the AWS MySQL RDS replica is whitelisted for Databricks IP addresses in IP addresses and domains for Databricks services and assets. Our AWS account is in us-east-1, and hence the Ip addresses whitelisted are for Databricks, specifically the range 3.237.73.224/28 Are these the correct IP addresses for Databricks that need to be whitelisted?"

Gaurav_Lokhande
New Contributor II

"The way the AWS MySQL RDS replica is whitelisted for Databricks IP addresses in IP addresses and domains for Databricks services and assets. Our AWS account is in us-east-1, and hence the IP addresses whitelisted are for Databricks, specifically the range 3.237.73.224/28. Are these the correct IP addresses for Databricks that need to be whitelisted?"

VZLA
Databricks Employee

Yes, that seems correct for the inbound traffic at least:

  • Control plane services, including webapp: nvirginia.cloud.databricks.com, 3.237.73.224/28
  • SCC relay: tunnel.us-east-1.cloud.databricks.com
  • SCC relay for PrivateLink: tunnel.privatelink.us-east-1.cloud.databricks.com

The same document can be referred to for the outbound rules.

Is the netcat test going through?

Some additional tests:

  • Security Groups and Firewalls:

    • Verify that the security group associated with your MySQL RDS instance allows inbound traffic on port 3306 (the default MySQL port) from the IP addresses or CIDR blocks used by your Databricks cluster (see the sketch after this list).
    • Ensure there are no firewall rules blocking the connection.
  • VPC Peering:

    • If your Databricks workspace is in a different VPC than your RDS instance, ensure that VPC peering is correctly configured between the two VPCs.
    • Check that the route tables and network ACLs are set up to allow traffic between the VPCs.
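
For the security-group item above, one quick way to see what the RDS instance actually allows is to dump its inbound rules with boto3. A minimal sketch, assuming credentials with rds:DescribeDBInstances and ec2:DescribeSecurityGroups permissions are available wherever this runs, and that the instance identifier below ("my-rds-replica", a placeholder) is the replica in question:

import boto3

DB_INSTANCE_ID = "my-rds-replica"  # placeholder; use your replica's identifier

rds = boto3.client("rds", region_name="us-east-1")
ec2 = boto3.client("ec2", region_name="us-east-1")

# Find the security groups attached to the RDS instance.
instance = rds.describe_db_instances(DBInstanceIdentifier=DB_INSTANCE_ID)["DBInstances"][0]
sg_ids = [sg["VpcSecurityGroupId"] for sg in instance["VpcSecurityGroups"]]

# Print every inbound rule; you want to see port 3306 open to the CIDR ranges
# your Databricks cluster connects from (e.g. 3.237.73.224/28).
for sg in ec2.describe_security_groups(GroupIds=sg_ids)["SecurityGroups"]:
    for rule in sg["IpPermissions"]:
        sources = [r["CidrIp"] for r in rule.get("IpRanges", [])]
        print(sg["GroupId"], rule.get("IpProtocol"), rule.get("FromPort"), rule.get("ToPort"), sources)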

Gaurav_Lokhande
New Contributor II

Hi Community,
We are still getting Py4JJavaError: An error occurred while calling o476.jdbc. : com.mysql.cj.jdbc.exceptions.CommunicationsException: Communications link failure
when we run the following in a notebook:
jdbc_df = (
    spark.read.format("jdbc")
    .options(
        url=f"jdbc:mysql://{creds['host']}:{creds['port']}/{creds['database']}",
        driver="com.mysql.cj.jdbc.Driver",
        dbtable="(SELECT * FROM table LIMIT 10) AS t",
        user=creds["user"],
        password=creds["password"],
    )
    .load()
)
 
Any pointers for resolving this issue would be really appreciated.
Note: All the fields in the creds dictionary are correct.
The 3.237.73.224/28 range was already whitelisted.
There are no outbound restrictions.
 
 

VZLA
Databricks Employee

@Gaurav_Lokhande You're receiving "The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server."

So you have a client (the driver) and a server (MySQL). The message states that the driver successfully sent a packet to the server but has not received anything back from MySQL. Can you please confirm whether communication in both directions has been configured and allowed? For example: in Databricks, an outbound/inbound allow rule for 3.237.73.224/28 to/from MySQL on port 3306, and in MySQL, an inbound/outbound allow rule to/from 3.237.73.224/28?
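
While those rules are being verified, it can also help to make the driver fail fast so each test iteration is quick. A minimal sketch of the same read with explicit Connector/J timeout properties (connectTimeout and socketTimeout, both in milliseconds) appended to the URL; this only changes how quickly the failure surfaces, not whether the connection succeeds:

# Same read as before, with explicit Connector/J timeouts (in milliseconds)
# appended to the JDBC URL so blocked connections fail quickly during testing.
jdbc_url = (
    f"jdbc:mysql://{creds['host']}:{creds['port']}/{creds['database']}"
    "?connectTimeout=5000&socketTimeout=10000"
)

jdbc_df = (
    spark.read.format("jdbc")
    .option("url", jdbc_url)
    .option("driver", "com.mysql.cj.jdbc.Driver")
    .option("dbtable", "(SELECT * FROM table LIMIT 10) AS t")
    .option("user", creds["user"])
    .option("password", creds["password"])
    .load()
)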

arjun_kr
Databricks Employee

@Gaurav_Lokhande With Spark JDBC usage, connectivity happens between your Databricks VPC (in your AWS account) and the RDS VPC, assuming you are using non-serverless clusters. You may need to ensure this connectivity works (for example, via VPC peering).
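
One way to sanity-check that from code, assuming AWS credentials are available and the two VPC IDs are known (the IDs below are placeholders), is to confirm an active peering connection exists and that the workspace VPC's route tables actually route to it:

import boto3

DATABRICKS_VPC_ID = "vpc-aaaa1111"  # placeholder: Databricks workspace VPC
RDS_VPC_ID = "vpc-bbbb2222"         # placeholder: VPC hosting the RDS instance

ec2 = boto3.client("ec2", region_name="us-east-1")

# Look for an active peering connection joining the two VPCs.
peerings = ec2.describe_vpc_peering_connections(
    Filters=[{"Name": "status-code", "Values": ["active"]}]
)["VpcPeeringConnections"]
for p in peerings:
    vpcs = {p["RequesterVpcInfo"]["VpcId"], p["AccepterVpcInfo"]["VpcId"]}
    if vpcs == {DATABRICKS_VPC_ID, RDS_VPC_ID}:
        print("Active peering:", p["VpcPeeringConnectionId"])

# Route tables in the Databricks VPC should contain a route whose target is that
# peering connection and whose destination covers the RDS subnets' CIDR.
route_tables = ec2.describe_route_tables(
    Filters=[{"Name": "vpc-id", "Values": [DATABRICKS_VPC_ID]}]
)["RouteTables"]
for rt in route_tables:
    for route in rt["Routes"]:
        if "VpcPeeringConnectionId" in route:
            print(rt["RouteTableId"], route.get("DestinationCidrBlock"), route["VpcPeeringConnectionId"])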
