Data Engineering

Connecting Databricks to PostgreSQL

nadia
New Contributor II

I use Databricks and I am trying to connect to PostgreSQL with the following code:

jdbcHostname = "xxxxxxx"
jdbcDatabase = "xxxxxxxxxxxx"
jdbcPort = "5432"
username = "xxxxxxx"
password = "xxxxxxxx"

jdbcUrl = "jdbc:postgresql://{0}:{1}/{2}".format(jdbcHostname, jdbcPort, jdbcDatabase)

connectionProperties = {
    "user": username,
    "password": password,
    "driver": "org.postgresql.Driver"
}

df = spark.read.jdbc(url=jdbcUrl, table="xxxxxxxxx", properties=connectionProperties)

I am trying to read a table with 28 million rows, and I get the following error message:

"SparkException: Job aborted due to stage failure: Task 0 in stage 3.0 failed 4 times, most recent failure: Lost task 0.3 in stage 3.0 (TID 6) (10.139.64.5 executor 4): ExecutorLostFailure (executor 4 exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 150527 ms"

Could you help me, please?

Thanks

1 ACCEPTED SOLUTION

Accepted Solutions

Prabakar
Databricks Employee

Hi @Boumaza nadia, please check the Ganglia metrics for the cluster. This could be a scalability issue where the cluster is overloaded, which can happen when a large partition does not fit into the executor's memory. To mitigate it, we recommend switching to a larger worker node type.
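As a complement to a larger worker type, the "large partition" point can be illustrated with a range-partitioned read. This is only a sketch and not part of the accepted answer: a `spark.read.jdbc` call without partitioning arguments reads the whole table through a single partition, while passing `column`, `lowerBound`, `upperBound` and `numPartitions` splits the read into several smaller range queries. The helper names, the `id` column, the bounds and the partition count below are assumptions for illustration; the table needs a numeric, date or timestamp column with a reasonably even spread of values.

```python
def jdbc_partition_kwargs(column, lower_bound, upper_bound, num_partitions):
    """Build the partitioning arguments for a range-partitioned spark.read.jdbc call.

    Hypothetical helper: it only validates the values and returns them as a dict.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    if lower_bound >= upper_bound:
        raise ValueError("lower_bound must be smaller than upper_bound")
    return {
        "column": column,              # column Spark uses to split the read
        "lowerBound": lower_bound,     # sets the stride only, does not filter rows
        "upperBound": upper_bound,     # sets the stride only, does not filter rows
        "numPartitions": num_partitions,
    }


def read_table_in_parallel(spark, jdbc_url, table, connection_properties, **partitioning):
    """Read a JDBC table as several partitions instead of one (hypothetical helper)."""
    return spark.read.jdbc(
        url=jdbc_url,
        table=table,
        properties=connection_properties,
        **partitioning,
    )


# Usage in a notebook, reusing jdbcUrl and connectionProperties from the question.
# "id" and the bounds are assumed values; replace them with a real column and range.
# partitioning = jdbc_partition_kwargs("id", 1, 28_000_000, 16)
# df = read_table_in_parallel(spark, jdbcUrl, "xxxxxxxxx", connectionProperties, **partitioning)
```

With these assumed values each task would fetch roughly 1/16 of the rows, so no single executor has to hold the full 28 million rows at once. Whether this removes the heartbeat timeout still depends on row width and cluster size.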


2 REPLIES

santhosh11
New Contributor II

Can you tell me how you were able to connect to a PostgreSQL database from Databricks? Do we have to whitelist IPs in PostgreSQL?
