Databricks Workflow dbt-core job failure with Connection aborted

ChsAIkrishna
Contributor

We run a dbt-core task in a Databricks workflow, and roughly one in every 100 workflow executions fails with the error below. After a restart the job runs fine. What would be a permanent fix?

('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

Walter_C
Databricks Employee

Here are some steps you can take to address this issue:

  1. Update dbt-databricks Version: Ensure that you are using the latest version of dbt-databricks. The issue has been addressed in version 1.7.14, which includes changes to connection management and logging improvements.

  2. Adjust Connection Settings: Modify the connection settings in your dbt profile to reduce the likelihood of idle connections being closed. Specifically, you can set the connect_max_idle parameter to a lower value, such as 60 seconds. This setting ensures that connections are not idle for too long, which can help prevent them from being closed unexpectedly.

    Example configuration:

    connection_parameters:
      connect_max_idle: 60

  3. Increase Connection Retries: Increase the number of connection retries and the timeout settings to provide more resilience against transient connection issues.

    Example configuration:

    connect_retries: 6
    connect_timeout: 600
    _socket_timeout: 1200
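
To illustrate what retry settings like the ones in step 3 buy you in general, here is a minimal Python sketch of retrying an operation after a dropped connection. It is not how dbt-databricks implements connect_retries internally; the function name call_with_retries and its parameters (retries, base_delay, sleep) are hypothetical, and the exponential backoff is an assumption made for the example. It catches Python's built-in ConnectionError, of which http.client.RemoteDisconnected is a subclass.

```python
import time


def call_with_retries(operation, retries=6, base_delay=1.0, sleep=time.sleep):
    """Call operation(); on ConnectionError, retry up to `retries` more times.

    Hypothetical helper for illustration only, not part of dbt or Databricks.
    """
    for attempt in range(retries + 1):
        try:
            return operation()
        except ConnectionError:
            if attempt == retries:
                # Out of retries: surface the original error to the caller.
                raise
            # Exponential backoff between attempts: 1s, 2s, 4s, ...
            sleep(base_delay * 2 ** attempt)
```

With this pattern, a failure that happens once in every 100 or so runs is absorbed by a second attempt instead of failing the whole job, which is the effect a higher retry count is meant to have.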


ChsAIkrishna
Contributor

@Walter_C Kudos to you, and thank you very much. We have set "connect_retries"; let's see how it goes.
Ref: https://docs.getdbt.com/docs/core/connect-data-platform/databricks-setup#additional-parameters