Data Engineering

Connecting to Azure PostgreSQL from Azure Databricks

kp12
New Contributor II

Hello,

In Databricks there are two ways to connect to PostgreSQL: using the JDBC driver or using the named connector, as described in the documentation: https://learn.microsoft.com/en-us/azure/databricks/external-data/postgresql

With JDBC, the driver must be installed on the cluster manually, whereas the named connector is only available from Databricks Runtime 11.2 onwards.

What other differences are there between the two connection methods? Is the named connector more performant than JDBC? Or is the named connector basically a JDBC driver that comes built in with Runtime 11.2, eliminating the need to install the driver manually?

I need to read from and write to an Azure PostgreSQL database from Azure Databricks, so I would like to understand the differences from a performance perspective.

Thanks.

Accepted Solutions

Kaniz_Fatma
Community Manager

Hi @kp12, according to the Azure Databricks documentation on external data sources, Databricks Runtime 11.2 and above includes a named connector for PostgreSQL. It is one of the optimized integrations Databricks provides for syncing data with external data sources, and it can be used with Azure PostgreSQL databases.

According to the linked documentation, compared to the built-in JDBC connector, the named connector can bulk insert data into SQL databases, which can be 10x to 20x faster than row-by-row insertion.

Therefore, using the named connector will likely provide better performance when reading from and writing to an Azure PostgreSQL database in Azure Databricks.
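
To make the two connection methods concrete, here is a minimal PySpark sketch of how each one might be configured. It assumes the format names ("postgresql" for the named connector, "jdbc" for the generic one) and the option keys described on the PostgreSQL documentation page linked below. The helper function names, host, database, and table values are hypothetical and only for illustration. It does not measure performance, so it is worth timing both paths on your own workload.

```python
from typing import Dict


def named_connector_options(host: str, port: int, database: str,
                            table: str, user: str, password: str) -> Dict[str, str]:
    """Options for the named connector (format "postgresql", Runtime 11.2+).

    Assumption: host, port and database are passed as separate options
    rather than as a single JDBC URL.
    """
    return {
        "host": host,
        "port": str(port),
        "database": database,
        "dbtable": table,
        "user": user,
        "password": password,
    }


def jdbc_options(host: str, port: int, database: str,
                 table: str, user: str, password: str) -> Dict[str, str]:
    """Options for the generic Spark JDBC source (format "jdbc")."""
    return {
        "url": f"jdbc:postgresql://{host}:{port}/{database}",
        "dbtable": table,
        "user": user,
        "password": password,
        # The PostgreSQL JDBC driver class must be available on the cluster.
        "driver": "org.postgresql.Driver",
    }


def read_table(spark, fmt: str, options: Dict[str, str]):
    """Read a table into a DataFrame using the given format and options."""
    return spark.read.format(fmt).options(**options).load()


def write_table(df, fmt: str, options: Dict[str, str], mode: str = "append") -> None:
    """Write a DataFrame to the table named in options["dbtable"]."""
    df.write.format(fmt).options(**options).mode(mode).save()


# Example use in a Databricks notebook, where `spark` is already defined.
# The server, database, table, and credentials are placeholders:
#
#   opts = named_connector_options("myserver.postgres.database.azure.com", 5432,
#                                  "mydb", "public.orders", "myuser", "<password>")
#   df = read_table(spark, "postgresql", opts)
#   write_table(df, "postgresql", {**opts, "dbtable": "public.orders_copy"})
```

Switching between the two methods then only means changing the format string and the options helper, which makes it straightforward to run the same read or write through both and compare timings.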

Sources:
https://docs.databricks.com/external-data/postgresql.html
https://docs.databricks.com/external-data/sql-databases-azure.html


