Connecting to Azure PostgreSQL from Azure Databricks

kp12
New Contributor II

Hello,

In Databricks, there are two ways to connect to PostgreSQL: the JDBC driver or the named connector, as described in this document: https://learn.microsoft.com/en-us/azure/databricks/external-data/postgresql

With JDBC, the driver has to be installed on the cluster manually, whereas the named connector is only available from Databricks Runtime 11.2 onwards.
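To make the configuration difference concrete, here is a minimal Python sketch that builds the option dictionaries for each path. It assumes a Databricks notebook where `spark` is already defined. The helper names, the `sslmode=require` URL parameter (Azure PostgreSQL typically requires TLS) and the placeholder host are illustrative assumptions; the named-connector option keys (`host`, `port`, `database`, `dbtable`, `user`, `password`) follow the linked Databricks page, so check them against your runtime.

```python
def jdbc_read_options(host: str, database: str, table: str,
                      user: str, password: str, port: int = 5432) -> dict:
    """Options for spark.read.format("jdbc") against PostgreSQL (hypothetical helper)."""
    # The JDBC path needs a full connection URL plus the driver class name.
    # sslmode=require is an assumption: Azure PostgreSQL normally enforces TLS.
    url = f"jdbc:postgresql://{host}:{port}/{database}?sslmode=require"
    return {
        "url": url,
        "driver": "org.postgresql.Driver",
        "dbtable": table,
        "user": user,
        "password": password,
    }


def named_connector_options(host: str, database: str, table: str,
                            user: str, password: str, port: int = 5432) -> dict:
    """Options for spark.read.format("postgresql") (hypothetical helper)."""
    # The named connector takes host, port and database separately;
    # no JDBC URL or driver class is passed.
    return {
        "host": host,
        "port": str(port),
        "database": database,
        "dbtable": table,
        "user": user,
        "password": password,
    }


if __name__ == "__main__":
    # Placeholder values only; never hard-code real credentials.
    opts = jdbc_read_options("myserver.postgres.database.azure.com", "mydb",
                             "public.sales", "admin_user", "<secret>")
    print(opts["url"])
```

In a notebook you would pass either dictionary to the matching reader, e.g. `spark.read.format("jdbc").options(**jdbc_read_options(...)).load()` or `spark.read.format("postgresql").options(**named_connector_options(...)).load()`. In practice, read the password with `dbutils.secrets.get(scope=..., key=...)` rather than hard-coding it.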

What are the other differences between the two connection methods? Is the named connector more performant than JDBC, or is it basically a JDBC driver that comes built in with 11.2 and therefore removes the need to install the driver manually?

I need to read from and write to an Azure PostgreSQL database from Azure Databricks, so I want to understand the differences from a performance perspective.
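For the performance side of reading and writing, the generic Spark JDBC source exposes a few tuning options: `partitionColumn`, `lowerBound`, `upperBound` and `numPartitions` split a read into parallel range queries, `fetchsize` sets the rows fetched per round trip, and `batchsize` sets the rows per batched insert on write. The sketch below only layers these onto a base options dict. The helper names and default values are hypothetical starting points to benchmark, not recommendations, and whether the named connector accepts the same options is not stated in this thread, so treat that as something to verify.

```python
def with_partitioned_read(base: dict, column: str, lower: int, upper: int,
                          num_partitions: int, fetchsize: int = 10_000) -> dict:
    """Return a copy of `base` with options for a parallel, range-partitioned JDBC read."""
    if num_partitions < 1 or lower >= upper:
        raise ValueError("expected num_partitions >= 1 and lower < upper")
    return {
        **base,
        # These four must be given together; the column must be numeric, date or
        # timestamp. The bounds only set each partition's stride: rows outside
        # them are still read by the first and last partitions.
        "partitionColumn": column,
        "lowerBound": str(lower),
        "upperBound": str(upper),
        "numPartitions": str(num_partitions),
        # Rows fetched per round trip (hypothetical default, tune for your data).
        "fetchsize": str(fetchsize),
    }


def with_batched_write(base: dict, batchsize: int = 10_000,
                       num_partitions: int = 8) -> dict:
    """Return a copy of `base` with options for a batched JDBC write."""
    if batchsize < 1 or num_partitions < 1:
        raise ValueError("batchsize and num_partitions must be >= 1")
    return {
        **base,
        # Rows per JDBC batch insert (Spark's default is 1000).
        "batchsize": str(batchsize),
        # Upper bound on concurrent connections; if the DataFrame has more
        # partitions, Spark coalesces it down to this many before writing.
        "numPartitions": str(num_partitions),
    }


if __name__ == "__main__":
    base = {"dbtable": "public.sales"}  # hypothetical table
    print(with_partitioned_read(base, "id", 1, 1_000_000, 8))
    print(with_batched_write(base))
```

In a notebook these would be used as `spark.read.format("jdbc").options(**with_partitioned_read(opts, ...)).load()` and `df.write.format("jdbc").options(**with_batched_write(opts)).mode("append").save()`, where `opts` already contains `url`, `driver`, `dbtable`, `user` and `password`.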

Thanks.

Accepted Solution

Kaniz_Fatma
Community Manager

Hi @kp12, according to the Azure Databricks documentation on external data sources, Databricks Runtime 11.2 and above provides a named PostgreSQL connector as one of its optimized integrations for syncing data with external data sources, including Azure PostgreSQL databases.

Compared to the built-in JDBC connector, this connector can bulk insert data into SQL databases, and bulk insertion can be 10x to 20x faster than row-by-row insertion.

Therefore, using the named connector will likely provide better performance when reading from and writing to an Azure PostgreSQL database in Azure Databricks.
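The round-trip argument behind bulk insertion can be illustrated with a toy model in Python. This is not a benchmark and does not reproduce the 10x to 20x figure; it only counts how many statements (and therefore network round trips) are needed when rows are sent one at a time versus in groups. The `batches` and `count_round_trips` helpers are hypothetical. Note that Spark's generic JDBC writer also sends rows as JDBC batches sized by its `batchsize` option (default 1000), so the real-world gap depends on the connector and settings in use.

```python
from typing import Iterable, Iterator, List, Sequence, Tuple

Row = Tuple[object, ...]


def batches(rows: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Group rows into lists of at most `size` rows; each list models one round trip."""
    if size < 1:
        raise ValueError("size must be >= 1")
    batch: List[Row] = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly shorter, batch
        yield batch


def count_round_trips(rows: Sequence[Row], size: int) -> int:
    """Number of statements needed to send every row at `size` rows per statement."""
    return sum(1 for _ in batches(rows, size))


rows = [(i, f"name-{i}") for i in range(10_000)]
print(count_round_trips(rows, 1))      # row-by-row: 10000
print(count_round_trips(rows, 1_000))  # batched: 10
```

Fewer round trips means less per-statement network and parsing overhead, which is the general reason batched or bulk inserts tend to beat row-by-row inserts; the actual speedup has to be measured against your own database.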

Sources:
https://docs.databricks.com/external-data/postgresql.html
https://docs.databricks.com/external-data/sql-databases-azure.html
