Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Can I get more details on the performance differences between pyodbc and SQL Connector for Python?

Travis84
New Contributor II

This article (Connect Python and pyodbc to Databricks | Databricks on AWS) states the following:

"However pyodbc may have better performance when fetching queries results above 10 MB."

This is a bit vague. The word "may" implies "maybe not". Also, "better performance" is not quantitative. How much better? Are there any benchmarking studies? I have not been able to find any more information.

4 REPLIES

SP_6721
Contributor III

Hi @Travis84 ,

The documentation sounds vague, but real-world performance depends on many factors. For clarity, the best approach is to run a simple test: execute the same large query with both pyodbc and the Databricks SQL Connector in your own environment and compare the fetch times.
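A minimal sketch of such a test is below. It is not an official benchmark; it only times `execute` plus `fetchall` through any DB-API 2.0 connection. The helper name `time_fetch`, the DSN name `Databricks`, and the placeholder hostname, HTTP path, token, and table name are all assumptions you would replace with your own values. Connection setup time is deliberately excluded so that only the query and fetch are measured.

```python
import time
from contextlib import closing


def time_fetch(connect, query, repeats=3):
    """Return (best_seconds, row_count) for running `query` and fetching all rows.

    `connect` is any zero-argument callable that returns a DB-API 2.0
    connection, so the same helper works for pyodbc and the SQL connector.
    """
    best = float("inf")
    rows = 0
    for _ in range(repeats):
        with closing(connect()) as conn, closing(conn.cursor()) as cur:
            start = time.perf_counter()  # timer starts after connecting
            cur.execute(query)
            rows = len(cur.fetchall())
            best = min(best, time.perf_counter() - start)
    return best, rows


def connect_pyodbc():
    # Assumes an ODBC DSN named "Databricks" is already configured.
    import pyodbc
    return pyodbc.connect("DSN=Databricks", autocommit=True)


def connect_sql_connector():
    # Placeholder values: substitute your workspace's details.
    from databricks import sql
    return sql.connect(
        server_hostname="<workspace-host>",
        http_path="<http-path>",
        access_token="<personal-access-token>",
    )


# Example usage (hypothetical table name):
# for name, factory in [("pyodbc", connect_pyodbc),
#                       ("sql-connector", connect_sql_connector)]:
#     secs, n = time_fetch(factory, "SELECT * FROM my_catalog.my_schema.big_table")
#     print(f"{name}: {n} rows in {secs:.2f}s")
```

Run it a few times against result sets both below and above 10 MB, since warehouse warm-up, result caching, and network conditions can easily dominate a single measurement.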

Coffee77
New Contributor II

I can't give you a comparison, but at least in my case the Spark SQL connector over SQL Server is behaving pretty well when retrieving a moderate number of rows from SQL tables. As said in previous comments, it depends on multiple factors: not only the driver but the database design as well. In any case, I started using that connector because I needed an OLTP system integrated with the Databricks Lakehouse. So I created a set of functions to interact with SQL Server and, at least until now, performance has been good. With the new addition of Databricks Lakebase over PostgreSQL, maybe I need to upgrade... oops. I mention this because perhaps this new feature can be a good fit for you as well.

https://www.youtube.com/@CafeConData

WiliamRosa
New Contributor II

Hi @Travis84,

I came across an article that might help you, which makes the following comparison:
A blog on high-bandwidth connections using Databricks' Cloud Fetch optimization (leveraging parallel data transfer via pre-signed URLs) reported up to 12× faster extract throughput for very large datasets (~3.4 GB) when using ODBC-based tooling.
https://www.databricks.com/blog/2021/08/11/how-we-achieved-high-bandwidth-connectivity-with-bi-tools...

Wiliam Rosa
Data Engineer | Machine Learning Engineer
LinkedIn: linkedin.com/in/wiliamrosa

Travis84
New Contributor II

Interesting article, but it seems that Cloud Fetch is supported by both the ODBC driver and the SQL Connector:

Driver capability settings for the Databricks ODBC Driver | Databricks on AWS

Databricks SQL Connector for Python | Databricks on AWS (see section getting started)
