Can I get more details on the performance differences between pyodbc and SQL Connector for Python?
08-26-2025 12:24 AM
This article (Connect Python and pyodbc to Databricks | Databricks on AWS) states the following:
"However pyodbc may have better performance when fetching queries results above 10 MB."
This is a bit vague. The word "may" implies "maybe not". Also, "better performance" is not quantitative. How much better? Are there any benchmarking studies? I have not been able to find any more information.
08-26-2025 07:00 AM
Hi @Travis84 ,
The documentation sounds vague, but real-world performance depends on many factors. To get a concrete answer, the best approach is to run a simple test: execute the same large query with both pyodbc and the Databricks SQL Connector in your environment.
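A minimal sketch of such a test is below. It assumes an ODBC DSN is already configured for the Databricks ODBC driver and that the connection details live in environment variables; the variable names (`DATABRICKS_DSN`, `DATABRICKS_SERVER_HOSTNAME`, `DATABRICKS_HTTP_PATH`, `DATABRICKS_TOKEN`) and the helper names `time_fetch` and `run_comparison` are illustrative, not anything the documentation prescribes.

```python
import os
import time
from statistics import median


def time_fetch(connect, query, repeats=3):
    """Run `query` on a fresh connection `repeats` times.

    `connect` is any zero-argument callable returning a DB-API connection.
    Returns (median seconds for execute + fetchall, rows in the last run).
    """
    timings = []
    row_count = 0
    for _ in range(repeats):
        conn = connect()
        try:
            cursor = conn.cursor()
            start = time.perf_counter()
            cursor.execute(query)
            rows = cursor.fetchall()
            timings.append(time.perf_counter() - start)
            row_count = len(rows)
            cursor.close()
        finally:
            conn.close()
    return median(timings), row_count


def connect_pyodbc():
    # Assumption: a DSN for the Databricks ODBC driver is already set up.
    import pyodbc  # pip install pyodbc
    return pyodbc.connect(f"DSN={os.environ['DATABRICKS_DSN']}", autocommit=True)


def connect_sql_connector():
    # Assumption: personal access token authentication.
    from databricks import sql  # pip install databricks-sql-connector
    return sql.connect(
        server_hostname=os.environ["DATABRICKS_SERVER_HOSTNAME"],
        http_path=os.environ["DATABRICKS_HTTP_PATH"],
        access_token=os.environ["DATABRICKS_TOKEN"],
    )


def run_comparison(query, repeats=3):
    """Time the same query through both drivers and print the medians."""
    for name, connect in [("pyodbc", connect_pyodbc),
                          ("sql connector", connect_sql_connector)]:
        seconds, row_count = time_fetch(connect, query, repeats)
        print(f"{name}: {seconds:.2f} s median, {row_count} rows")
```

Calling `run_comparison` with a query whose result is well above 10 MB, against the same warehouse, should give a rough idea of the difference in your setup. Result caching and warehouse warm-up can skew the first run, so it is worth repeating the measurement a few times and in both orders.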
08-27-2025 01:57 AM
I can't give you a comparison, but at least in my case the Spark SQL Connector over SQL Server is behaving fine when retrieving a moderate number of rows from SQL tables. As said in previous comments, it depends on multiple factors: not only the driver but the database design as well. In any case, I started using that connector because I needed an OLTP system integrated with the Databricks Lakehouse. So I created a set of functions to interact with SQL Server and, at least until now, performance has been good. With the new addition of Databricks Lakebase over PostgreSQL, maybe I need to upgrade... oops. I mention this because this new feature may be a good fit for you as well.
08-27-2025 04:26 AM
Hi @Travis84,
I came across an article that might help you, which makes the following comparison:
A blog on high-bandwidth connections using Databricks’ Cloud Fetch optimization (leveraging parallel data transfer via pre-signed URLs) reported up to 12× faster extract throughput for very large datasets (~3.4 GB) when using ODBC-based tooling.
https://www.databricks.com/blog/2021/08/11/how-we-achieved-high-bandwidth-connectivity-with-bi-tools...
Data Engineer | Machine Learning Engineer
LinkedIn: linkedin.com/in/wiliamrosa
08-27-2025 04:59 AM
Interesting article, but it seems that Cloud Fetch is supported by both the ODBC driver and the SQL Connector:
Driver capability settings for the Databricks ODBC Driver | Databricks on AWS
Databricks SQL Connector for Python | Databricks on AWS (see section getting started)