10-10-2024 06:19 AM - edited 10-10-2024 06:27 AM
I am confused as to the differences between various python libraries for databricks: especially with regard to differences among [databricks-connect](https://pypi.org/project/databricks-connect/), [databricks-api](https://pypi.org/project/databricks-api/), [databricks-sql-connector](https://pypi.org/project/databricks-sql-connector/), and [databricks-sdk](https://pypi.org/project/databricks-sdk/). It seems like databricks-connect is the official offering from Databricks? Which library should I use for what purposes?
Accepted Solutions
10-10-2024 07:17 AM
Hi @endaemon ,
Those are completely different libraries; each one has a specific purpose.
1. Databricks Connect
- Purpose: This is the official library provided by Databricks for connecting local Python environments to a Databricks cluster.
- Use Case: It allows you to write Spark code on your local machine and execute it on a remote Databricks cluster, making it very useful for development and testing. You can develop, test, and debug Spark code locally as if it were running in Databricks.
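A minimal sketch of that workflow, assuming Databricks Connect v13+ (which exposes `DatabricksSession`) and credentials already configured via environment variables or `~/.databrickscfg`; the table name is just an example:

```python
# Minimal databricks-connect sketch: Spark code written locally,
# executed on a remote Databricks cluster.
def remote_row_count(table_name: str) -> int:
    """Count rows of a remote table; the count runs on the cluster."""
    # Deferred import so the sketch loads even without the package.
    from databricks.connect import DatabricksSession  # pip install databricks-connect

    # Cluster and auth are resolved from the environment / config profile.
    spark = DatabricksSession.builder.getOrCreate()
    return spark.table(table_name).count()
```

From there you can call e.g. `remote_row_count("samples.nyctaxi.trips")` and debug the surrounding Python locally while Spark itself runs remotely.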
2. databricks-api (legacy, use Databricks Python SDK)
- Purpose: This library is a third-party wrapper for the Databricks REST API.
- Use Case: It is meant for programmatic management and automation of Databricks resources. You can use it to create and manage clusters, jobs, and other resources in Databricks, but it does not provide Spark job execution capabilities.
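For completeness, a hedged sketch of what that legacy wrapper looks like (host and token are placeholders; prefer the official SDK for new code):

```python
# Legacy databricks-api sketch: a thin wrapper over the REST API that
# returns raw JSON responses as dicts.
def list_cluster_ids(host: str, token: str) -> list:
    # Deferred import; the package is legacy and may not be installed.
    from databricks_api import DatabricksAPI  # pip install databricks-api

    db = DatabricksAPI(host=host, token=token)
    resp = db.cluster.list_clusters()  # raw REST response (dict)
    return [c["cluster_id"] for c in resp.get("clusters", [])]
```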
3. databricks-sql-connector
- Purpose: This is specifically for connecting to Databricks SQL endpoints.
- Use Case: If your primary need is to execute SQL queries against Databricks SQL endpoints or SQL endpoints in Unity Catalog, this library is the appropriate choice.
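A rough sketch of that usage, assuming the connector's DB-API 2.0 style interface; the environment variable names (`DATABRICKS_SERVER_HOSTNAME`, `DATABRICKS_HTTP_PATH`, `DATABRICKS_TOKEN`) are placeholders for your warehouse details:

```python
# databricks-sql-connector sketch: DB-API 2.0 style access to a
# Databricks SQL warehouse.
import os

def fetch_one(query: str):
    # Deferred import so the sketch loads without the package installed.
    from databricks import sql  # pip install databricks-sql-connector

    with sql.connect(
        server_hostname=os.environ["DATABRICKS_SERVER_HOSTNAME"],
        http_path=os.environ["DATABRICKS_HTTP_PATH"],
        access_token=os.environ["DATABRICKS_TOKEN"],
    ) as conn:
        with conn.cursor() as cur:
            cur.execute(query)
            return cur.fetchone()  # first row of the result set
```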
4. databricks-sdk
- Purpose: This is the official Databricks SDK for Python and offers a comprehensive interface for interacting with all aspects of Databricks.
- Use Case: It provides a more modern and complete interface than databricks-api, supporting operations across different Databricks services such as clusters, jobs, DBFS, and more. It's ideal for developers building complex applications or automated systems that need to interact with various Databricks features.
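A hedged sketch of the SDK's style, assuming authentication is picked up from environment variables or a `~/.databrickscfg` profile:

```python
# databricks-sdk sketch: one WorkspaceClient object covering many
# REST services (clusters, jobs, DBFS, ...).
def cluster_names() -> list:
    # Deferred import so the sketch loads without the package installed.
    from databricks.sdk import WorkspaceClient  # pip install databricks-sdk

    w = WorkspaceClient()  # auth resolved from env vars or ~/.databrickscfg
    return [c.cluster_name for c in w.clusters.list()]
```

The same `w` object exposes other services (e.g. `w.jobs`), which is what makes it the natural replacement for the legacy databricks-api wrapper.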
10-19-2024 01:41 AM
@szymon_dybczak , thanks for the explanation, it is really helpful.
Bhanu Gautam
Kudos are appreciated
10-10-2024 07:49 AM
Thank you for typing all that up. It is very clear and helpful.
Two follow ups if I may:
1. If one's primary goal is to execute SQL queries, why prefer databricks-sql-connector over a generic JDBC or ODBC package?
2. Did I miss any other important Databricks "official" packages?
10-10-2024 07:58 AM
1. According to Databricks, it is easier to set up than, for example, pyODBC.
2. I think you already listed the most important ones. You can take a look at the link below for more:
https://docs.databricks.com/en/dev-tools/sql-drivers-tools.html

