
Differences among python libraries

endaemon
New Contributor II

I am confused as to the differences between various python libraries for databricks: especially with regard to differences among [databricks-connect](https://pypi.org/project/databricks-connect/), [databricks-api](https://pypi.org/project/databricks-api/), [databricks-sql-connector](https://pypi.org/project/databricks-sql-connector/), and [databricks-sdk](https://pypi.org/project/databricks-sdk/). It seems like databricks-connect is the official offering from Databricks? Which library should I use for what purposes?

1 ACCEPTED SOLUTION

Accepted Solutions

szymon_dybczak
Esteemed Contributor III

Hi @endaemon ,

Those are completely different libraries; each one has a specific purpose.

1. Databricks Connect

  • Purpose: This is the official library provided by Databricks for connecting local Python environments to a Databricks cluster.
  • Use Case: It allows you to write Spark code on your local machine and execute it on a remote Databricks cluster, making it very useful for development and testing. You can develop, test, and debug Spark code locally as if it were running in Databricks.
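As a rough illustration of that workflow, here is a minimal sketch. It assumes databricks-connect 13 or later (which exposes `DatabricksSession`) and that the workspace host, credentials, and cluster are already configured, for example through a Databricks config profile. The helper names `remote_session` and `count_even_numbers` are made up for this example.

```python
def count_even_numbers(spark, upper: int) -> int:
    """Count the even ids in [0, upper) with ordinary DataFrame operations.

    The code is plain Spark; only the session decides where it executes.
    """
    return spark.range(upper).filter("id % 2 = 0").count()


def remote_session():
    """Return a Spark session whose work runs on a remote Databricks cluster.

    Assumption: databricks-connect 13+ is installed and connection details
    are already configured outside this script.
    """
    # Imported lazily so the helper above can be used without the package.
    from databricks.connect import DatabricksSession

    return DatabricksSession.builder.getOrCreate()


# Typical use from a local script or IDE:
#   spark = remote_session()
#   print(count_even_numbers(spark, 10))
```

The point of the split is that `count_even_numbers` is the same code you would run in a notebook; swapping in the remote session is the only Databricks Connect specific step.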

2. databricks-api (legacy, use Databricks Python SDK)

  • Purpose: This library is a third-party wrapper for the Databricks REST API.
  • Use Case: It is meant for programmatic management and automation of Databricks resources. You can use it to create and manage clusters, jobs, and other resources in Databricks, but it does not provide Spark job execution capabilities.

3. databricks-sql-connector

  • Purpose: This is specifically for connecting to Databricks SQL warehouses (formerly called SQL endpoints).
  • Use Case: If your primary need is to execute SQL queries from Python against a Databricks SQL warehouse, including tables governed by Unity Catalog, this library is the appropriate choice.
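A minimal sketch of that use case, assuming databricks-sql-connector is installed and that the hostname, HTTP path, and a personal access token are available. The environment variable names and the helpers `connection_settings` and `run_query` are choices made for this example, not something the library requires.

```python
import os


def connection_settings(env=os.environ) -> dict:
    """Collect the values passed to sql.connect() from environment variables.

    The variable names are an assumption for this sketch.
    """
    return {
        "server_hostname": env["DATABRICKS_SERVER_HOSTNAME"],
        "http_path": env["DATABRICKS_HTTP_PATH"],
        "access_token": env["DATABRICKS_TOKEN"],
    }


def run_query(query: str, settings: dict) -> list:
    """Execute one SQL statement and return all rows."""
    # Imported lazily so connection_settings() works without the package.
    from databricks import sql

    with sql.connect(**settings) as connection:
        with connection.cursor() as cursor:
            cursor.execute(query)
            return cursor.fetchall()


# Typical use:
#   rows = run_query("SELECT 1", connection_settings())
```

No Spark session is involved here: the query is sent to the SQL endpoint and the rows come back through a DB-API style cursor.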

4. databricks-sdk

  • Purpose: This is the official Databricks SDK for Python and offers a comprehensive interface for interacting with all aspects of Databricks. 
  • Use Case: It provides a more modern and complete interface than databricks-api, supporting operations across different Databricks services such as clusters, jobs, DBFS, and more. It's ideal for developers building complex applications or automated systems that need to interact with various Databricks features.
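To make the SDK's role concrete, here is a small sketch that lists clusters in a workspace. It assumes databricks-sdk is installed and that authentication is already configured (environment variables or a config profile). The helper names `summarize_clusters` and `list_workspace_clusters` are hypothetical.

```python
def summarize_clusters(clusters) -> list:
    """Return (cluster_name, state) pairs for an iterable of cluster records."""
    return [(c.cluster_name, c.state) for c in clusters]


def list_workspace_clusters() -> list:
    """List the clusters in the configured workspace via the Python SDK."""
    # Imported lazily so summarize_clusters() works without the package.
    from databricks.sdk import WorkspaceClient

    # Assumption: credentials are picked up from the environment or a profile.
    w = WorkspaceClient()
    return summarize_clusters(w.clusters.list())


# Typical use:
#   for name, state in list_workspace_clusters():
#       print(name, state)
```

This is workspace management rather than data processing: nothing here runs Spark code or SQL, which is what separates the SDK from Databricks Connect and the SQL connector.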

4 REPLIES

@szymon_dybczak, thanks for the explanation. It is really helpful.

Regards
Bhanu Gautam

Kudos are appreciated

endaemon
New Contributor II

@szymon_dybczak,

Thank you for typing all that up. It is very clear and helpful.

Two follow ups if I may:

1. If one's primary goal is to execute SQL queries, why prefer databricks-sql-connector over a generic JDBC or ODBC package?

2. Did I miss any other important Databricks "official" packages?

szymon_dybczak
Esteemed Contributor III

1. According to Databricks, it is easier to set up than, for example, pyodbc.

2. I think you already listed the most important ones. You can take a look at the link below for more:

https://docs.databricks.com/en/dev-tools/sql-drivers-tools.html
