Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How are .whl files executed for Python wheel tasks?

the_dude
New Contributor II

Hello,

We package a Poetry-managed project into a .whl and run it as a Python wheel task. Naturally, many of the dependencies declared by the .whl are already present on the Databricks cluster. Does the task setup (in its virtual environment, I assume) detect this and skip pulling those dependencies from a repository, or will the dependencies be re-installed regardless? If they are re-installed, is there a way to avoid this extra overhead?

Thank you,


David

3 REPLIES

Nik_Vanderhoof
Contributor

Hi David,

I can't speak to exactly how Poetry handles dependency resolution for libraries that are already installed, or how that interacts with the Databricks runtime. However, I can offer some advice on how my team handles this situation.

It's been very common for us to require libraries that are available on the Databricks runtime in our local tests, or in a wheel that we publish somewhere besides Databricks.

We've handled this by declaring any libraries provided by the Databricks runtime in an optional/dev dependency group in our pyproject.toml. That lets us install them locally or in CI for testing, without attempting to re-install them when we deploy our wheels to Databricks workflows.
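A minimal sketch of what that pyproject.toml layout might look like. The package names here (requests as a bundled dependency, pyspark and pandas as runtime-provided ones) and the group name `databricks` are illustrative assumptions, not taken from the thread:

```toml
[tool.poetry.dependencies]
python = "^3.10"
# Dependencies listed here go into the wheel's metadata
# and will be installed alongside it on the cluster.
requests = "^2.31"

# Libraries already provided by the Databricks runtime.
# Poetry dependency groups are never written into the
# wheel's metadata, so installing the wheel on the
# cluster will not try to pull these.
[tool.poetry.group.databricks]
optional = true

[tool.poetry.group.databricks.dependencies]
pyspark = "^3.5"
pandas = "^2.0"
```

Locally or in CI you would run `poetry install --with databricks` to get the runtime-provided libraries for testing; `poetry build` then produces a wheel whose `Requires-Dist` entries cover only the main dependency list.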

 

Hello @Nik_Vanderhoof, thank you for this suggestion.

You're welcome!
