Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Do I need many wheels for each job in a project?

kmodelew
New Contributor III

I have a project with my commons, like a SparkSession object (so I can run code in PyCharm using the databricks-connect library and the same code directly on Databricks). Under src I have a few packages from which DAB creates separate jobs. I'm using PyCharm. The structure of my project is as follows:

src/task_group1/<many_python_tasks>
src/task_group2/<many_python_tasks>

resources/task_group1.yml #tasks and job structure
resources/task_group2.yml #tasks and job structure

tasks:
  - task_key: main_task
    job_cluster_key: job_cluster
    python_wheel_task:
      package_name: task_group1
      entry_point: main
    libraries:
      # By default we just include the .whl file generated for the bundle_test package.
      # See https://docs.databricks.com/dev-tools/bundles/library-dependencies.html
      # for more information on how to add other libraries.
      - whl: ../dist/*.whl

After running on Databricks I get this error: "run failed with error message Python wheel with name task_group2 could not be found. Please check the driver logs for more details".

Should Databricks Asset Bundles generate many wheel files, one *.whl file for each job? The single wheel generated by DAB has all packages included. Is it a matter of wrong references in the yml files and setup.py?

setup.py with correct entry points:

packages=find_packages(where="./src"),
package_dir={"": "src"},
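
For reference, the relevant parts of such a setup.py look roughly like this (the distribution name and entry-point names below are placeholders, not my exact values):

from setuptools import setup, find_packages

setup(
    name="my_project",                      # this name determines the filename of the generated wheel
    version="0.0.1",
    packages=find_packages(where="./src"),  # picks up task_group1, task_group2 and the shared commons
    package_dir={"": "src"},
    entry_points={
        "console_scripts": [
            "task_group1_main = task_group1.main:main",
            "task_group2_main = task_group2.main:main",
        ],
    },
)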


Brahmareddy
Honored Contributor III

Hi kmodelew,

How are you doing today? As per my understanding, this is a common point of confusion when using Databricks Asset Bundles (DAB) with multiple task groups and a shared codebase. The key thing to know is that DAB generates one .whl file for the entire bundle, which includes all your packages under src/, not a separate wheel file for each task group. So when your YAML is looking for a wheel specific to task_group2, Databricks can't find it, because the wheel is named after your top-level project, not the individual packages.
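
For illustration, a wheel-based bundle typically declares a single artifact in databricks.yml, roughly along these lines (the artifact key and build command here are just an example; your template may differ or build the wheel automatically):

artifacts:
  default:
    type: whl
    build: python setup.py bdist_wheel
    path: .

Whatever name you set in setup.py becomes the filename of that one wheel under dist/, no matter how many task groups reference it.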

To fix this, you just need to reference the correct package name (matching your top-level wheel) in the package_name: field of each job in your .yml files, and make sure your setup.py includes all sub-packages via find_packages() like you're already doing. So instead of trying to set package_name: task_group1 or task_group2, use the actual package name defined in setup.py (e.g. my_project), and in each job, point to the correct entry point function under that namespace (e.g. my_project.task_group1.main). That should fix the “wheel not found” error and let all task groups run off the same wheel file. Let me know if you want help adjusting the setup.py or yml—happy to take a look!
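
Here is a minimal sketch of what that could look like, assuming the distribution name in setup.py is my_project and an entry point called task_group1_main is registered in the wheel's metadata (both names are illustrative):

tasks:
  - task_key: main_task
    job_cluster_key: job_cluster
    python_wheel_task:
      package_name: my_project       # the distribution name from setup.py, i.e. the wheel DAB actually builds
      entry_point: task_group1_main  # an entry point declared in that wheel's metadata
    libraries:
      - whl: ../dist/*.whl

Both task groups then install the exact same wheel and differ only in which entry point they invoke.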

Regards,

Brahma
