<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Do I need many wheels for each job in project? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/do-i-need-many-wheels-for-each-job-in-project/m-p/114222#M44752</link>
    <description>&lt;P&gt;Hi&amp;nbsp;kmodelew,&lt;/P&gt;&lt;P&gt;This is a common point of confusion when using Databricks Asset Bundles (DAB) with multiple task groups and a shared codebase. DAB builds one .whl file for the entire bundle, containing all of the packages under src/; it does not build a separate wheel per task group. So when your YAML asks for a wheel named task_group2, Databricks can’t find it: the wheel is named after your top-level project, not after the individual packages.&lt;/P&gt;&lt;P&gt;To fix this, reference the top-level package name (the one matching your wheel) in the package_name field of each job in your .yml files, and make sure setup.py includes all sub-packages via find_packages(), as you are already doing. So instead of setting package_name: task_group1 or task_group2, use the actual package name defined in setup.py (e.g. my_project), and in each job point entry_point at the correct function under that namespace (e.g. my_project.task_group1.main). That should resolve the “wheel not found” error and let all task groups run off the same wheel file. Let me know if you want help adjusting the setup.py or yml; happy to take a look!&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Brahma&lt;/P&gt;</description>
    <pubDate>Wed, 02 Apr 2025 02:57:52 GMT</pubDate>
    <dc:creator>Brahmareddy</dc:creator>
    <dc:date>2025-04-02T02:57:52Z</dc:date>
    <item>
      <title>Do I need many wheels for each job in project?</title>
      <link>https://community.databricks.com/t5/data-engineering/do-i-need-many-wheels-for-each-job-in-project/m-p/114207#M44749</link>
      <description>&lt;P&gt;I have a project with my commons, such as a SparkSession object (so I can run the code in PyCharm via the databricks-connect library and run the same code directly on Databricks). Under src I have a few packages from which DAB creates separate jobs. I'm using PyCharm. The structure of my project is as follows:&lt;/P&gt;&lt;P&gt;src/task_group1/&amp;lt;many_python_tasks&amp;gt;&lt;BR /&gt;src/task_group2/&amp;lt;many_python_tasks&amp;gt;&lt;/P&gt;&lt;P&gt;resources/task_group1.yml #tasks and job structure&lt;BR /&gt;resources/task_group2.yml #tasks and job structure&lt;/P&gt;&lt;DIV&gt;&lt;PRE&gt;tasks:&lt;BR /&gt;  - task_key: main_task&lt;BR /&gt;    job_cluster_key: job_cluster&lt;BR /&gt;    python_wheel_task:&lt;BR /&gt;      package_name: task_group1&lt;BR /&gt;      entry_point: main&lt;BR /&gt;    libraries:&lt;BR /&gt;      # By default we just include the .whl file generated for the bundle_test package.&lt;BR /&gt;      # See https://docs.databricks.com/dev-tools/bundles/library-dependencies.html&lt;BR /&gt;      # for more information on how to add other libraries.&lt;BR /&gt;      - whl: ../dist/*.whl&lt;/PRE&gt;&lt;/DIV&gt;&lt;P&gt;After running on Databricks I get this error:&amp;nbsp;run failed with error message Python wheel with name task_group2 could not be found. Please check the driver logs for more details&lt;/P&gt;&lt;P&gt;Should Databricks Asset Bundles generate many wheel files, one *.whl file for each job? The one wheel generated by DAB has all packages included. Or is it a matter of wrong references in the yml files and setup.py?&lt;/P&gt;&lt;P&gt;setup.py with correct entry points:&lt;/P&gt;&lt;DIV&gt;&lt;PRE&gt;packages=find_packages(where="./src"),&lt;BR /&gt;package_dir={"": "src"},&lt;/PRE&gt;&lt;/DIV&gt;</description>
      <pubDate>Tue, 01 Apr 2025 20:41:25 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/do-i-need-many-wheels-for-each-job-in-project/m-p/114207#M44749</guid>
      <dc:creator>kmodelew</dc:creator>
      <dc:date>2025-04-01T20:41:25Z</dc:date>
    </item>
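As a quick check of the layout described above: find_packages() discovers every directory under src/ that contains an __init__.py, so both task groups land in a single wheel. A minimal sketch (the throwaway directory tree here is an assumption mirroring the described structure):

```python
import os
import tempfile

from setuptools import find_packages

# Build a throwaway src/ tree mirroring the project layout from the post.
root = tempfile.mkdtemp()
for pkg in ("task_group1", "task_group2"):
    os.makedirs(os.path.join(root, "src", pkg))
    # An __init__.py is what makes a directory a package for find_packages().
    open(os.path.join(root, "src", pkg, "__init__.py"), "w").close()

# Both packages are found, and both end up in the ONE wheel DAB builds.
print(sorted(find_packages(where=os.path.join(root, "src"))))
# prints ['task_group1', 'task_group2']
```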
    <item>
      <title>Re: Do I need many wheels for each job in project?</title>
      <link>https://community.databricks.com/t5/data-engineering/do-i-need-many-wheels-for-each-job-in-project/m-p/114222#M44752</link>
      <description>&lt;P&gt;Hi&amp;nbsp;kmodelew,&lt;/P&gt;&lt;P&gt;This is a common point of confusion when using Databricks Asset Bundles (DAB) with multiple task groups and a shared codebase. DAB builds one .whl file for the entire bundle, containing all of the packages under src/; it does not build a separate wheel per task group. So when your YAML asks for a wheel named task_group2, Databricks can’t find it: the wheel is named after your top-level project, not after the individual packages.&lt;/P&gt;&lt;P&gt;To fix this, reference the top-level package name (the one matching your wheel) in the package_name field of each job in your .yml files, and make sure setup.py includes all sub-packages via find_packages(), as you are already doing. So instead of setting package_name: task_group1 or task_group2, use the actual package name defined in setup.py (e.g. my_project), and in each job point entry_point at the correct function under that namespace (e.g. my_project.task_group1.main). That should resolve the “wheel not found” error and let all task groups run off the same wheel file. Let me know if you want help adjusting the setup.py or yml; happy to take a look!&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Brahma&lt;/P&gt;</description>
      <pubDate>Wed, 02 Apr 2025 02:57:52 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/do-i-need-many-wheels-for-each-job-in-project/m-p/114222#M44752</guid>
      <dc:creator>Brahmareddy</dc:creator>
      <dc:date>2025-04-02T02:57:52Z</dc:date>
    </item>
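Following the suggestion above, a sketch of one corrected task definition; my_project stands in for whatever name= the project's setup.py actually declares, and main for an entry point it defines, so both names are assumptions:

```yaml
tasks:
  - task_key: main_task
    job_cluster_key: job_cluster
    python_wheel_task:
      package_name: my_project   # must match name= in setup.py, not a sub-package
      entry_point: main          # an entry point declared in setup.py
    libraries:
      - whl: ../dist/*.whl       # the single wheel DAB builds for the bundle
```

With this shape, task_group1.yml and task_group2.yml can both point at the same wheel; only entry_point (and task_key) differs between them.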
    <item>
      <title>Re: Do I need many wheels for each job in project?</title>
      <link>https://community.databricks.com/t5/data-engineering/do-i-need-many-wheels-for-each-job-in-project/m-p/115145#M45030</link>
      <description>&lt;P&gt;Hi, I hope this is useful. Here are my files:&lt;/P&gt;&lt;P&gt;project structure -&amp;gt; DAB_project_structure.png&lt;/P&gt;&lt;P&gt;each yml file for job definitions -&amp;gt; task_group_1_job.png and task_group_2_job.png&lt;/P&gt;&lt;P&gt;Each .py file has a main() method.&lt;/P&gt;&lt;P&gt;setup.py:&lt;/P&gt;&lt;DIV&gt;&lt;PRE&gt;description="wheel file based on bundle_test/src",&lt;BR /&gt;packages=find_packages(where="./src"),&lt;BR /&gt;package_dir={"": "src"},&lt;BR /&gt;entry_points={&lt;BR /&gt;    "packages": [&lt;BR /&gt;        "task_group_1_task_1=bundle_test.task_group_1.task_group_1_task_1:main",&lt;BR /&gt;        "task_group_2_task_2=bundle_test.task_group_2.task_group_1_task_2:main",&lt;BR /&gt;    ],&lt;BR /&gt;},&lt;BR /&gt;install_requires=[&lt;BR /&gt;    # Dependencies in case the output wheel file is used as a library dependency.&lt;BR /&gt;    # For defining dependencies, when this package is used in Databricks, see:&lt;BR /&gt;    # https://docs.databricks.com/dev-tools/bundles/library-dependencies.html&lt;BR /&gt;    "setuptools"&lt;BR /&gt;],&lt;/PRE&gt;&lt;/DIV&gt;</description>
      <pubDate>Thu, 10 Apr 2025 09:35:46 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/do-i-need-many-wheels-for-each-job-in-project/m-p/115145#M45030</guid>
      <dc:creator>kmodelew</dc:creator>
      <dc:date>2025-04-10T09:35:46Z</dc:date>
    </item>
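The entry_points block in the setup.py above declares names that map to "module:function" strings. A sketch of how such a name resolves to a callable via the standard importlib.metadata machinery (the target value here uses json:dumps as a stand-in, since bundle_test is not installed in this sketch; exactly how Databricks performs the lookup internally is an assumption):

```python
from importlib.metadata import EntryPoint

# A hand-built entry point with the same shape as
# "task_group_1_task_1=bundle_test.task_group_1.task_group_1_task_1:main".
# json:dumps is a stand-in target, since bundle_test is not installed here.
ep = EntryPoint(name="task_group_1_task_1", value="json:dumps", group="packages")

func = ep.load()  # imports the module ("json") and returns the attribute (dumps)
print(func({"task": "ok"}))  # prints {"task": "ok"}
```

Note that the entry_point field of a python_wheel_task refers to the name on the left of the equals sign (e.g. task_group_1_task_1), not to a module path.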
  </channel>
</rss>

