Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

databricks bundle install: Error: Maximum file size of 524288000 exceeded

weakliemg
New Contributor II

I have a job that's running some ML classification models. This uses PyTorch 2.5.0. I've configured the project with that dependency. 

I can deploy my job to our dev system from my laptop and all goes well. When I run databricks bundle deploy from our CI/CD server, though, it also tries to upload the dependencies and fails on the PyTorch wheel, which is apparently huge.

I shouldn't need to upload dependencies with the DAB, so why is it trying to deploy them? Running from my laptop, it doesn't try to upload the dependencies and the deploy finishes in a few seconds.

2 REPLIES

SP_6721
Contributor III

Hi @weakliemg ,

When deploying with Databricks Asset Bundles (DAB) from your CI/CD server, it tries to upload any local dependencies referenced in the bundle config, even if they're already installed, because it treats them as local files. That's likely why it's trying to upload the large PyTorch wheel and failing.
To avoid this:

  • Upload the dependency to a workspace or Unity Catalog location and reference that path
  • Use a PyPI reference in your bundle config instead of a local file (see the sketch below)
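
For example, the task's libraries block could pull torch from PyPI and reference a wheel that has already been uploaded, roughly like this (a sketch only; the workspace path, package name, and version pin are placeholders):

    libraries:
      # installed from PyPI on the job cluster, so nothing is uploaded at deploy time
      - pypi:
          package: torch==2.5.0
      # or point at a wheel already in the workspace or a Unity Catalog volume
      - whl: /Workspace/Shared/wheels/my_project-0.1.0-py3-none-any.whl

With a PyPI reference, the cluster installs the package at run time and the bundle deploy never has to push the wheel itself.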

weakliemg
New Contributor II

Thanks, but why does this behavior not happen locally? Also, the bundle config doesn't reference torch; it's only used in code and is listed as a dev dependency in pyproject.toml. My libraries are just this:

          libraries:
            - whl: ../dist/*.whl
            - pypi:
                package: pydantic>=2.0
            - maven:
                coordinates: io.dataflint:spark_2.12:0.4.0