The error you encountered when running your dbt project in Databricks Workflows comes from Databricks trying to copy the entire repository, including the virtual environment (venv) folder and its cached bytecode files (__pycache__), into a temporary workspace path (/tmp/tmp-dbt-run-...). When those directories no longer exist or can't be accessed, the cp command fails, throwing a CalledProcessError with "cannot stat" messages.
This issue is known and documented in multiple community discussions and Stack Overflow posts. It is caused by unnecessary virtual-environment and compiled-Python directories being committed to the repository. These files are not needed to run your dbt command, but they cause cp to fail during Databricks' internal job-setup step.
Resolution
The fix is straightforward:

- Exclude local environment and cache directories from your repository.

  At the root of your dbt project (in Azure DevOps), create or update a .gitignore file to include:

      venv/
      __pycache__/

  Then remove any committed versions of those directories from your repo:

      git rm -r --cached venv __pycache__
      git commit -m "Remove local virtual environment and cache from repo"
      git push
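The cleanup above can be sketched end to end in a throwaway local repository (all file and directory names here are illustrative, simulating a repo that accidentally committed venv/ and __pycache__/):

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "you"

# Simulate a dbt repo that accidentally committed venv/ and __pycache__/:
mkdir -p venv/lib __pycache__
touch venv/lib/site.py __pycache__/model.cpython-311.pyc dbt_project.yml
git add -A && git commit -qm "initial"

# Ignore the directories going forward:
printf 'venv/\n__pycache__/\n' >> .gitignore

# Stop tracking the already-committed copies (files stay on disk):
git rm -r -q --cached venv __pycache__
git add .gitignore
git commit -qm "Remove local virtual environment and cache from repo"

# venv/ and __pycache__/ are no longer tracked, but still exist locally:
git ls-files
```

Note that `--cached` only untracks the files; your local venv stays on disk for development, while the pushed repo no longer carries it.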
- Avoid committing the virtual environment entirely.

  Databricks manages its environment using the cluster interpreter and any libraries specified in your workflow (through PyPI or the libraries configuration). You don't need your local venv in the repo; instead, rely on Databricks to install dbt-databricks and its dependencies.
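For illustration, a Databricks Jobs task definition that runs dbt and installs dbt-databricks from PyPI could look roughly like this (a config sketch; the task key, commands, and version pin are placeholders to adapt to your workflow):

```json
{
  "tasks": [
    {
      "task_key": "dbt_run",
      "dbt_task": {
        "commands": ["dbt deps", "dbt run"]
      },
      "libraries": [
        { "pypi": { "package": "dbt-databricks>=1.7.0" } }
      ]
    }
  ]
}
```

With the library declared here, the job installs dbt-databricks at run time, so nothing from your local venv needs to live in the repository.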
- Re-run the workflow.

  After cleaning the repo and committing the changes, Databricks should copy the repo without attempting to include those missing files, and the workflow should proceed normally.
Root Cause Recap
This happens because Databricks creates a temporary working directory (/tmp/tmp-dbt-run-*) by recursively copying the repo from /Workspace/Repos/.internal/.... When files are missing or dynamically excluded (often inside venv), the cp -a command can't locate them. Excluding those runtime local directories prevents the copy failure.
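The failure mode is easy to see outside Databricks. In this minimal sketch (all paths made up), cp -a is given a source path that no longer exists, which produces the same "cannot stat" error class the job surfaces:

```shell
workdir=$(mktemp -d)
mkdir -p "$workdir/repo" "$workdir/copy"
touch "$workdir/repo/dbt_project.yml"

# "$workdir/repo/venv" was in the file list but has since disappeared,
# mirroring what happens during the Databricks copy into /tmp/tmp-dbt-run-*:
cp -a "$workdir/repo" "$workdir/repo/venv" "$workdir/copy" 2> cp_err.txt || true

# GNU cp reports a "cannot stat ...: No such file or directory" error:
cat cp_err.txt
```

The existing sources are still copied; cp simply exits non-zero because one listed path can no longer be stat'ed, and Databricks surfaces that exit status as a CalledProcessError.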
After applying the .gitignore change and cleaning your repo, your dbt project should run correctly inside Databricks Workflows without the CalledProcessError.