Warehousing & Analytics

CalledProcessError when running dbt

mausch
New Contributor

I've been trying to run a dbt project (sourced in Azure DevOps) in Databricks Workflows, but I get this error message:

```text
CalledProcessError: Command 'b'\nmkdir -p "/tmp/tmp-dbt-run-1124228490001263"\nunexpected_errors="$(cp -a -u "/Workspace/Repos/.internal/085c4ffe5e_commits/16113d05ffd8cd7b148ed973080aa51439e98b0c/." "/tmp/tmp-dbt-run-1124228490001263" 2> >(grep -v \'Operation not supported\'))"\nif [[ -n "$unexpected_errors" ]]; then\n  >&2 echo -e "Unexpected error(s) encountered while copying:\n$unexpected_errors"\n  exit 1\nfi\n        returned non-zero exit status 1.

Unexpected error(s) encountered while copying:
cp: cannot stat '/Workspace/Repos/.internal/085c4ffe5e_commits/16113d05ffd8cd7b148ed973080aa51439e98b0c/./venv/share/doc/networkx-3.1/examples/3d_drawing/__pycache__': No such file or directory
cp: cannot stat '/Workspace/Repos/.internal/085c4ffe5e_commits/16113d05ffd8cd7b148ed973080aa51439e98b0c/./venv/share/doc/networkx-3.1/examples/algorithms/__pycache__': No such file or directory
cp: cannot stat '/Workspace/Repos/.internal/085c4ffe5e_commits/16113d05ffd8cd7b148ed973080aa51439e98b0c/./venv/share/doc/networkx-3.1/examples/basic/__pycache__': No such file or directory
cp: cannot stat '/Workspace/Repos/.internal/085c4ffe5e_commits/16113d05ffd8cd7b148ed973080aa51439e98b0c/./venv/share/doc/networkx-3.1/examples/drawing/__pycache__': No such file or directory
cp: cannot stat '/Workspace/Repos/.internal/085c4ffe5e_commits/16113d05ffd8cd7b148ed973080aa51439e98b0c/./venv/share/doc/networkx-3.1/examples/graph/__pycache__': No such file or directory
cp: cannot stat '/Workspace/Repos/.internal/085c4ffe5e_commits/16113d05ffd8cd7b148ed973080aa51439e98b0c/./venv/share/doc/networkx-3.1/examples/subclass/__pycache__': No such file or directory
```


What can I do about it? Is there something I'm missing?

If you need more details, feel free to ask me.

1 REPLY

mark_ott
Databricks Employee

The error you encountered when running your dbt project in Databricks Workflows comes from Databricks trying to copy the entire repository, including the virtual environment (venv) folder and its cached bytecode directories (__pycache__), into a temporary workspace path (/tmp/tmp-dbt-run-...). When those directories no longer exist or can't be accessed, the cp command fails, raising a CalledProcessError with "cannot stat" messages.

This issue is known and documented in multiple community discussions and Stack Overflow posts. It is caused by unnecessary virtual environment and compiled Python directories being included in the workspace repository. These files are not needed to run your dbt command, but they cause cp to fail during Databricks' internal job setup step.

Resolution

The fix is straightforward:

  1. Exclude local environment and cache directories from your repository:

    • At the root of your dbt project (in Azure DevOps), create or update a .gitignore file to include:

      ```text
      venv/
      __pycache__/
      ```

    • Then remove any committed versions of those directories from your repo:

      ```bash
      git rm -r --cached venv __pycache__
      git commit -m "Remove local virtual environment and cache from repo"
      git push
      ```
  2. Avoid committing the virtual environment entirely.
    Databricks manages its environment using the cluster interpreter and any libraries specified in your workflow (through PyPI or the libraries configuration). You don't need your local venv included; instead, rely on Databricks to install dbt-databricks and its dependencies (see the sketch after this list).

  3. Re-run the workflow.
    After cleaning the repo and committing changes, Databricks should copy the repo without attempting to include those missing files, and the workflow should proceed normally.
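
For concreteness, here is a minimal sketch of what step 2 could look like when creating the job through the Databricks CLI, with the libraries section telling Databricks to install dbt-databricks from PyPI; the job name, task key, commands, and version pin are illustrative assumptions, not values from your workflow, and cluster settings are omitted:

```bash
# Sketch only: names and the version pin below are illustrative
# assumptions, and cluster configuration is left out for brevity.
databricks jobs create --json '{
  "name": "dbt-run-example",
  "tasks": [
    {
      "task_key": "dbt_run",
      "dbt_task": {
        "commands": ["dbt deps", "dbt run"]
      },
      "libraries": [
        { "pypi": { "package": "dbt-databricks>=1.8.0" } }
      ]
    }
  ]
}'
```

Because the library is declared on the task, the cluster installs dbt at run time, so nothing from a local virtual environment ever needs to live in the repository.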

Root Cause Recap

This happens because Databricks creates a temporary working directory (/tmp/tmp-dbt-run-*) by recursively copying the repo from /Workspace/Repos/.internal/.... When files are missing or dynamically excluded (often inside venv), the cp -a command can't locate them. Excluding those runtime-local directories prevents the copy failure.
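
If you want to confirm the cleanup worked before re-running the workflow, a quick check like the following (a sketch, assuming it is run at the repo root) lists any venv or __pycache__ paths still tracked by git; an empty match means the copy step has nothing stale left to trip over:

```bash
# List tracked files under venv/ or any __pycache__/ directory.
# grep exits non-zero when nothing matches, so fall back to a message.
git ls-files | grep -E '(^|/)venv/|(^|/)__pycache__/' || echo "repo is clean"
```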

After applying the .gitignore change and cleaning your repo, your dbt project should run correctly inside Databricks Workflows without the CalledProcessError.