Warehousing & Analytics

CalledProcessError when running dbt

mausch
New Contributor

I've been trying to run a dbt project (sourced in Azure DevOps) in Databricks Workflows, but I get this error message:

```text
CalledProcessError: Command 'b'\nmkdir -p "/tmp/tmp-dbt-run-1124228490001263"\nunexpected_errors="$(cp -a -u "/Workspace/Repos/.internal/085c4ffe5e_commits/16113d05ffd8cd7b148ed973080aa51439e98b0c/." "/tmp/tmp-dbt-run-1124228490001263" 2> >(grep -v \'Operation not supported\'))"\nif [[ -n "$unexpected_errors" ]]; then\n  >&2 echo -e "Unexpected error(s) encountered while copying:\n$unexpected_errors"\n  exit 1\nfi\n        returned non-zero exit status 1.

Unexpected error(s) encountered while copying:
cp: cannot stat '/Workspace/Repos/.internal/085c4ffe5e_commits/16113d05ffd8cd7b148ed973080aa51439e98b0c/./venv/share/doc/networkx-3.1/examples/3d_drawing/__pycache__': No such file or directory
cp: cannot stat '/Workspace/Repos/.internal/085c4ffe5e_commits/16113d05ffd8cd7b148ed973080aa51439e98b0c/./venv/share/doc/networkx-3.1/examples/algorithms/__pycache__': No such file or directory
cp: cannot stat '/Workspace/Repos/.internal/085c4ffe5e_commits/16113d05ffd8cd7b148ed973080aa51439e98b0c/./venv/share/doc/networkx-3.1/examples/basic/__pycache__': No such file or directory
cp: cannot stat '/Workspace/Repos/.internal/085c4ffe5e_commits/16113d05ffd8cd7b148ed973080aa51439e98b0c/./venv/share/doc/networkx-3.1/examples/drawing/__pycache__': No such file or directory
cp: cannot stat '/Workspace/Repos/.internal/085c4ffe5e_commits/16113d05ffd8cd7b148ed973080aa51439e98b0c/./venv/share/doc/networkx-3.1/examples/graph/__pycache__': No such file or directory
cp: cannot stat '/Workspace/Repos/.internal/085c4ffe5e_commits/16113d05ffd8cd7b148ed973080aa51439e98b0c/./venv/share/doc/networkx-3.1/examples/subclass/__pycache__': No such file or directory
```


What can I do about it? Is there something I'm missing?

If you need more details, feel free to ask me.

1 REPLY

mark_ott
Databricks Employee

The error you encountered when running your dbt project in Databricks Workflows comes from Databricks trying to copy the entire repository, including the virtual environment (venv) folder and its cached bytecode directories (__pycache__), into a temporary workspace path (/tmp/tmp-dbt-run-...). When those directories no longer exist or can't be accessed, the cp command fails, raising a CalledProcessError with "cannot stat" messages.

This issue is known and documented in multiple community discussions and Stack Overflow posts. It is caused by unnecessary virtual environment and compiled Python directories being included in the workspace repository. These files are not needed to run your dbt command, but they cause cp to fail during Databricks' internal job setup step.

Resolution

The fix is straightforward:

  1. Exclude local environment and cache directories from your repository:

    • At the root of your dbt project (in Azure DevOps), create or update a .gitignore file to include:

      ```text
      venv/
      __pycache__/
      ```

    • Then remove any committed versions of those directories from your repo:

      ```bash
      git rm -r --cached venv __pycache__
      git commit -m "Remove local virtual environment and cache from repo"
      git push
      ```
  2. Avoid committing the virtual environment entirely.
    Databricks manages its environment using the cluster interpreter and any libraries specified in your workflow (through PyPI or the libraries configuration). You don't need your local venv included; instead, rely on Databricks to install dbt-databricks and its dependencies (see the sketch after this list).

  3. Re-run the workflow.
    After cleaning the repo and committing changes, Databricks should copy the repo without attempting to include those missing files, and the workflow should proceed normally.
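
For concreteness, here is a minimal sketch of what step 2 could look like when creating the job through the Databricks CLI, with the libraries section telling Databricks to install dbt-databricks from PyPI; the job name, task key, commands, and version pin are illustrative assumptions, not values from your workflow, and cluster settings are omitted:

```bash
# Sketch only: names and the version pin below are illustrative
# assumptions, and cluster configuration is left out for brevity.
databricks jobs create --json '{
  "name": "dbt-run-example",
  "tasks": [
    {
      "task_key": "dbt_run",
      "dbt_task": {
        "commands": ["dbt deps", "dbt run"]
      },
      "libraries": [
        { "pypi": { "package": "dbt-databricks>=1.8.0" } }
      ]
    }
  ]
}'
```

Because the library is declared on the task, the cluster installs dbt at run time, so nothing from a local virtual environment ever needs to live in the repository.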

Root Cause Recap

This happens because Databricks creates a temporary working directory (/tmp/tmp-dbt-run-*) by recursively copying the repo from /Workspace/Repos/.internal/.... When files are missing or dynamically excluded (often inside venv), the cp -a command can't locate them. Excluding those runtime-local directories prevents the copy failure.
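
If you want to confirm the cleanup worked before re-running the workflow, a quick check like the following (a sketch, assuming it is run at the repo root) lists any venv or __pycache__ paths still tracked by git; an empty match means the copy step has nothing stale left to trip over:

```bash
# List tracked files under venv/ or any __pycache__/ directory.
# grep exits non-zero when nothing matches, so fall back to a message.
git ls-files | grep -E '(^|/)venv/|(^|/)__pycache__/' || echo "repo is clean"
```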

After applying the .gitignore change and cleaning your repo, your dbt project should run correctly inside Databricks Workflows without the CalledProcessError.