Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Notebook in path workspace/repos/.internal/**_commits/** was unable to be accessed

lauraxyz
Contributor

I have a workflow job (source is Git) that accesses a notebook and executes it. The job failed with this error:

Py4JJavaError: An error occurred while calling o466.run. : com.databricks.WorkflowException: com.databricks.NotebookExecutionException: FAILED: Unable to access the notebook "Workspace/Repos/.internal/3525eba707_commits/6b442ae67043bf77c0b812aeb75373f0d942808f/{my relative git path}". Either it does not exist, or the identity used to run this job, **, lacks the required permissions.

However, if I copy that exact internal path into a standalone, manually run notebook, I can find the file and execute it.

Both the job and the manually run notebook use the same Databricks runtime. What is the root cause that makes the internal file unreachable from a job but reachable from a standalone notebook?

5 REPLIES

parthSundarka
Databricks Employee

Hi @lauraxyz ,

When you run the code manually in a notebook, it runs as your own user by default. Could you check whether the principal set in the job's run_as parameter has permission on the files?

Reference for changing the run_as parameter: https://kb.databricks.com/jobs/trigger-a-job-as-a-specific-user-with-run-as
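To make the run_as check concrete, here is a minimal sketch that pulls the effective principal out of a job description. It assumes the dict has the shape of a Jobs API 2.1 `jobs/get` response (`settings.run_as` with `user_name` or `service_principal_name`, plus a top-level `run_as_user_name`); the helper name and the sample values are hypothetical.

```python
from typing import Optional


def run_as_principal(job: dict) -> Optional[str]:
    """Return the principal a job runs as, or None if it cannot be determined.

    Assumes `job` looks like a Jobs API 2.1 `jobs/get` response.
    """
    run_as = (job.get("settings") or {}).get("run_as") or {}
    # Prefer the explicit run_as setting, then fall back to the
    # top-level field reported alongside the job settings.
    return (
        run_as.get("user_name")
        or run_as.get("service_principal_name")
        or job.get("run_as_user_name")
    )


# Hypothetical sample payload, for illustration only.
sample_job = {
    "job_id": 123,
    "run_as_user_name": "someone@example.com",
    "settings": {"run_as": {"user_name": "someone@example.com"}},
}
print(run_as_principal(sample_job))
```

Comparing this value with the user who ran the notebook manually shows whether both executions really use the same identity.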

Hi @parthSundarka, both the job and the manually run standalone notebook run as the same user.

lauraxyz
Contributor

It looks like the issue is how the task source is set when calling the dbutils.notebook API.

If the job task uses GIT as its task source, it cannot find the callee notebook, whether the notebook is referenced by the .internal path or by a hardcoded .bundle path.

However, if the task source in the job config is set to WORKSPACE, it works. Is that behavior expected?
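For reference, the two task-source settings being compared can be sketched as Jobs API task definitions. This is an illustration, not the poster's actual job config: the task key and both notebook paths are hypothetical. The path check reflects the documented convention that workspace notebook paths are absolute while Git-sourced notebook paths are relative to the repository root.

```python
def notebook_task(task_key: str, notebook_path: str, source: str) -> dict:
    """Build a notebook task definition for a Jobs API job spec."""
    if source not in ("GIT", "WORKSPACE"):
        raise ValueError(f"unsupported source: {source!r}")
    is_absolute = notebook_path.startswith("/")
    if source == "WORKSPACE" and not is_absolute:
        raise ValueError("WORKSPACE notebook paths must be absolute")
    if source == "GIT" and is_absolute:
        raise ValueError("GIT notebook paths must be relative to the repo root")
    return {
        "task_key": task_key,
        "notebook_task": {"notebook_path": notebook_path, "source": source},
    }


# Hypothetical paths, for illustration only.
git_task = notebook_task("run_caller", "notebooks/caller", "GIT")
workspace_task = notebook_task(
    "run_caller", "/Workspace/Users/someone@example.com/notebooks/caller", "WORKSPACE"
)
```

With the GIT variant the job also needs a `git_source` block at the job level; it is omitted here to keep the sketch short.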

saurabh18cs
Valued Contributor III

Yes, this behavior is expected due to the way Databricks handles notebook paths and task sources. When you set the task source to GIT, Databricks expects the notebooks to be managed and referenced through the Git repository. Conversely, when the task source is set to WORKSPACE, Databricks expects the notebooks to be located within the Databricks workspace.

lauraxyz
Contributor

A clarification: the caller notebook is found without issues whether the task's source is GIT or WORKSPACE. However, the callee notebook, which the caller invokes with dbutils.notebook.run(), cannot be found when the caller notebook task's source is GIT.

I specify the callee notebook path relative to the caller notebook. It should therefore be resolvable from the caller notebook (under the .internal path if the source is Git, under the .bundle path if the source is WORKSPACE), regardless of the caller's source.
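One way to debug this is to log the absolute path a relative reference would point to before calling dbutils.notebook.run(). The sketch below uses plain POSIX path arithmetic and is only an approximation: it is an assumption, not confirmed in this thread, that Databricks resolves relative notebook paths the same way for Git-sourced tasks. The helper name and example paths are hypothetical.

```python
import posixpath


def resolve_callee(caller_path: str, relative_callee: str) -> str:
    """Resolve a callee path relative to the directory of the caller notebook."""
    caller_dir = posixpath.dirname(caller_path)
    return posixpath.normpath(posixpath.join(caller_dir, relative_callee))


# Hypothetical caller location, for illustration only.
caller = "/Workspace/Users/someone@example.com/.bundle/proj/files/src/caller"
print(resolve_callee(caller, "./lib/callee"))
print(resolve_callee(caller, "../shared/callee"))

# Inside a Databricks notebook the call itself would look like this
# (dbutils exists only in the Databricks runtime, so it is left commented out):
# result = dbutils.notebook.run("./lib/callee", 600, {"param": "value"})
```

Logging the resolved path in both the GIT and WORKSPACE runs makes it easier to see which location the failing lookup actually targets.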
