2 weeks ago
I have a workflow job (with Git as the source) that accesses a notebook and executes it. The job failed with this error:
Py4JJavaError: An error occurred while calling o466.run. : com.databricks.WorkflowException: com.databricks.NotebookExecutionException: FAILED: Unable to access the notebook "Workspace/Repos/.internal/3525eba707_commits/6b442ae67043bf77c0b812aeb75373f0d942808f/{my relative git path}". Either it does not exist, or the identity used to run this job, **, lacks the required permissions.
However, if I copy that exact internal path into a standalone notebook and run it manually, I'm able to find the file and execute it.
Both the job and the manually run notebook use the same Databricks runtime. What is the root cause that makes the internal file unreachable from a job but reachable from a standalone notebook?
2 weeks ago
Hi @lauraxyz,
When you run the code manually in a notebook, it runs as your own user by default. Could you check whether the principal set in the job's run_as parameter has permission on the files?
For how to change the run_as parameter, see "https://kb.databricks.com/jobs/trigger-a-job-as-a-specific-user-with-run-as"
2 weeks ago
Hi @parthSundarka, both the job and the manually run standalone notebook run as the same user.
2 weeks ago
It looks like the issue is how the task source is set when calling the dbutils.notebook API.
If the job task points to GIT as the task source, it cannot find the callee notebook, regardless of whether the notebook is in the .internal path or a hardcoded .bundle path.
However, if the task source in the job config is set to WORKSPACE, it works. Is that behavior expected?
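To make the difference concrete, here is a minimal sketch of the two task settings as Python dicts. The field names (`notebook_task`, `notebook_path`, `source`) follow the Jobs API; the task key and both notebook paths are hypothetical placeholders, not the real job config.

```python
# Sketch of two notebook task definitions that differ in `source`.
# task_key and the notebook paths below are hypothetical placeholders.

def make_notebook_task(source: str, notebook_path: str) -> dict:
    """Build a minimal Jobs API style task dict for the given source."""
    if source not in ("GIT", "WORKSPACE"):
        raise ValueError(f"unsupported source: {source}")
    return {
        "task_key": "run_caller",  # hypothetical task name
        "notebook_task": {
            "notebook_path": notebook_path,
            "source": source,
        },
    }

# With GIT as the source, the path is relative to the repository root.
git_task = make_notebook_task("GIT", "notebooks/caller")

# With WORKSPACE as the source, the path is an absolute workspace path.
workspace_task = make_notebook_task(
    "WORKSPACE", "/Workspace/Users/someone@example.com/notebooks/caller"
)
```

Only the WORKSPACE variant let the nested dbutils.notebook.run() call find the callee in the case described above.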
2 weeks ago
Yes, this behavior is expected due to the way Databricks handles notebook paths and task sources. When you set the task source to GIT, Databricks expects the notebooks to be managed and referenced through the Git repository. Conversely, when the task source is set to WORKSPACE, Databricks expects the notebooks to be located within the Databricks workspace.
2 weeks ago
Just to clarify: the caller notebook is found with no issues, regardless of whether the task's source is GIT or WORKSPACE. However, the callee notebook, which the caller notebook invokes with dbutils.notebook.run(), cannot be found if the caller notebook task's source is GIT.
I specify the callee notebook path relative to the caller notebook. Therefore, the caller notebook should be able to find it (in the .internal path if the source is Git, or in the .bundle path if the source is workspace), regardless of the caller's source.
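The expectation here can be sketched as plain path arithmetic. This is only an illustration of how a relative callee path would resolve against the caller's location, not a description of what Databricks does internally; `resolve_callee`, the `<commit_dir>` placeholder, and the notebook names are all hypothetical.

```python
import posixpath

def resolve_callee(caller_path: str, relative_path: str) -> str:
    """Resolve a callee path relative to the caller notebook's folder."""
    caller_dir = posixpath.dirname(caller_path)
    return posixpath.normpath(posixpath.join(caller_dir, relative_path))

# Hypothetical caller location; <commit_dir> stands in for the real folder.
caller = "/Workspace/Repos/.internal/<commit_dir>/jobs/caller"

sibling = resolve_callee(caller, "./callee")
shared = resolve_callee(caller, "../shared/callee")
print(sibling)  # /Workspace/Repos/.internal/<commit_dir>/jobs/callee
print(shared)   # /Workspace/Repos/.internal/<commit_dir>/shared/callee

# Inside a Databricks notebook, the call itself would look like:
#   dbutils.notebook.run("./callee", 600)
# (dbutils is only available in the Databricks runtime.)
```

Under this reading, the resolved path lands next to the caller in either layout, which is why the failure with GIT as the source is surprising.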