Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Permission issue for pandas to read local files

liu
New Contributor III

I can use pandas to read local files in a notebook, such as those located in /tmp. However, when I run two consecutive notebooks within the same job and read files with pandas in both, the second notebook fails with an error stating that I do not have permission. Could my reads from or writes to tables have altered the environment?

3 REPLIES

Pilsner
Contributor

Hello @liu 

The fact that you're getting a permissions issue when trying to access the same location in different notebooks is indeed confusing. Here's my take on what could be the cause of this issue:

Jobs in Databricks default to creating their own short-lived clusters. These are effectively temporary clusters, created specifically to run a single job or task. If each task/notebook within your job gets its own ephemeral cluster, each also gets its own separate resources, i.e. separate temp environments. My assumption is that when your second notebook tries to access the temp directory belonging to your first notebook, it runs into a permissions issue. Equally, it could be that the second notebook simply can't find the file path, and the failure just manifests as a "permissions" error.

The way to get around this would be to use a persistent, all-purpose compute resource, or write to non-local (global) file paths. For more information on compute resources, the following link may be useful:

https://docs.databricks.com/aws/en/jobs/compute
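As a rough sketch of the shared-path workaround (the Volume path is a placeholder, and a temp directory stands in for it here so the snippet runs anywhere):

```python
import os
import tempfile

import pandas as pd

# Hypothetical shared location. On Databricks this would be a Unity
# Catalog Volume (e.g. /Volumes/<catalog>/<schema>/<volume>/) or another
# path visible to every task; a temp dir stands in so this runs anywhere.
shared_dir = tempfile.mkdtemp()
shared_path = os.path.join(shared_dir, "example.csv")

# Notebook/task 1: write the file to the shared path.
pd.DataFrame({"id": [1, 2], "value": ["a", "b"]}).to_csv(shared_path, index=False)

# Notebook/task 2: read the same path back, even from a different cluster.
df = pd.read_csv(shared_path)
print(df.shape)  # (2, 2)
```

Because both tasks address the same non-local path, it no longer matters which cluster each notebook lands on.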

Please let me know if you find this helpful - feel free to ask any further questions. 

Regards - Pilsner

liu
New Contributor III

Hello, @Pilsner 
thank you for your reply

The situation is slightly different.
I transfer the file from an SFTP system to a local path on Databricks, read it into pandas, and then pass it to Spark.
Although the two notebooks in this job access the same local path, they do not operate on the same file.
If the two notebooks are executed separately there is no issue, but if they are put into one job it fails.
Sometimes, when I run the same notebook manually, it also reports a permission error on the second attempt.
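One hedged mitigation, if the failure comes from two tasks sharing the same local directory or from leftovers of a previous run: give each run its own unique file name and remove the file afterwards. This is only a sketch; the SFTP download step is stubbed out with a small in-memory frame.

```python
import os
import tempfile
import uuid

import pandas as pd

# Hypothetical per-run unique name, so two tasks in the same job (or two
# manual runs of the same notebook) never touch each other's files even
# when they share a local directory.
local_path = os.path.join(tempfile.gettempdir(), f"sftp_{uuid.uuid4().hex}.csv")

# Stand-in for the SFTP download step.
pd.DataFrame({"x": [1, 2, 3]}).to_csv(local_path, index=False)

pdf = pd.read_csv(local_path)
# spark_df = spark.createDataFrame(pdf)  # hand off to Spark as before

os.remove(local_path)  # clean up so a rerun starts fresh
print(len(pdf))  # 3
```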

BS_THE_ANALYST
Esteemed Contributor

@liu is there anything preventing you from moving this file into a Volume on the Unity Catalog & then reading it from there?

Based on your original message, the "permission" error makes me think it's struggling to open the file; perhaps it's "in use" somehow.
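A minimal sketch of that suggestion, copying the local file into a Volume before reading it (the Volume path is hypothetical; temp directories stand in for both locations so the snippet runs anywhere):

```python
import os
import shutil
import tempfile

import pandas as pd

# Local file as produced by the SFTP transfer (stubbed out here).
local_dir = tempfile.mkdtemp()
local_path = os.path.join(local_dir, "data.csv")
pd.DataFrame({"id": [1], "value": ["a"]}).to_csv(local_path, index=False)

# Hypothetical Volume destination, e.g. /Volumes/<catalog>/<schema>/<volume>/;
# a second temp dir stands in for it here.
volume_dir = tempfile.mkdtemp()
volume_path = os.path.join(volume_dir, "data.csv")
shutil.copy(local_path, volume_path)

# Read from the Volume path instead of the local one.
pdf = pd.read_csv(volume_path)
print(pdf.shape)  # (1, 2)
```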

All the best,
BS
