12-05-2023 06:27 AM - edited 12-05-2023 06:28 AM
Hi All,
For an application that we are building, we need an encoding detector/UTF-8 enforcer. For this, we used the Python library chardet in combination with "with open". We open a file from a mounted ADLS location (we use a legacy Hive metastore).
When we were using No Isolation Shared clusters, it worked fine, but for security reasons we have to switch to Shared Access clusters. However, the encoding detector no longer works.
This is how we detected encoding before:
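The original snippet was posted as a screenshot and is not preserved. As a rough sketch only, a detector of this kind might look like the following. It assumes the file is opened through its /dbfs/mnt/... FUSE path and that chardet is installed on the cluster. The name detect_encoding, the 64 KB sample size, and the UTF-8 fast path are illustrative choices, not the poster's code.

```python
import codecs
from typing import Optional


def detect_encoding(local_path: str, sample_size: int = 64 * 1024) -> Optional[str]:
    """Guess the encoding of a file reachable on the local filesystem.

    Assumes a FUSE path such as /dbfs/mnt/<mount>/<file>, which open() can
    read on a No Isolation Shared cluster but not on a Shared Access cluster.
    """
    # Read raw bytes: the encoding must be detected before any decoding happens.
    with open(local_path, "rb") as f:
        sample = f.read(sample_size)

    # UTF-8 fast path: if the sample is valid UTF-8, accept it as such.
    # An incremental decoder with final=False tolerates a multi-byte
    # character that was cut in half at the end of the sample.
    try:
        codecs.getincrementaldecoder("utf-8")().decode(sample, final=False)
        return "utf-8"
    except UnicodeDecodeError:
        pass

    # Otherwise fall back to chardet's statistical guess (may be None).
    import chardet  # third-party; assumed to be installed on the cluster

    return chardet.detect(sample)["encoding"]
```

On a Shared Access cluster, the open() call is the step that fails.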
Error when using a shared access cluster:
After some investigation, we concluded that "with open", as well as the os and glob modules, do not work properly on mounted locations with a shared access cluster. Any idea how we can fix this?
For your reference, we have to use this mounted location and a shared access cluster.
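For the listing part (the os/glob calls), one workaround that stays off the local filesystem is to filter dbutils.fs.ls output with fnmatch. This is a sketch under assumptions: dbutils is the notebook-provided object, the listing is non-recursive, and list_matching_files is a hypothetical helper name. Later in this thread, dbutils.fs.ls is reported to work against a mount on a shared cluster.

```python
import fnmatch
from typing import List


def list_matching_files(dbutils, folder: str, pattern: str) -> List[str]:
    """glob-style, non-recursive listing of one mount folder via dbutils.fs.ls."""
    return [
        info.path
        for info in dbutils.fs.ls(folder)
        # Directory entries have a name ending in "/", so skip them;
        # keep only files whose name matches the shell-style pattern.
        if not info.name.endswith("/") and fnmatch.fnmatch(info.name, pattern)
    ]
```

For example, list_matching_files(dbutils, "dbfs:/mnt/<mount>/<folder>", "*.csv") stands in for glob.glob("/dbfs/mnt/<mount>/<folder>/*.csv"), returning dbfs:/ paths rather than /dbfs/ paths.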
02-05-2024 10:34 PM
Hi @mathijs-fish, I completely understand your hesitation and appreciate your approach to seeking guidance!
I see you are trying to access the external files from a DBFS mount location.
As the screenshots you shared show, the error below occurs when you try to access the external DBFS mount file using "with open" because you are using a shared access mode cluster:
FileNotFoundError: [Errno 2] No such file or directory: '/dbfs/mnt/art/inbound.Ltest/EUT_Alignment_20230630_20230712130221.csv'
This is a known limitation of shared clusters: the /dbfs path is not accessible. You can try using a single-user cluster instead, which supports Unity Catalog (UC) and can access /dbfs.
Please refer to:
https://docs.databricks.com/clusters/configure.html#shared-access-mode-limitations
https://docs.databricks.com/en/dbfs/unity-catalog.html#how-does-dbfs-work-in-shared-access-mode
There is also a preview feature, 'Improved Shared Clusters', that addresses some of the limitations of shared clusters.
Leave a like if this helps; follow-ups are appreciated.
Kudos,
Ayushi
02-07-2024 01:30 AM
@Ayushi_Suthar Thanks! However, this does not solve the issue, because we have to use shared clusters. In the meantime, we found a way to detect the encoding on shared clusters, as follows:
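The working solution was shared as a screenshot that is not preserved here. One approach that avoids the /dbfs FUSE path altogether is to pull the raw bytes through Spark's binaryFile data source and run detection on those bytes. This is a minimal sketch, assuming `spark` is the notebook's SparkSession, the path points at a single file on the mount, and chardet is installed; the helper names are hypothetical, and this may differ from what the poster actually did.

```python
from typing import Optional


def read_raw_bytes(spark, path: str) -> bytes:
    """Read one file's raw bytes through Spark instead of the /dbfs FUSE mount.

    `path` is assumed to point at a single file, e.g. "dbfs:/mnt/<mount>/<file>.csv".
    """
    # The binaryFile source yields the columns path, modificationTime, length, content.
    row = spark.read.format("binaryFile").load(path).select("content").first()
    if row is None:
        raise FileNotFoundError(path)
    return bytes(row.content)  # BinaryType values arrive as bytearray in PySpark


def detect_encoding_from_bytes(raw: bytes) -> Optional[str]:
    """Return "utf-8" for valid UTF-8, otherwise chardet's best guess (may be None)."""
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        import chardet  # third-party; assumed to be installed on the cluster

        return chardet.detect(raw)["encoding"]
```

Usage: detect_encoding_from_bytes(read_raw_bytes(spark, "dbfs:/mnt/<mount>/<file>.csv")). The whole file is collected to the driver, so this suits files that fit comfortably in driver memory.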
08-23-2024 12:58 PM
We are able to read the file, but how can we write it back to the storage path? Is there any solution?
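A possible answer, sketched under assumptions: after decoding the bytes with the detected encoding, dbutils.fs.put writes a string to a file encoded as UTF-8, so the same call also performs the UTF-8 enforcement. Here rewrite_as_utf8 is a hypothetical helper and dbutils is the notebook-provided object. Whether put is permitted on a given mount in shared access mode should be verified in your workspace, and the whole file is held in driver memory.

```python
def rewrite_as_utf8(dbutils, raw: bytes, source_encoding: str, target_path: str) -> str:
    """Decode `raw` with the detected encoding and write it back as UTF-8.

    Assumes the file fits in driver memory. Returns the decoded text.
    """
    # Strict decoding: a wrong encoding guess raises instead of silently corrupting text.
    text = raw.decode(source_encoding)
    # dbutils.fs.put(file, contents, overwrite) writes `contents` encoded as UTF-8;
    # passing True for overwrite replaces an existing file at target_path.
    dbutils.fs.put(target_path, text, True)
    return text
```

For example, rewrite_as_utf8(dbutils, raw, "cp1252", "dbfs:/mnt/<mount>/<file>_utf8.csv") writes a UTF-8 copy next to the original; pointing target_path at the original path overwrites it instead.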
02-07-2024 02:20 AM
Hi @mathijs-fish, thank you for sharing the solution. It will help us update our records and documentation, enabling us to assist other customers more effectively in similar cases.
a month ago
Hi @mathijs-fish @Ayushi_Suthar, I am having the same issue with a shared cluster. I can see the list of PDF files on the mount using dbutils.fs.ls(mount_point), but when I try to read the PDF files with PyPDF, I get: FileNotFoundError: [Errno 2] No such file or directory
Can I read the files by enabling certain settings on the shared cluster? I see that @srajawat can read the PDF.
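A sketch of one way around this, under assumptions: when given a path, PyPDF opens it through the local filesystem, which is the part that fails here, so the bytes can instead be read with Spark's binaryFile source (its pathGlobFilter option limits the load to PDFs) and handed to the parser as an in-memory stream. This assumes `spark` is the notebook's SparkSession and that the third-party pypdf package is installed (its PdfReader accepts a binary file-like object). The helper names are hypothetical, and collecting all bytes to the driver suits a modest number of files.

```python
import io
from typing import Dict


def load_binary_files(spark, folder: str, glob: str = "*.pdf") -> Dict[str, bytes]:
    """Return {path: raw bytes} for files directly under `folder` matching `glob`.

    Reads through Spark's binaryFile source, so it does not rely on the /dbfs
    FUSE path that open() and PDF libraries use.
    """
    rows = (
        spark.read.format("binaryFile")
        .option("pathGlobFilter", glob)  # keep only file names matching the glob
        .load(folder)
        .select("path", "content")
        .collect()  # brings every file's bytes to the driver
    )
    return {row.path: bytes(row.content) for row in rows}


def count_pdf_pages(raw: bytes) -> int:
    """Parse PDF bytes from memory instead of from a filesystem path."""
    from pypdf import PdfReader  # third-party; assumed to be installed on the cluster

    return len(PdfReader(io.BytesIO(raw)).pages)
```

Usage: iterate over load_binary_files(spark, "dbfs:/mnt/<mount>/<folder>").items() and call count_pdf_pages(raw) (or build a PdfReader the same way) for each file.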
Looking forward to your reply, thanks