12-05-2023 06:27 AM - edited 12-05-2023 06:28 AM
Hi All,
For an application that we are building, we need an encoding detector/UTF-8 enforcer. For this, we used the Python library chardet in combination with "with open". We open a file from a mounted ADLS location (we use a legacy Hive metastore).
When we were using No Isolation Shared clusters, this worked fine, but for security reasons we have to change to Shared Access clusters. Now the encoding detector no longer works.
This is how we detected encoding before:
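The original snippet was posted as a screenshot and is not preserved here. As a rough sketch only, a chardet plus "with open" detector of the kind described could look like the following. The function names, the sample size, and the UTF-8 rewrite step are assumptions for illustration, not the poster's actual code.

```python
from pathlib import Path


def detect_encoding(path, sample_size=100_000):
    """Guess a file's encoding from its first `sample_size` bytes (hypothetical helper)."""
    # Imported here so the third-party dependency is only needed when called.
    import chardet

    # Binary mode: chardet inspects raw bytes, not decoded text.
    with open(path, "rb") as handle:
        raw = handle.read(sample_size)
    result = chardet.detect(raw)  # dict with "encoding" and "confidence" keys
    return result["encoding"], result["confidence"]


def enforce_utf8(path, target_path):
    """Rewrite a file as UTF-8 using the detected encoding (hypothetical helper)."""
    encoding, _ = detect_encoding(path)
    if encoding is None:
        raise ValueError(f"could not detect encoding of {path}")
    text = Path(path).read_bytes().decode(encoding)
    # Write bytes directly so line endings are left untouched.
    Path(target_path).write_bytes(text.encode("utf-8"))
```

On a cluster where the local /dbfs path is available, this would be called with a path under /dbfs/mnt/. Both helpers rely on local file access, which is the part that fails on shared access clusters.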
Error using shared access cluster:
After some investigation, we concluded that "with open", as well as the os and glob modules, do not work properly on mounted locations with a shared access cluster. Any idea how we can fix this?
For your reference, we have to use this mounted location, and a shared access cluster.
02-05-2024 10:34 PM
Hi @mathijs-fish, I completely understand your hesitation and appreciate your approach to seeking guidance!
I see you are trying to access external files from a DBFS mount location.
As the screenshots you shared show, the error below occurs when accessing the external DBFS mount file with "with open" because you are using a shared access mode cluster.
FileNotFoundError: [Errno 2] No such file or directory: '/dbfs/mnt/art/inbound.Ltest/EUT_Alignment_20230630_20230712130221.csv'
This is a known limitation of Shared Clusters: the local /dbfs path is not accessible. You can try using a single-user cluster instead, which supports Unity Catalog and can access /dbfs.
Please refer:
https://docs.databricks.com/clusters/configure.html#shared-access-mode-limitations
https://docs.databricks.com/en/dbfs/unity-catalog.html#how-does-dbfs-work-in-shared-access-mode
We also have a preview feature, 'Improved Shared Clusters', that addresses some of the limitations of Shared Clusters.
Leave a like if this helps, followups are appreciated.
Kudos,
Ayushi
02-07-2024 01:30 AM
@Ayushi_Suthar Thanks! However, this does not solve the issue, because we have to use shared clusters. In the meantime, we found a way of detecting the encoding on shared clusters in the following way:
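The snippet that followed was posted as a screenshot and is not preserved here. One way to avoid the local /dbfs path altogether is to let Spark read the file's bytes through its binaryFile data source and pass them to chardet. The sketch below assumes that approach, and it may differ from what was actually posted. The function names are hypothetical, and `spark` is taken to be an existing SparkSession (as in a Databricks notebook).

```python
def read_bytes_via_spark(spark, path):
    """Return the raw bytes of one file via Spark's binaryFile source (hypothetical helper)."""
    # binaryFile yields one row per file; the "content" column holds the bytes.
    row = spark.read.format("binaryFile").load(path).select("content").head()
    if row is None:
        # head() returns None when the load produced no rows.
        raise FileNotFoundError(path)
    return bytes(row[0])


def detect_encoding_via_spark(spark, path):
    """Guess a file's encoding without touching the local /dbfs path (hypothetical helper)."""
    # Imported here so the third-party dependency is only needed when called.
    import chardet

    raw = read_bytes_via_spark(spark, path)
    result = chardet.detect(raw)  # dict with "encoding" and "confidence" keys
    return result["encoding"], result["confidence"]
```

In a notebook this would be called as `detect_encoding_via_spark(spark, path)` with a `dbfs:/mnt/...` style path. The whole file is pulled to the driver, so the sketch only suits files that fit comfortably in memory.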
02-07-2024 02:20 AM
Hi @mathijs-fish, thank you for sharing the solution. It will help us update our records and documentation so that we can assist other customers more effectively in similar cases.