"with open" not working with Shared Access Cluster on mounted location

mathijs-fish
New Contributor III

Hi All,

For an application that we are building, we need an encoding detector / UTF-8 enforcer. For this, we used the Python library chardet in combination with "with open". We open a file from a mounted ADLS location (we use a legacy Hive metastore).

When we were using No Isolation Shared clusters, this worked fine, but for security reasons we have to change to Shared Access clusters. With those, the encoding detector no longer works.

This is how we detected encoding before:

[Screenshot: original encoding-detection code using "with open" and chardet]
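
For reference, the original approach was roughly the following sketch (a minimal reconstruction from the description above, with a hypothetical mount path; the screenshot itself is not preserved):

import chardet

# Hypothetical mount path, for illustration only
file_path = "/dbfs/mnt/<container>/some_file.csv"

# Read the raw bytes through the /dbfs FUSE mount and let chardet guess the encoding
with open(file_path, "rb") as f:
    rawdata = f.read()
encoding = chardet.detect(rawdata)["encoding"]
print(encoding)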

Error using shared access cluster:

[Screenshot: FileNotFoundError raised on the Shared Access cluster; the error text is quoted in the reply below]

After some investigation, we concluded that "with open", as well as the os and glob modules, do not work properly on mounted locations with a Shared Access cluster. Any idea how we can fix this?
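
To illustrate the failure mode (a minimal sketch with a hypothetical mount path, not the exact code from the post), the /dbfs FUSE path is simply not visible to these standard-library calls on a Shared Access cluster:

import glob
import os

path = "/dbfs/mnt/<container>/"            # hypothetical mount path

os.listdir(path)                           # raises FileNotFoundError
glob.glob(path + "*.csv")                  # returns [] because the path is not visible
with open(path + "file.csv", "rb") as f:   # raises FileNotFoundError
    rawdata = f.read()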

For your reference: we have to use this mounted location and a Shared Access cluster.

 


3 REPLIES

Ayushi_Suthar
Honored Contributor

Hi @mathijs-fish, I completely understand your situation and appreciate you reaching out for guidance!

I see you are trying to access external files from a DBFS mount location.
As the snapshots you shared show, the reason for the error below when trying to access a mounted DBFS file using "with open" is that you are using a shared access mode cluster.

FileNotFoundError: [Errno 2] No such file or directory: '/dbfs/mnt/art/inbound.test/EUT_Alignment_20230630_20230712130221.csv'

This is a known limitation of Shared Clusters: the /dbfs path is not accessible on them. You can try using a single-user cluster instead, which supports Unity Catalog and can access /dbfs.

Please refer to:
https://docs.databricks.com/clusters/configure.html#shared-access-mode-limitations
https://docs.databricks.com/en/dbfs/unity-catalog.html#how-does-dbfs-work-in-shared-access-mode

We also have a preview feature, 'Improved Shared Clusters', that addresses some of the limitations of Shared Clusters.

Leave a like if this helps; follow-ups are appreciated.

Kudos,

Ayushi

mathijs-fish
New Contributor III

@Ayushi_Suthar Thanks! However, this does not solve the issue, because we have to use shared clusters. In the meantime, we found a way of detecting the encoding on shared clusters:

import chardet

# Read the first 500,000 bytes of the file through Spark's binaryFile source,
# which works on shared access clusters because it does not rely on /dbfs
rawdata = (
    spark.read.format("binaryFile")
    .load(file_path)
    .selectExpr("SUBSTR(content, 0, 500000) AS content")
    .collect()[0]
    .content
)

# Let chardet guess the encoding from the sampled bytes
encoding = chardet.detect(rawdata)["encoding"]
print(encoding)
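
For context: sampling only the first 500,000 bytes keeps the collect() result small while still giving chardet enough data for a confident guess, and reading through Spark's binaryFile source avoids the /dbfs FUSE mount that Shared Access clusters cannot use. Note that file_path here should be a Spark-readable path (for example a dbfs:/mnt/... URI) rather than a /dbfs/... local path.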

Ayushi_Suthar
Honored Contributor

Hi @mathijs-fish, thank you for sharing the solution; it will help us update our records and documentation, enabling us to assist other customers more effectively in similar cases.
