04-03-2022 04:05 AM
Hi all,
On my DBR installations, the s3a scheme is mapped to shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem. On my customer's DBR installations, it is mapped to com.databricks.s3a.S3AFileSystem.
We both use the same DBR runtime, and neither of us has configured anything to override this setting.
What causes this difference? How can I make sure I'm using the right filesystem? And how can I make sure that no third filesystem appears in the future and breaks my code again?
Labels: Databricks Runtime
04-04-2022 02:32 AM
@Yoni Au, Databricks Runtime 7.3 LTS and above use the new connector, com.databricks.s3a.S3AFileSystem. Are you using 7.3?
In any case, please check the Spark config on both installations (via Cluster -> Spark UI -> Environment) and see what it says about S3AFileSystem. Then set common values for both (via Cluster -> Configuration -> Advanced options).
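One way to do the comparison suggested above is to dump the Spark properties from each cluster and diff only the entries that mention s3a. The following is a minimal sketch, not an official Databricks tool: the function name `diff_s3a_settings` is hypothetical, and the sample property key `spark.hadoop.fs.s3a.impl` is only an illustration of what might show up on the Environment tab.

```python
from typing import Dict, Optional, Tuple


def diff_s3a_settings(
    mine: Dict[str, str],
    theirs: Dict[str, str],
    needle: str = "s3a",
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {key: (my_value, their_value)} for every property whose key or
    value mentions `needle` on either side and whose values differ."""

    def mentions(conf: Dict[str, str], key: str) -> bool:
        value = conf.get(key)
        return needle in key.lower() or (
            value is not None and needle in value.lower()
        )

    result = {}
    for key in sorted(set(mine) | set(theirs)):
        if not (mentions(mine, key) or mentions(theirs, key)):
            continue  # unrelated property, ignore
        if mine.get(key) != theirs.get(key):
            result[key] = (mine.get(key), theirs.get(key))
    return result


# Illustrative data only; real values come from each cluster's Environment tab.
my_conf = {
    "spark.hadoop.fs.s3a.impl":
        "shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem",
    "spark.app.name": "job-a",
}
customer_conf = {
    "spark.hadoop.fs.s3a.impl": "com.databricks.s3a.S3AFileSystem",
    "spark.app.name": "job-b",
}

for key, (a, b) in diff_s3a_settings(my_conf, customer_conf).items():
    print(f"{key}\n  mine:     {a}\n  customer: {b}")
```

In a notebook, each dictionary could be built with `dict(spark.sparkContext.getConf().getAll())`, assuming a `spark` session is available on the cluster.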
My blog: https://databrickster.medium.com/
04-05-2022 12:06 AM
@Yoni Au, if both of you are using the same DBR version, you should not see any difference. As @Hubert Dudek mentioned, a Spark configuration change may have been made on one of the clusters. It is also worth checking for any cluster-scoped or global init scripts.
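For the original question about catching an unexpected third filesystem early, one option is a fail-fast check at job start. This is a sketch under stated assumptions: the helper name `check_s3a_impl` is hypothetical, the allow-list contains only the two class names mentioned in this thread, and it assumes the mapping is exposed through the Hadoop property `fs.s3a.impl`, which may not hold on every cluster.

```python
from typing import FrozenSet, Optional

# The two implementations reported in this thread; extend as needed.
KNOWN_S3A_IMPLS = frozenset({
    "shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem",
    "com.databricks.s3a.S3AFileSystem",
})


def check_s3a_impl(
    impl_class: Optional[str],
    allowed: FrozenSet[str] = KNOWN_S3A_IMPLS,
) -> str:
    """Return the class name if it is on the allow-list, otherwise raise."""
    if impl_class is None:
        raise RuntimeError("fs.s3a.impl is not set; cannot verify the s3a mapping")
    if impl_class not in allowed:
        raise RuntimeError(
            f"Unexpected s3a filesystem {impl_class!r}; "
            f"expected one of {sorted(allowed)}"
        )
    return impl_class


# Assumed usage in a notebook where a `spark` session already exists:
#   check_s3a_impl(spark._jsc.hadoopConfiguration().get("fs.s3a.impl"))
```

Running this at the top of a job turns a silent change in the mapping into an immediate, readable error instead of a failure deeper in the code.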