Caused by: org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "s3"

parimalpatil28
New Contributor III

Hello,

I am facing an issue with an INSERT query and with .saveAsTable(). The query fails with:
Caused by: org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "s3"

org.apache.spark.SparkException: [TASK_WRITE_FAILED] Task failed while writing rows to s3://dbricksunitycatalog/
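For context, here is a minimal sketch of the kind of write that hits this error (the table name and bucket are placeholders, not our exact code):

# Illustrative repro only: saveAsTable into a table whose storage location is on S3.
df = spark.range(10)
df.write.mode("overwrite").saveAsTable("my_catalog.my_schema.my_table")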

 

What configuration am I missing? Can you please help?

I tried adding an instance profile, but it didn't work.

 

Thanks,

Parimal

3 REPLIES

Kaniz
Community Manager

Hi @parimalpatil28

The error message "No FileSystem for scheme "s3"" indicates that Spark cannot find a file system implementation registered for the "s3" scheme. This usually means the required packages or configuration are missing on your cluster.

Here are some steps you can follow to resolve this issue:

1. Check if the Hadoop AWS JAR files are present on your cluster. You can check by running the following command in a notebook cell:
%sh
ls /databricks/spark/jars | grep hadoop-aws

This lists any hadoop-aws JAR files present in the Spark jars directory.

If you do not see any results, you will need to download and install the Hadoop AWS JAR files. You can download them from the Apache Hadoop website. After downloading the JAR files, you can upload them to DBFS and add them to the cluster's classpath using the "spark.driver.extraClassPath" and "spark.executor.extraClassPath" properties. You can set these properties by going to the "Advanced Options" tab of your cluster configuration and adding them in the "Spark" configuration options field.
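For illustration, the entries in the Spark config field could look something like the lines below. The DBFS paths and JAR versions are placeholders; use the files you actually uploaded, and keep any existing default classpath entries rather than replacing them, since these properties overwrite the previous value:

spark.driver.extraClassPath /dbfs/FileStore/jars/hadoop-aws-3.3.4.jar:/dbfs/FileStore/jars/aws-java-sdk-bundle-1.12.262.jar
spark.executor.extraClassPath /dbfs/FileStore/jars/hadoop-aws-3.3.4.jar:/dbfs/FileStore/jars/aws-java-sdk-bundle-1.12.262.jar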

2. Check if the "fs.s3.impl" property is set to the correct value in the Spark configuration. This property specifies the class name of the S3 filesystem implementation to use. You can check the current value of this property by running the following code snippet in a notebook cell:

spark.sparkContext.getConf().get("fs.s3.impl", "not set")

The output should be "org.apache.hadoop.fs.s3a.S3AFileSystem" for S3A filesystem.

If the output is not the expected value, set this property to "org.apache.hadoop.fs.s3a.S3AFileSystem" in the Spark configuration options field under the "Advanced Options" tab of your cluster configuration.
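As a sketch, that entry in the Spark config field would be a single line like the one below. On Databricks, Hadoop properties set through the Spark config are typically passed with the spark.hadoop. prefix; verify the exact form for your runtime:

spark.hadoop.fs.s3.impl org.apache.hadoop.fs.s3a.S3AFileSystem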

3. Check if you have the proper permissions to access S3. Make sure you have created an instance profile for your EC2 instances with the necessary IAM roles and policies for accessing S3. You can attach this instance profile to your cluster in the "AWS Configuration" tab of the cluster configuration.

4. Check that you are using the correct S3 path format for your data, which should be "s3a://bucket-name/object-path". A quick way to verify both the permissions and the path format is shown after this list.
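A quick sanity check for steps 3 and 4, run from a notebook cell (the bucket name and prefix are placeholders):

# List a path the instance profile should be able to read; an error here points
# at permissions or path-format problems rather than missing JARs.
display(dbutils.fs.ls("s3a://my-bucket/some/prefix/"))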

parimalpatil28
New Contributor III

Hello @Kaniz ,

We are still facing the same issue.
We tried your suggestions on our Unity Catalog cluster.
We also checked the role-related configuration in AWS.

We are using 13.3 LTS. Can you please tell us which AWS JARs we should install and what configuration we should add in the advanced Spark config?

Is there anything else you can suggest we try?

 

Thanks,

Parimal

 

 

parimalpatil28
New Contributor III
Accepted Solution

Hello @Kaniz ,

Thanks for the help.
We also investigated internally and found the root cause.

Our product's configuration was overwriting the Databricks default spark.executor.extraClassPath setting, so our clusters could not find the S3-related JARs.
We noticed the difference when we compared a normal cluster with a cluster that has our product installed.
Once we combined our entries with the Databricks defaults in that setting, it started working.

Runtimes 11.3 LTS through 13.3 LTS were affected; everything worked fine up to 10.4 LTS.
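For anyone hitting the same thing, a quick way to see the difference is to print the setting on a stock cluster and on the affected cluster and compare the output (a sketch of the check we did, not our product's configuration):

# Print the effective executor classpath so the two clusters can be diffed.
print(spark.sparkContext.getConf().get("spark.executor.extraClassPath", "not set"))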

Thanks,

Parimal