Notebook connectivity issue with AWS S3 bucket using mounting

kumarPerry
New Contributor II

When connecting to an AWS S3 bucket using DBFS, the application throws an error like:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 7864387.0 failed 4 times, most recent failure: Lost task 0.3 in stage 7864387.0 (TID 17097322) (xx.***.xx.x executor 853): com.databricks.sql.io.FileReadException: Error while reading file

The application imports CSV files from AWS S3 and had been working for a few days. I tried loading a very small file, and hit the same issue; I even tried a previously imported file, with the same result. The command below works, which means the mount is active and the files in the directory are listed:

display(dbutils.fs.ls("/mnt/xxxxx/yyyy"))

Sample code snippet:

spark.read.format("csv").option("inferSchema", "true").option("header", "true").option("sep", ",").load(file_location)
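
A quick way to test reading (rather than just listing) through the mount is dbutils.fs.head, which fetches the first bytes of a file; the file name below is only an illustration:

# Listing only touches metadata; reading a file exercises the S3 credentials
# on the data path. Substitute a real file that exists under the mount.
print(dbutils.fs.head("/mnt/xxxxx/yyyy/sample.csv", 1024))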

3 REPLIES

Anonymous
Not applicable

@Amrendra Kumar:

The error message you provided suggests that there may be an issue with reading a file from the AWS S3 bucket. It could be due to various reasons such as network connectivity issues or access permission errors.

Here are a few things you could try to troubleshoot the issue:

  1. Check the AWS S3 bucket access permissions: make sure the IAM user or role used to access the bucket has permission to read the files; review the permissions policy attached to that user or role.
  2. Check the network connectivity: verify there is no network issue between the Databricks cluster and the S3 bucket, for example by testing with the AWS CLI or by accessing the bucket from another network (the boto3 sketch after the Spark example below checks both of these from the notebook).
  3. Try accessing the file directly: read the file via its S3 URI instead of the mount; Spark's built-in S3 connector can load files straight from S3.

Here's an example code snippet that shows how to read a CSV file directly from an S3 bucket using Spark:

s3_uri = "s3://<bucket-name>/<path-to-file>"
df = spark.read.format("csv").option("inferSchema", "true").option("header", "true").option("sep", ",").load(s3_uri)
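
To rule out permissions or connectivity from inside the notebook, here is a minimal sketch using boto3 (assuming the cluster's instance profile, or otherwise configured credentials, provide S3 access; bucket and key are placeholders):

import boto3
from botocore.exceptions import BotoCoreError, ClientError

# Uses the cluster's instance profile (or otherwise configured) credentials
s3 = boto3.client("s3")

try:
    # head_object is a cheap request that exercises both network reachability
    # and read permission on the object
    s3.head_object(Bucket="<bucket-name>", Key="<path-to-file>")
    print("Object is reachable and readable")
except (ClientError, BotoCoreError) as e:
    # A 403 error points to permissions; connection errors point to networking
    print("Access check failed:", e)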

  4. Increase the executor memory: if the steps above do not help, try raising the spark.executor.memory configuration. More executor memory can help when processing large files; note that this setting goes in the cluster's Spark configuration and takes effect after a restart, not from a running notebook.
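
For example, from a notebook you can check the currently effective value (the 8g figure below is illustrative):

# Read the currently effective executor memory (read-only at runtime)
print(spark.conf.get("spark.executor.memory", "not set"))

# To raise it, add a line like the following to the cluster's Spark config
# (cluster settings > Advanced options > Spark) and restart the cluster:
# spark.executor.memory 8g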

I hope this helps!

kumarPerry
New Contributor II

Thanks Suteja for the reply, but those suggestions didn't help; I had already tried them. I solved the issue by simply restarting the cluster.
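
If stale mount credentials were the cause, a lighter-weight step to try before a full restart is refreshing the mount cache (a sketch; dbutils.fs.refreshMounts() makes every node in the cluster re-read its mount information):

# Force all nodes to refresh their cached mount info, picking up
# fresh credentials without restarting the cluster
dbutils.fs.refreshMounts()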

Anonymous
Not applicable

Hi @Amrendra Kumar,

Hope everything is going great.

Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we can help you. 

Cheers!
