Not able to read text file from local file path - Spark CSV reader

SankaraiahNaray
New Contributor II

We are using the Spark CSV reader to read a CSV file into a DataFrame, and we run the job in yarn-client mode; it works fine in local mode. We submit the Spark job from an edge node.

But when we place the file on a local file path instead of HDFS, we get a file-not-found exception.

Code:

sqlContext.read.format("com.databricks.spark.csv")
      .option("header", "true").option("inferSchema", "true")
      .load("file:/filepath/file.csv")

We also tried file:///, but we still get the same error.

Error log:

2016-12-24 16:05:40,044 WARN  [task-result-getter-0] scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, hklvadcnc06.hk.standardchartered.com): java.io.FileNotFoundException: File file:/shared/sample1.csv does not exist
        at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:609)
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:822)
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:599)
        at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421)
        at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:140)
        at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:341)
        at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:767)
        at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:109)
        at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
        at org.apache.spark.rdd.HadoopRDD$anon$1.<init>(HadoopRDD.scala:241)
        at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:212)
        at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
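
For context: the task in the log above failed on a worker node (hklvadcnc06...), so it is the executor, not the edge node, that tries to open file:/shared/sample1.csv; the file therefore has to exist at that path on every worker. A minimal sketch of one common workaround, staging the file on HDFS first, where the HDFS destination and the use of the pyspark shell's sqlContext are assumptions:

import subprocess

# copy the driver-local file to HDFS so every executor can read it
# (the HDFS destination /tmp/sample1.csv is hypothetical)
subprocess.check_call(["hdfs", "dfs", "-put", "-f", "/shared/sample1.csv", "/tmp/sample1.csv"])

df = (sqlContext.read.format("com.databricks.spark.csv")
      .option("header", "true")
      .option("inferSchema", "true")
      .load("hdfs:///tmp/sample1.csv"))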

ACCEPTED SOLUTION

Kaniz
Community Manager

Hi @Sankaraiah Narayanasamy,

This seems to be a bug when reading a local file from the spark-shell, but there is a workaround when running spark-submit: just specify the following on the command line.

--conf "spark.authenticate=false"

See SPARK-23476 for reference.
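
For example, the full command might look like the line below (a hedged sketch, not from the thread; the master, deploy mode, and script name are placeholders):

spark-submit --master yarn --deploy-mode client --conf "spark.authenticate=false" your_job.py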


10 REPLIES

VenkatKrishnan
New Contributor II

The path should be with file:/// and it works for me. @snsancar, not sure if this got resolved for you or not. If not, let me know so that I can share my code.

Hi, please share your code to help me resolve this, as I am facing the same issue.

It works if I run the code from a notebook, but if I use a Spark Submit or Python Submit job it doesn't work.


EricBellet
New Contributor III

I tried every way I could think of to read the files, and I can't. It works from a notebook, but I need to run it as a Spark Submit job, and that way it does not work:

import pandas as pd

# plain driver-local path
pdf = pd.read_csv("/databricks/driver/zipFiles/s3Sensor/2017/Tracking_Bounces_20190906.csv.zip/Bounces.csv")
# file: URI variants
pdf2 = pd.read_csv("file:/databricks/driver/zipFiles/s3Sensor/2017/Tracking_Bounces_20190906.csv.zip/Bounces.csv")
pdf3 = pd.read_csv("file:///databricks/driver/zipFiles/s3Sensor/2017/Tracking_Bounces_20190906.csv.zip/Bounces.csv")
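
For what it's worth, pandas reads plain OS paths that exist on the driver, so the file: scheme is unnecessary there; if the file really is on the driver, one way to get it into Spark is to read it with pandas and convert. A minimal sketch, assuming a SparkSession named spark:

import pandas as pd

# read the driver-local file with pandas, then let Spark distribute the data from the driver
pdf = pd.read_csv("/databricks/driver/zipFiles/s3Sensor/2017/Tracking_Bounces_20190906.csv.zip/Bounces.csv")
df = spark.createDataFrame(pdf)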

ajit1
New Contributor II

Does the file exist on executor node?
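
That is the key question: in yarn-client mode the CSV is opened by the executors on the worker nodes, not by the edge node where the job is submitted. A minimal PySpark sketch for checking what the workers see (the path and the SparkSession named spark are assumptions, not from the thread):

import os
import socket

def file_exists(_):
    # this runs on the executors, so it reports what the worker nodes see
    return (socket.gethostname(), os.path.exists("/shared/sample1.csv"))

# spread a few dummy tasks over the cluster and collect one answer per host
print(spark.sparkContext.parallelize(range(100), 100).map(file_exists).distinct().collect())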

abhi_1825
New Contributor III

I am also not able to read a CSV file from a C:\ drive location. Can anyone help? I get an error saying the path doesn't exist.

Code snippet -

path = 'file:///C:/Users/folder_1/folder_2/folder_3/xyz.csv'

df = spark.read.csv(path)

I tried lots of combinations for the above path, but with no success.
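
For what it's worth, a path on your own machine's C:\ drive is not visible to the Databricks cluster; the file has to be uploaded first. A minimal sketch of the usual approach, assuming the CSV has been uploaded to DBFS through the Databricks UI (the DBFS path below is hypothetical):

# hypothetical DBFS path created by uploading xyz.csv through the UI
df = spark.read.csv("dbfs:/FileStore/tables/xyz.csv", header=True, inferSchema=True)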

Anonymous
Not applicable

@Abhishek Pathak - My name is Piper, and I'm one of the moderators for Databricks. Thank you for posting your question! Let's see what the community has to say; otherwise, we'll circle back around to this.

abhi_1825
New Contributor III

Hi, thanks for replying. Do we have any update on this? As far as I can tell, it seems we can't read a local file directly. Is that the case?

Also, can I connect to ADLS Gen2 storage (Azure) while using the Community Edition of Databricks? I am getting an error there as well.

Thank You.
