Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Not able to read text file from local file path - Spark CSV reader

SankaraiahNaray
New Contributor II

We are using the Spark CSV reader to read a CSV file into a DataFrame, and we are running the job in yarn-client mode; it works fine in local mode. We submit the Spark job from an edge node.

But when we place the file on the local file path instead of HDFS, we get a FileNotFoundException.

Code:

sqlContext.read.format("com.databricks.spark.csv")
      .option("header", "true").option("inferSchema", "true")
      .load("file:/filepath/file.csv")

We also tried file:///, but we still get the same error.
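One thing worth noting here: in yarn-client mode the driver runs on the edge node, but the tasks that actually open the file run on the cluster's worker nodes, and a file:/ path is resolved against each worker's local disk. If the file only exists on the edge node, every executor will throw FileNotFoundException exactly as shown below. A common workaround (a sketch, not taken from this thread — all paths and the job jar name are hypothetical) is to copy the file to HDFS first so every executor can reach it:

```shell
# Copy the local file from the edge node into HDFS so that every
# executor can read it, then point the CSV reader at the HDFS path.
hdfs dfs -mkdir -p /user/myuser/input
hdfs dfs -put /filepath/file.csv /user/myuser/input/

# Submit the job against the HDFS path instead of file:/...
spark-submit --master yarn --deploy-mode client my_job.jar \
  hdfs:///user/myuser/input/file.csv
```

Alternatively, the file can be placed at the same local path on every worker node, but the HDFS route is usually simpler on a shared cluster.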

Error log:

2016-12-24 16:05:40,044 WARN  [task-result-getter-0] scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, hklvadcnc06.hk.standardchartered.com): java.io.FileNotFoundException: File file:/shared/sample1.csv does not exist
        at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:609)
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:822)
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:599)
        at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421)
        at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:140)
        at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:341)
        at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:767)
        at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:109)
        at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
        at org.apache.spark.rdd.HadoopRDD$anon$1.<init>(HadoopRDD.scala:241)
        at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:212)
        at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

1 ACCEPTED SOLUTION

Accepted Solutions

Kaniz
Community Manager
Community Manager

Hi @Sankaraiah Narayanasamy,

This seems to be a bug in spark-shell when reading a local file, but there is a workaround: when running spark-submit, specify the following in the command:

--conf "spark.authenticate=false"

See SPARK-23476 for reference.


11 Replies

VenkatKrishnan
New Contributor II

The path should start with file:/// and it works for me. @snsancar, not sure if this got resolved for you or not. If not, let me know so that I can share my code.

Hi, please share your code to help me resolve the above issue, as I am facing the same problem.

It works if I run the code using a notebook, but it doesn't work if I use a Spark Submit or Python Submit job.


EricBellet
New Contributor III

I tried every possible way to read the files, and I can't. It works with a notebook, but I need to run a Spark Submit job, and that way it does not work:

pdf = pd.read_csv("/databricks/driver/zipFiles/s3Sensor/2017/Tracking_Bounces_20190906.csv.zip/Bounces.csv")
pdf2 = pd.read_csv("file:/databricks/driver/zipFiles/s3Sensor/2017/Tracking_Bounces_20190906.csv.zip/Bounces.csv")
df3 = pd.read_csv("file:///databricks/driver/zipFiles/s3Sensor/2017/Tracking_Bounces_20190906.csv.zip/Bounces.csv")
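One possibility worth ruling out with paths like these: a component such as Tracking_Bounces_20190906.csv.zip only resolves as a directory if the archive was actually extracted into a folder with that name. If the zip was never extracted, the CSV member can be read straight out of the archive instead. A stdlib-only sketch (read_csv_from_zip is a hypothetical helper, not from this thread):

```python
import csv
import io
import zipfile

def read_csv_from_zip(zip_path, member):
    """Read a CSV member directly out of a zip archive without extracting it."""
    with zipfile.ZipFile(zip_path) as zf:
        with zf.open(member) as fh:
            # ZipFile.open yields bytes; wrap it so csv gets text.
            text = io.TextIOWrapper(fh, encoding="utf-8")
            return list(csv.reader(text))
```

The returned rows can then be handed to pandas or Spark as needed.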

ajit1
New Contributor II

Does the file exist on executor node?
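This is the key question to verify, since each executor resolves a file:/ path on its own disk. A small sketch of how one might check it (path_exists_here is a hypothetical helper; the PySpark usage in the comment assumes a running SparkContext and is not runnable standalone):

```python
import os
import socket

def path_exists_here(path):
    """Report whether `path` exists on the machine this function runs on."""
    return (socket.gethostname(), os.path.exists(path))

# Hypothetical PySpark usage, to run the check on the executors:
# results = (sc.parallelize(range(sc.defaultParallelism))
#              .map(lambda _: path_exists_here("/shared/sample1.csv"))
#              .distinct().collect())
# Any (hostname, False) entry names a worker that cannot see the file.
```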

abhi_1825
New Contributor III

I am also not able to read a CSV file from a C:\ drive location. Can anyone help? I get an error saying the path doesn't exist.

Code snippet -

path = 'file:///C:/Users/folder_1/folder_2/folder_3/xyz.csv'

df = spark.read.csv(path)

I tried lots of combinations for the above path, but with no success.
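Two separate things may be at play here. First, a well-formed file URI for a Windows path can be built with the standard library rather than by hand (the path below is illustrative):

```python
from pathlib import PureWindowsPath

# Build a well-formed file URI from a Windows-style path.
uri = PureWindowsPath(r"C:\Users\folder_1\folder_2\folder_3\xyz.csv").as_uri()
print(uri)  # file:///C:/Users/folder_1/folder_2/folder_3/xyz.csv
```

Second, and more importantly: a Databricks cluster runs in the cloud, so even a correctly formed URI cannot reach a C:\ drive on your own machine; the file has to be uploaded to storage the cluster can see (such as DBFS) first.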

Anonymous
Not applicable

@Abhishek Pathak - My name is Piper, and I'm one of the moderators for Databricks. Thank you for posting your question! Let's see what the community has to say; otherwise, we'll circle back around to this.

abhi_1825
New Contributor III

Hi, thanks for replying. Is there any update on this? As far as I can tell, it seems we can't read a local file directly. Is that the case?

Also, can I connect to ADLS Gen2 storage (Azure) while using the Community Edition of Databricks? I am getting an error there as well.

Thank You.


AshleeBall
New Contributor II

Thanks for your help. It helped me a lot.
