07-12-2019 03:07 PM
Reading data from a URL using Spark on Community Edition, I got a path-related error. Any suggestions, please?
url = "https://raw.githubusercontent.com/thomaspernet/data_csv_r/master/data/adult.csv"
from pyspark import SparkFiles
spark.sparkContext.addFile(url)
# sc.addFile(url)
# sqlContext = SQLContext(sc)
# df = sqlContext.read.csv(SparkFiles.get("adult.csv"), header=True, inferSchema= True)
df = spark.read.csv(SparkFiles.get("adult.csv"), header=True, inferSchema= True)
Error:
Path does not exist: dbfs:/local_disk0/spark-9f23ed57-133e-41d5-91b2-12555d641961/userFiles-d252b3ba-499c-42c9-be48-96358357fb75/adult.csv
Accepted Solutions
07-16-2019 01:21 AM
Hi @rr_5454,
You will find the answer here: https://forums.databricks.com/questions/10648/upload-local-files-into-dbfs-1.html
You will have to:
- get the file into local file storage
- move the file to DBFS
- load the file into a DataFrame
This is one of the possible solutions; see the sketch below.
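A minimal sketch of those three steps, assuming a Databricks notebook (where spark and dbutils are predefined) and an illustrative dbfs:/FileStore/data/ target path:

import urllib.request

# Step 1: download the file to the driver's local disk
local_path = "/tmp/adult.csv"
urllib.request.urlretrieve(
    "https://raw.githubusercontent.com/thomaspernet/data_csv_r/master/data/adult.csv",
    local_path,
)

# Step 2: copy it into DBFS so every node in the cluster can read it
dbutils.fs.mkdirs("dbfs:/FileStore/data/")
dbutils.fs.cp("file:" + local_path, "dbfs:/FileStore/data/adult.csv")

# Step 3: load the file into a DataFrame
df = spark.read.csv("dbfs:/FileStore/data/adult.csv", header=True, inferSchema=True)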
08-08-2019 05:15 PM
I'm facing the same issue. Could you provide some code for assistance? Thanks!
09-28-2021 12:31 PM
Here is code for anyone facing the same issue, without moving the file to a different path:

import requests

CHUNK_SIZE = 4096

# Stream the download straight into DBFS through the /dbfs FUSE mount
with requests.get("https://raw.githubusercontent.com/suy1968/Adult.csv-Dataset/main/adult.csv", stream=True) as resp:
    if resp.ok:
        with open("/dbfs/FileStore/data/adult.csv", "wb") as f:
            for chunk in resp.iter_content(chunk_size=CHUNK_SIZE):
                f.write(chunk)

display(spark.read.csv("dbfs:/FileStore/data/adult.csv", header=True, inferSchema=True))
I had to use a different URL, as the one in the original question is no longer available.
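If the read still fails, a quick sanity check (assuming the same path as above) is to list the target directory and confirm the download actually landed in DBFS:

# The file should appear here before Spark tries to read it
display(dbutils.fs.ls("dbfs:/FileStore/data/"))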
08-09-2023 03:11 AM
Hi there,
I have pretty much the exact code you have here, and yet it still doesn't work; it fails with "No such file or directory".
Is this a limitation of Community Edition?
import requests

CHUNK_SIZE = 4096

def get_remote_file(dataSrcUrl, destFile):
    '''Simple old-school Python function to load a remote URL into DBFS.'''
    # Prefix with /dbfs so plain Python open() goes through the FUSE mount
    destFile = "/dbfs" + destFile
    with requests.get(dataSrcUrl, stream=True) as resp:
        if resp.ok:
            with open(destFile, "wb") as f:
                for chunk in resp.iter_content(chunk_size=CHUNK_SIZE):
                    f.write(chunk)

get_remote_file("https://gitlab.com/opstar/share20/-/raw/master/university.json", "/Filestore/data/lgdt/university.json")
The directory "dbfs:/Filestore/data/lgdt" definitely exists as i can see it when running the dbutils.fs.ls(path) command
10-29-2021 04:00 AM
Hi,
I face the same issue as above, with the following error:
Path does not exist: dbfs:/local_disk0/spark-9f23ed57-133e-41d5-91b2-12555d641961/userFiles-d252b3ba-499c-42c9-be48-96358357fb75/adult.csv
Unfortunately, this link is dead: https://forums.databricks.com/questions/10648/upload-local-files-into-dbfs-1.html
Would it be possible to post the solution again?
Thanks
10-29-2021 01:44 PM
@Bertrand BURCKER - Try here - https://web.archive.org/web/20201030194155/https://forums.databricks.com/questions/10648/upload-loca...
11-02-2021 12:55 AM

11-02-2021 08:44 AM
@Bertrand BURCKER - That's great! Would you be happy to mark your answer as best so that others can find it easily?
Thanks!
11-26-2021 06:14 AM
Hi,
We can also read the CSV directly, without writing it to DBFS first.
Scala Spark approach:
import org.apache.commons.io.IOUtils // already on the Spark cluster classpath, no extra jar needed
import java.net.URL
import spark.implicits._ // for .toDS(); auto-imported in Databricks notebooks

val urlfile = new URL("https://people.sc.fsu.edu/~jburkardt/data/csv/airtravel.csv")

// Fetch the file as one string, split it into lines, and build a Dataset[String]
val testDummyCSV = IOUtils.toString(urlfile, "UTF-8").lines.toList.toDS()

// spark.read.csv can parse a Dataset[String] directly
val testcsv = spark.read
  .option("header", true)
  .option("inferSchema", true)
  .csv(testDummyCSV)

display(testcsv)
Notebook attached
12-13-2021 09:48 AM
Hello everyone, this issue has not been resolved to this day. I appreciate all the workarounds, but shouldn't SparkFiles be able to pull data from an API? I tested SparkFiles on Databricks Community Edition without errors, but on Azure it produces the path-not-found message.
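For reference, a minimal sketch of the SparkFiles pattern being compared across environments, using the URL from the original question; note the explicit file:// scheme, since SparkFiles.get returns a driver-local path that, per the error above, may otherwise be resolved against dbfs: instead:

from pyspark import SparkFiles

# addFile downloads the URL into Spark's driver-local scratch directory
url = "https://raw.githubusercontent.com/thomaspernet/data_csv_r/master/data/adult.csv"
spark.sparkContext.addFile(url)

# SparkFiles.get returns a local filesystem path; the file:// prefix keeps
# Spark from resolving it against the cluster's default dbfs: filesystem
df = spark.read.csv("file://" + SparkFiles.get("adult.csv"), header=True, inferSchema=True)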
12-14-2021 12:04 AM
Hi,
Does the best answer of this post help you?
12-14-2021 03:56 AM
Hi, I already know how SparkFiles works conceptually; it is its behavior within Azure that is not correct.
The discussion is here:
03-01-2023 01:10 PM
Sorry, bringing this back up...
from pyspark import SparkFiles

url = "http://raw.githubusercontent.com/ltregan/ds-data/main/authors.csv"
spark.sparkContext.addFile(url)

df = spark.read.csv("file://" + SparkFiles.get("authors.csv"), header=True, inferSchema=True)
df.show()
I get this empty output:
++
||
++
++
Any idea? Spark 3.2.2 on Mac M1.