12-06-2021 09:29 AM
When you use:
from pyspark import SparkFiles
spark.sparkContext.addFile(url)
the file is added to the non-DBFS path /local_disk0/, but when you then try to read it:
spark.read.json(SparkFiles.get("file_name"))
Spark tries to read it from /dbfs/local_disk0/. I also tried the file:// prefix and many other creative approaches, and none of them worked.
Of course, it works after using %sh cp to copy the file from /local_disk0/ to /dbfs/local_disk0/.
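For anyone who wants to do the same copy from Python instead of %sh cp, here is a minimal sketch. It assumes the /dbfs mount is writable from the driver like a normal directory; mirror_under_root is a hypothetical helper name of mine, not a Databricks or Spark API.

```python
import os
import shutil


def mirror_under_root(local_path: str, root: str = "/dbfs") -> str:
    """Copy local_path to the same absolute path under root; return the new path.

    Hypothetical helper: e.g. /local_disk0/x.json -> /dbfs/local_disk0/x.json.
    """
    local_path = os.path.abspath(local_path)
    # Strip the leading separator so os.path.join does not discard root.
    destination = os.path.join(root, local_path.lstrip(os.sep))
    os.makedirs(os.path.dirname(destination), exist_ok=True)
    shutil.copy(local_path, destination)
    return destination


# Intended use in a notebook (assumes spark and an earlier addFile(url) call):
#   from pyspark import SparkFiles
#   mirror_under_root(SparkFiles.get("file_name"))
#   df = spark.read.json(SparkFiles.get("file_name"))
```

This only reproduces the workaround described above; it does not explain or fix the path mismatch itself.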
It seems to be a bug: it looks as if addFile was switched to DBFS on Azure Databricks but SparkFiles was not (in original Spark, addFile sends files to the workers and SparkFiles.get reads them back from there).
I also couldn't find any setting to manually specify the root directory for SparkFiles.
My blog: https://databrickster.medium.com/
Labels: Azure, Azure databricks