Options
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
06-21-2021 11:33 AM
If you are attempting to read all the files in a directory you should be able to use a wild card and filter using the extension. For example:
df = (spark
.read
.format("com.crealytics.spark.excel")
.option("header", "True")
.option("inferSchema", "true")
.option("dataAddress", "'Usage Dataset'!A2")
.load('/mnt/adls/40_project/UBC/WIP/Mercury/UUR_PS_raw_temp/*_Usage_Dataset.xlsx')
)Should read all the .xlsx files in that directory.
If you want to read a subset of files then you can loop through, get all the file paths in a python list, and then provide that list when reading. For example:
files = dbutils.fs.ls('/mnt/adls/40_project/UBC/WIP/Mercury/UUR_PS_raw_temp/')
files_list = []
for f in files:
# use an if statement to determine if you want to append the path to the list
files_list.append(f.path)
df = (spark
.read
.format("com.crealytics.spark.excel")
.option("header", "True")
.option("inferSchema", "true")
.option("dataAddress", "'Usage Dataset'!A2")
.load(files_list)
)