Unable to copy multiple files from file:/tmp to dbfs:/tmp
08-17-2021 06:52 PM
I am downloading multiple files by web scraping and by default they are stored in /tmp
I can copy a single file by providing the filename and path:
%fs cp file:/tmp/2020-12-14_listings.csv.gz dbfs:/tmp
but when I try to copy multiple files with a wildcard I get an error:
%fs cp file:/tmp/*_listings* dbfs:/tmp
Error
FileNotFoundException: File file:/tmp/_listings does not exist
Hoping someone has seen this before.
09-15-2021 05:15 AM
Wildcards are currently not supported with dbutils. You can move the whole directory:
dbutils.fs.mv("file:/tmp/test", "dbfs:/tmp/test2", recurse=True)
or just a single file:
dbutils.fs.mv("file:/tmp/test/test.csv", "dbfs:/tmp/test2/test2.csv")
Since wildcards are not allowed, the workaround is the more traditional approach: list the files, filter them yourself, and then move or copy each one.
import os

def db_list_files(file_path, file_prefix):
    # Return the paths of all entries in file_path whose name starts with file_prefix
    file_list = [file.path for file in dbutils.fs.ls(file_path) if os.path.basename(file.path).startswith(file_prefix)]
    return file_list

# List the matching source files, then copy each one into the target directory
files = db_list_files('file:/your/src_dir', 'foobar')
for file in files:
    dbutils.fs.cp(file, os.path.join('dbfs:/your/tgt_dir', os.path.basename(file)))
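The prefix filter above can be generalized to a shell-style pattern such as the *_listings* from the question. The following is only a minimal sketch: copy_matching is a hypothetical helper name, the fs argument stands in for dbutils.fs (anything with ls and cp methods works), and it assumes that directory entries returned by ls have paths ending in "/", which is how they are skipped here.

```python
import fnmatch
import posixpath


def copy_matching(fs, src_dir, pattern, dst_dir):
    """Copy files in src_dir whose names match a shell-style pattern.

    fs is expected to behave like dbutils.fs: fs.ls(dir) returns objects
    with a .path attribute, and fs.cp(src, dst) copies a single file.
    Returns the list of target paths that were written.
    """
    copied = []
    for entry in fs.ls(src_dir):
        # Assumption: directory paths end with "/", so skip those entries.
        if entry.path.endswith("/"):
            continue
        name = posixpath.basename(entry.path)
        # fnmatchcase gives case-sensitive matching on every platform.
        if fnmatch.fnmatchcase(name, pattern):
            target = dst_dir.rstrip("/") + "/" + name
            fs.cp(entry.path, target)
            copied.append(target)
    return copied
```

In a notebook this would be called as copy_matching(dbutils.fs, "file:/tmp", "*_listings*", "dbfs:/tmp"). The matching happens in Python rather than in dbutils, so the lack of wildcard support does not matter.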
09-16-2021 11:10 AM
This is what I suspected.
Hopefully wildcard support will be available in the future.
Thanks