Unable to copy multiple files from file:/tmp to dbfs:/tmp

hoopla
New Contributor II

I am downloading multiple files by web scraping and by default they are stored in /tmp

I can copy a single file by providing the filename and path

%fs cp file:/tmp/2020-12-14_listings.csv.gz dbfs:/tmp

but when I try to copy multiple files I get an error

%fs cp file:/tmp/*_listings* dbfs:/tmp

Error

FileNotFoundException: File file:/tmp/_listings does not exist

Hoping someone has seen this before

Deepak_Bhutada
Databricks Employee

Wildcards are currently not supported with dbutils. You can move the whole directory:

dbutils.fs.mv("file:/tmp/test", "dbfs:/tmp/test2", recurse=True)

or just a single file:

dbutils.fs.mv("file:/tmp/test/test.csv", "dbfs:/tmp/test2/test2.csv")

Since wildcards are not allowed, the workaround is the more traditional approach: list the files, filter them by name, and then move or copy each one.

import os

def db_list_files(file_path, file_prefix):
  # dbutils.fs.ls returns FileInfo objects; keep the paths whose file name starts with the prefix
  return [f.path for f in dbutils.fs.ls(file_path) if os.path.basename(f.path).startswith(file_prefix)]

files = db_list_files('file:/your/src_dir', 'foobar')

# Copy each matching file into the target directory under the same name
for f in files:
  dbutils.fs.cp(f, os.path.join('dbfs:/your/tgt_dir', os.path.basename(f)))
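If you need a real wildcard such as *_listings* rather than a prefix, one option is to filter the listing with Python's fnmatch module. This is only a sketch: match_paths and copy_matching are hypothetical helper names, not part of dbutils, and the ls and cp functions are passed in as arguments so the helper can also be tried outside a notebook.

```python
import fnmatch
import posixpath


def match_paths(paths, pattern):
    """Return the paths whose final component matches a shell-style pattern."""
    return [p for p in paths
            if fnmatch.fnmatchcase(posixpath.basename(p), pattern)]


def copy_matching(ls, cp, src_dir, pattern, dst_dir):
    """Copy every entry in src_dir whose name matches pattern into dst_dir.

    ls and cp are injected (an assumption made for illustration);
    in a Databricks notebook they would be dbutils.fs.ls and dbutils.fs.cp.
    """
    sources = match_paths([entry.path for entry in ls(src_dir)], pattern)
    for src in sources:
        # Keep the original file name under the target directory
        cp(src, posixpath.join(dst_dir, posixpath.basename(src)))
    return sources


# In a notebook, assuming dbutils is available:
# copy_matching(dbutils.fs.ls, dbutils.fs.cp, "file:/tmp", "*_listings*", "dbfs:/tmp")
```

Directory entries returned by dbutils.fs.ls end with a slash, so their base name is empty and they are skipped unless the pattern also matches an empty name.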

hoopla
New Contributor II
Thanks Deepak,
This is what I suspected.
Hopefully wildcard support will be available in the future.
Thanks