
CLI Command <databricks fs cp> Not Uploading Files to DBFS

IgnacioCastinei
New Contributor III

Hi all,

Until recently, I had been successfully using the CLI to upload files from my local machine to DBFS under /FileStore/tables. Specifically, I have been running the following command from my terminal:

databricks fs cp -r <MyLocalDataset> dbfs:/FileStore/tables/NewDataset/

Since last week, the command no longer seems to work. When I execute it verbosely, it appears to run successfully (the copy of each file is displayed in the terminal). Moreover, if I then run the following command, the NewDataset folder is listed:

databricks fs ls dbfs:/FileStore/tables/

However, when I check the content in the Databricks UI via Data => Create New Table => DBFS => /FileStore/tables, the NewDataset folder is not there.

Moreover, if I create a notebook and try to load the NewDataset folder, I get the following error:

org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: /FileStore/tables/NewDataset

I have tried other CLI commands (for example, databricks clusters list) and they all work fine.

Am I doing something wrong, or is there a new way of uploading files to DBFS that I should be using instead?

I am using Databricks Community Edition.

Thank you very much for your time.

Kind regards,

Nacho

6 REPLIES

IgnacioCastinei
New Contributor III

Hi Kaniz,

Thank you so much! It's been three weeks since I posted the question. If you can provide me with some guidance, I will appreciate it.

Thank you very much in advance!

Kind regards,

Nacho

Anonymous
Not applicable

IgnacioCastineiras 

I tested this in my lab and it worked as expected.

Can you run the command below and see if it returns a list of files?

databricks fs ls dbfs:/FileStore/tables/NewDataset

Also, please check whether a file named "NewDataset" already exists in dbfs:/FileStore/tables.
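
For example, a quick way to check from a bash shell (just a sketch; the case-insensitive grep also catches entries with different casing):

# list /FileStore/tables and filter for anything named like NewDataset
databricks fs ls dbfs:/FileStore/tables/ | grep -i newdataset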

IgnacioCastinei
New Contributor III

Hi @Arjun Kaimaparambil Rajan,

Thank you for your answer. Yes, I think this is indeed the case.

I can see a mismatch between the content of the DBFS when:

  1. Using my machine and the CLI (the NewDataset folder is listed by "databricks fs ls dbfs:/FileStore/tables/").
  2. Using the Databricks GUI (the NewDataset folder is not listed under Data => Create New Table => DBFS => /FileStore/tables).

In other words, the command "databricks fs cp -r <MyLocalDataset> dbfs:/FileStore/tables/NewDataset/" is uploading the dataset "somewhere", but not to the DBFS location I can see in the GUI via Data => Create New Table => DBFS => /FileStore/tables.

My question is:

  • What command should I type in order to upload the dataset there?

Thank you very much in advance!

Kind regards,

Nacho

IgnacioCastinei
New Contributor III

PS: I also ran the command with the --debug option and looked for the x-databricks-org-id header.

The organization ID reported by the CLI and the one appearing in the GUI are the same.
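
Roughly, this is what I checked (a rough bash sketch; the debug output may go to stderr, hence the redirect):

# print the org ID header from the HTTP exchange (assumes the legacy
# databricks-cli, where --debug logs request/response details)
databricks fs ls dbfs:/FileStore/tables/ --debug 2>&1 | grep -i x-databricks-org-id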

Hi @Arjun Kaimaparambil Rajan,

Thank you for your reply.

Yes, I can confirm that the GUI option "Data" => "Create Table" => "Upload File" allows me to upload datasets from my local machine to DBFS.

Therefore, this can be used as an alternative to the CLI "databricks fs cp" command for uploading datasets from the local machine to DBFS /FileStore/tables/.

Two questions:

1. Would there be a similar GUI approach to download a result folder produced by a Spark job back to the local machine?

I am aware that individual files in /FileStore/tables can be accessed via their URL, but this approach does not seem to work for an entire folder. The "databricks fs ls" command could be used to generate a script that downloads each file via "wget", but that seems quite tedious; see the sketch below.
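
Something along these lines is what I mean (a rough bash sketch; it assumes "databricks fs ls" prints one entry name per line, that file names contain no spaces, and "<workspace-url>" is just a placeholder for the workspace host; wget would also need to handle authentication):

# iterate over the folder listing and fetch each file via its /files/ URL
# (content under /FileStore is served at /files/ on the workspace host)
for f in $(databricks fs ls dbfs:/FileStore/tables/NewDataset); do
  wget "https://<workspace-url>/files/tables/NewDataset/$f"
done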

2. More generally, would it be possible to add the "databricks fs cp" CLI functionality back to Databricks Community Edition?

The CLI "databricks fs cp" command has been working all these years until recently. Perhaps it could be considered to bring this functionality back.

Personally, I use Databricks for teaching Spark in my university modules. My students and I like Databricks very much, and we would like to continue using it.

Kind regards,

Nacho

jose_gonzalez
Databricks Employee

Hi @Ignacio Castineiras,

If Arjun.kr's reply fully answered your question, would you be happy to mark their answer as best so that others can quickly find the solution?

Please let us know if you still are having this issue.
