Where / how does DBFS store files?

al_joe
Contributor

I tried to use %fs head to print the contents of a CSV file used in a training course:

%fs head "/mnt/path/file.csv"

but got an error saying "cannot head a directory"!?

Then I ran %fs ls on the same CSV path and got a list of 4 files inside a directory named like a CSV file.
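The listing described above is most likely how Spark writes CSV output: writing a DataFrame to /mnt/path/file.csv creates a directory with that name that holds one or more part-* data files plus marker files such as _SUCCESS (Databricks typically adds _started_* and _committed_* markers too), which is why %fs head refuses the path. The sketch below is a hypothetical local reproduction, not Databricks code: the make_spark_style_csv and head_csv helpers and the file contents are made up for illustration, and real part-file names vary per write.

```python
import tempfile
from pathlib import Path


def make_spark_style_csv(root: Path) -> Path:
    """Create a hypothetical Spark-style output: a directory named file.csv."""
    out = root / "file.csv"  # a directory, despite the .csv suffix
    out.mkdir()
    # One data file; real part-file names (and how many there are) differ per write.
    (out / "part-00000-example.csv").write_bytes(b"id,name\n1,alice\n2,bob\n")
    # Empty marker file signalling that the write job committed.
    (out / "_SUCCESS").write_bytes(b"")
    return out


def head_csv(path: Path, max_bytes: int = 65536) -> str:
    """Return up to max_bytes of a CSV, descending into a part directory if needed."""
    if path.is_dir():
        # Skip marker files such as _SUCCESS; keep only the data files.
        parts = sorted(p for p in path.iterdir() if p.name.startswith("part-"))
        if not parts:
            raise FileNotFoundError(f"no part-* files under {path}")
        path = parts[0]
    with path.open("rb") as f:
        return f.read(max_bytes).decode("utf-8", errors="replace")


with tempfile.TemporaryDirectory() as tmp:
    csv_dir = make_spark_style_csv(Path(tmp))
    print(sorted(p.name for p in csv_dir.iterdir()))  # ['_SUCCESS', 'part-00000-example.csv']
    print(head_csv(csv_dir))
```

In the notebook, the equivalent is to run %fs ls /mnt/path/file.csv and then %fs head on the full path of the part- file that the listing shows.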

I'm a little confused about how DBFS stores files on the underlying storage and where they are actually stored. I am using Community Edition.

Any pointers appreciated. Thanks.

EDIT: Apache Iceberg has an illustration that shows its internal logical/storage structure well. Is there something similar for DBFS?

1 ACCEPTED SOLUTION

Kaniz
Community Manager

Hi @Al Jo, the Databricks File System (DBFS) is a distributed file system mounted into a Databricks workspace and available on Databricks clusters. DBFS is an abstraction on top of scalable object storage and offers the following benefits:

  • Allows you to mount storage objects so that you can seamlessly access data without requiring credentials.
  • Allows you to interact with object storage using directory and file semantics instead of storage URLs.
  • Persists files to object storage, so you won’t lose data after you terminate a cluster.
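The second benefit is the key to the "where/how" question: object stores such as Amazon S3 keep a flat namespace of keys, and a file-system layer presents shared key prefixes as folders. The sketch below is a simplified, hypothetical model of that idea, not the actual DBFS implementation; the KEYS list and the ls/is_dir helpers are invented for illustration. It shows why /mnt/path/file.csv behaves as a directory: other keys start with that prefix.

```python
from typing import Iterable, List

# Hypothetical object keys: the store itself is a flat key -> bytes map,
# so "/" inside a key is just a character, not a real folder.
KEYS = [
    "mnt/path/file.csv/_SUCCESS",
    "mnt/path/file.csv/part-00000-example.csv",
    "mnt/path/other.txt",
]


def _prefix(path: str) -> str:
    """Turn a file-system style path into the key prefix for its children."""
    p = path.strip("/")
    return p + "/" if p else ""


def ls(keys: Iterable[str], path: str) -> List[str]:
    """List the immediate children of path; sub-'directories' get a trailing '/'."""
    prefix = _prefix(path)
    children = set()
    for key in keys:
        if key.startswith(prefix):
            first, sep, _ = key[len(prefix):].partition("/")
            children.add(first + "/" if sep else first)
    return sorted(children)


def is_dir(keys: Iterable[str], path: str) -> bool:
    """A path acts as a directory when at least one key lives underneath it."""
    prefix = _prefix(path)
    return any(key.startswith(prefix) for key in keys)


print(ls(KEYS, "/mnt/path"))               # ['file.csv/', 'other.txt']
print(ls(KEYS, "/mnt/path/file.csv"))      # ['_SUCCESS', 'part-00000-example.csv']
print(is_dir(KEYS, "/mnt/path/file.csv"))  # True
```

Grouping keys by the "/" delimiter is how listings on flat object stores are usually turned into a folder view; a real system also has to handle pagination, empty "folders", and consistency, which this sketch ignores.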

Please read more about DBFS here.

3 REPLIES

Chris_Shehu
Valued Contributor III

Is this a fresh cluster startup? I noticed that if you browse to the location using the UI, you get a prompt asking you to attach a cluster on the first startup. I would make sure that's set up and that you can see the file in the directory.

Also, have you tried loading the CSV into pandas to display it?
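Following up on the pandas suggestion: pandas.read_csv reads one file at a time, so a Spark-style output directory has to be read part by part. A minimal sketch, assuming the data files match part-*.csv and each one starts with its own header row; the helper read_spark_csv_dir and the sample files are hypothetical.

```python
import glob
import os
import tempfile

import pandas as pd


def read_spark_csv_dir(path: str) -> pd.DataFrame:
    """Concatenate every part-*.csv file under a Spark-style output directory."""
    # glob.escape guards against special characters in the directory name.
    parts = sorted(glob.glob(os.path.join(glob.escape(path), "part-*.csv")))
    if not parts:
        raise FileNotFoundError(f"no part-*.csv files under {path}")
    # Each part is assumed to carry its own header row.
    return pd.concat([pd.read_csv(p) for p in parts], ignore_index=True)


with tempfile.TemporaryDirectory() as tmp:
    out = os.path.join(tmp, "file.csv")  # a directory, as in the question
    os.mkdir(out)
    for name, body in [("part-00000-a.csv", "id,name\n1,alice\n"),
                       ("part-00001-b.csv", "id,name\n2,bob\n")]:
        with open(os.path.join(out, name), "w") as f:
            f.write(body)
    open(os.path.join(out, "_SUCCESS"), "w").close()  # marker file, not matched by the glob
    df = read_spark_csv_dir(out)

print(df)
```

On a cluster, spark.read.csv("/mnt/path/file.csv", header=True) reads the whole directory directly, which avoids the manual globbing.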

User16753725182
Contributor III

Hi @Al Jo, are you still seeing the error when printing the contents of the CSV file?
