Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Difference between DBFS and Delta Lake?

pjp94
Contributor

I'd like a deeper dive/explanation into the difference. When I write to a table with the following code:

spark_df.write.mode("overwrite").saveAsTable("db.table")

The table is created and can be viewed in the Data tab. It can also be found in some DBFS path. Now if I run:

dbutils.fs.rm(dbfs_path, recurse=True)

where dbfs_path is the table's location in DBFS, the data is removed from DBFS, but the table still appears in the Data tab (even though querying the table from a notebook now fails, since the underlying data no longer exists).

If I run:

%sql
DROP TABLE IF EXISTS db.table

Inside a cell, it will drop the table from the Data tab and DBFS. Can someone explain (high level) how the infrastructure works? Much appreciated.

1 ACCEPTED SOLUTION


-werners-
Esteemed Contributor III

Tables in Spark, whether Delta Lake-backed or not, are basically just semantic views on top of the actual data.

On Databricks, the data itself is stored in DBFS, which is an abstraction layer on top of the actual storage (like S3, ADLS, etc.). The data can be Parquet, ORC, CSV, JSON, etc.

So with your rm command you did indeed delete the data from DBFS. However, the table definition still exists: it is stored in a metastore, which contains metadata about which databases and tables exist and where their data resides.

So now you have an empty table. To remove the table definition too, you have to drop it, exactly as you did.
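The metastore/storage split described above can be sketched in a few lines of plain Python (the paths and names are hypothetical, purely for illustration; this is not the Databricks API itself):

```python
# Minimal sketch of the two independent layers: the metastore keeps table
# definitions, DBFS (the storage layer) keeps the actual files.
warehouse = "/dbfs/user/hive/warehouse/db.db/table"

metastore = {"db.table": warehouse}            # table name -> data location
storage = {warehouse: ["part-00000.parquet"]}  # path -> files at that path

# dbutils.fs.rm(path, recurse=True) only touches the storage layer:
storage.pop(warehouse, None)
print("db.table" in metastore)   # True: the definition survives, table is "empty"

# DROP TABLE db.table removes the definition from the metastore
# (and, for managed tables, Databricks also deletes the files):
metastore.pop("db.table", None)
print("db.table" in metastore)   # False
```

This is why removing the files leaves a table entry in the Data tab, while DROP TABLE cleans up both.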

For completeness: Delta Lake has nothing to do with this. Delta Lake is Parquet on steroids, giving you a lot more functionality, but the way of working stays identical.


3 REPLIES


Hi @Werner Stinckens, this is exactly what I was looking for. Thanks!

1) Follow-up question: do you need to set up an object-level storage connection on Databricks (i.e., to an S3 bucket or Azure Blob Storage)?

2) Any folders in your /mnt path are external object stores (e.g., S3, Blob Storage), correct? Everything else is stored in the Databricks root? I ask because my organization has two folders under /mnt: /mnt/aws and /mnt/delta... not sure if delta refers to Delta Lake?

3) So Delta Lake and DBFS are independent of each other, correct? DBFS is where the data is actually stored (i.e., if I wrote a table, the Parquet files). How does Delta Lake fit into this?

Thanks so much!

-werners-
Esteemed Contributor III

1) You don't have to, as a Databricks workspace has its own storage, but it certainly is a good idea.

2) Not all folders in /mnt are external, only the ones you mounted there yourself.

3) Correct. Delta Lake is just a file format like Parquet, but with more possibilities.
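To make point 2 concrete, here is a hedged sketch in plain Python (the mount paths and URIs are hypothetical): only paths sitting under a mount point you created yourself resolve to external storage, everything else lives in the workspace's root DBFS. On Databricks, the real list of mounts comes from dbutils.fs.mounts().

```python
# Hypothetical mount table: mount point -> external storage URI.
# A folder name like /mnt/delta says nothing about Delta Lake; it is
# just whatever name was chosen when the mount was created.
mounts = {
    "/mnt/aws": "s3a://my-bucket",
    "/mnt/delta": "abfss://container@account.dfs.core.windows.net/delta",
}

def is_external(path: str) -> bool:
    """True if the path sits under a mount point, i.e. resolves to external storage."""
    return any(path == m or path.startswith(m + "/") for m in mounts)

print(is_external("/mnt/aws/raw/events"))               # True: mounted S3 bucket
print(is_external("/user/hive/warehouse/db.db/table"))  # False: root DBFS
```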
