Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
delta table storage

Braxx
Contributor II

I couldn't find this clearly explained anywhere, so I hope somebody here can shed some light on it.

A few questions:

1) Where are Delta tables stored?

Docs say: "Delta Lake uses versioned Parquet files to store your data in your cloud storage"

So where exactly is the data stored? Can it be stored on any storage I use, for instance Blob Storage, or is it somewhere on DBFS or on the Databricks cluster?

2) If I already have data saved as Parquet on my Azure Blob Storage and want to convert it to Delta, is the conversion applied in place on Blob Storage? Or is the data copied somewhere else and saved as Delta only in that new location?

TIA

B

Accepted Solutions

-werners-
Esteemed Contributor III

If you do not specify a storage location yourself, the data is stored as managed tables, i.e. in the Blob Storage that belongs to your Databricks workspace (which resides with the cloud provider you use).

If you have your own Blob Storage or data lake, you can (though you don't have to) write your data there as unmanaged tables.

In short, you can store it anywhere in the cloud, as long as Databricks can access it.
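To make the managed/unmanaged distinction concrete, here is a minimal Python sketch that only builds the SQL DDL. The helper `create_delta_table_sql`, the table names and the storage path are hypothetical. The only difference between the two statements is the `LOCATION` clause; in a Databricks notebook you would pass the resulting string to `spark.sql(...)`.

```python
from typing import Optional


def create_delta_table_sql(table: str, columns: str, location: Optional[str] = None) -> str:
    """Build a CREATE TABLE statement for a Delta table (illustrative helper).

    Without `location`, Databricks decides where the files go (managed table).
    With `location`, the files go to the given path (unmanaged table).
    """
    ddl = f"CREATE TABLE {table} ({columns}) USING DELTA"
    if location is not None:
        # The LOCATION clause is what makes the table unmanaged.
        ddl += f" LOCATION '{location}'"
    return ddl


# Managed: data ends up in the workspace's default storage.
print(create_delta_table_sql("sales_managed", "id INT, amount DOUBLE"))

# Unmanaged: data ends up in your own storage account (hypothetical path).
print(create_delta_table_sql(
    "sales_external",
    "id INT, amount DOUBLE",
    "abfss://mycontainer@myaccount.dfs.core.windows.net/delta/sales",
))
```

The DataFrame API makes the same distinction: `df.write.format("delta").saveAsTable("name")` creates a managed table, while adding `.option("path", "<your storage path>")` before `saveAsTable` creates an unmanaged one.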

DBFS is an abstraction layer on top of the actual storage that makes working with files easier.

So if you have mounted, say, three Blob Storage containers, you can write to any of them.
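Conceptually, a mount maps a DBFS path prefix to a storage URI, which is why any mounted container can be a write target. The sketch below models that lookup with a plain dictionary; `resolve_dbfs_path` and the container and account names are hypothetical, and on Databricks the real mount table is what `dbutils.fs.mounts()` lists.

```python
from typing import Dict


def resolve_dbfs_path(path: str, mounts: Dict[str, str]) -> str:
    """Translate a DBFS mount path into the URI of the underlying storage."""
    # Check longer mount points first so nested mounts win over their parents.
    for mount_point in sorted(mounts, key=len, reverse=True):
        prefix = mount_point.rstrip("/")
        if path == prefix or path.startswith(prefix + "/"):
            return mounts[mount_point].rstrip("/") + path[len(prefix):]
    raise ValueError(f"{path!r} is not under any mount point")


# Three hypothetical Blob Storage containers mounted under /mnt.
mounts = {
    "/mnt/raw": "wasbs://raw@myaccount.blob.core.windows.net",
    "/mnt/curated": "wasbs://curated@myaccount.blob.core.windows.net",
    "/mnt/archive": "wasbs://archive@otheraccount.blob.core.windows.net",
}

print(resolve_dbfs_path("/mnt/curated/delta/sales", mounts))
# wasbs://curated@myaccount.blob.core.windows.net/delta/sales
```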

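Converting to Delta: `CONVERT TO DELTA` converts existing Parquet files in place by writing a Delta transaction log next to them, so the data stays on your Blob Storage. See: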

https://docs.microsoft.com/en-us/azure/databricks/spark/latest/spark-sql/language-manual/delta-conve....

Alternatively, you can write the data to another location, in which case it is copied and saved there in Delta Lake format.
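To illustrate question 2, here is a small Python sketch, assuming hypothetical paths, that builds a `CONVERT TO DELTA` statement (to be run with `spark.sql(...)` on Databricks) and checks whether a directory already is a Delta table by looking for the `_delta_log` folder that conversion adds next to the existing Parquet files. The helper names are illustrative. `is_delta_directory` uses ordinary file-system calls, so on Databricks it would need a local-style path such as `/dbfs/mnt/...`.

```python
import os
import tempfile


def convert_to_delta_sql(parquet_dir: str, partitioned_by: str = "") -> str:
    """Build a CONVERT TO DELTA statement for a directory of Parquet files."""
    sql = f"CONVERT TO DELTA parquet.`{parquet_dir}`"
    if partitioned_by:
        # Partitioned Parquet data needs its partition columns declared explicitly.
        sql += f" PARTITIONED BY ({partitioned_by})"
    return sql


def is_delta_directory(path: str) -> bool:
    """A Delta table is Parquet data plus a _delta_log directory beside it."""
    return os.path.isdir(os.path.join(path, "_delta_log"))


if __name__ == "__main__":
    print(convert_to_delta_sql("/mnt/raw/events", "date DATE"))
    # CONVERT TO DELTA parquet.`/mnt/raw/events` PARTITIONED BY (date DATE)

    # Simulate the on-disk effect of an in-place conversion.
    with tempfile.TemporaryDirectory() as table_dir:
        # Empty stand-in for an existing Parquet data file.
        open(os.path.join(table_dir, "part-00000.parquet"), "wb").close()
        print(is_delta_directory(table_dir))  # False: plain Parquet directory
        # In-place conversion adds the log; the data file is not moved.
        os.mkdir(os.path.join(table_dir, "_delta_log"))
        print(is_delta_directory(table_dir))  # True
```

For the copy route instead, something like `spark.read.parquet(src).write.format("delta").save(dst)` writes a new Delta table at `dst` and leaves the original Parquet files untouched.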

Braxx
Contributor II

Thanks, very helpful!
