Data Engineering

delta table storage

Braxx
Contributor II

I couldn't find this clearly explained anywhere, so I hope somebody here can shed some light on it.

A few questions:

1) Where are Delta tables stored?

Docs say: "Delta Lake uses versioned Parquet files to store your data in your cloud storage"

So where exactly is the data stored? Could it be on any storage I use, for instance Blob Storage, or is it somewhere on DBFS or on the Databricks cluster?

2) If I already have data saved as Parquet on my Azure Blob Storage and want to convert it to Delta, would this change be applied in place on the blob? Or would the data be copied somewhere else and saved as Delta in that new location only?

TIA

B

1 ACCEPTED SOLUTION

-werners-
Esteemed Contributor III

If you do not define any storage yourself, data is stored in managed tables, meaning in the blob storage of the Databricks subscription (which resides on the cloud provider you use).

If you use your own blob storage/data lake, you can (you don't have to, but you can) write your data there as unmanaged tables.

But basically you can store it anywhere you want in the cloud, as long as Databricks can access it.
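As a sketch of the managed vs. unmanaged difference in Databricks SQL (the storage account, container, and paths below are made-up examples):

```sql
-- Managed table: Databricks decides where the data lives
-- (the metastore's root storage of the subscription).
CREATE TABLE events (id INT, ts TIMESTAMP) USING DELTA;

-- Unmanaged (external) table: you pick the location, e.g. your own
-- Azure storage. The account/container/path here are hypothetical.
CREATE TABLE events_ext (id INT, ts TIMESTAMP)
USING DELTA
LOCATION 'abfss://mycontainer@mystorageacct.dfs.core.windows.net/tables/events';
```

One practical consequence: dropping a managed table deletes the underlying data files, while dropping an external table only removes the metadata and leaves your files in place.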

DBFS is a semantic layer on top of the actual storage that makes working with files easier.

So if you mounted 3 blob storage accounts, for example, you can write to any of those 3.
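A mount sketch for one such storage account — this only runs inside a Databricks notebook, where `dbutils` is predefined; the storage account, container, secret scope, and key names are all hypothetical:

```python
# Mount an Azure blob container under DBFS (Databricks notebook only).
# Account, container, secret scope, and key names below are made up.
dbutils.fs.mount(
    source="wasbs://mycontainer@mystorageacct.blob.core.windows.net",
    mount_point="/mnt/storage1",
    extra_configs={
        "fs.azure.account.key.mystorageacct.blob.core.windows.net":
            dbutils.secrets.get(scope="my-scope", key="storage-key")
    },
)

# After mounting, the container is addressable through a DBFS path,
# so a DataFrame `df` can be written there in Delta format:
df.write.format("delta").save("/mnt/storage1/tables/events")
```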

Converting to delta:

https://docs.microsoft.com/en-us/azure/databricks/spark/latest/spark-sql/language-manual/delta-conve....

But you could also choose to write to another location, so the data is copied and saved in Delta Lake format there.
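In SQL, the two options look roughly like this (the paths are made-up examples). The in-place conversion only adds a `_delta_log` transaction log next to your existing Parquet files, so the data stays on your blob storage:

```sql
-- In-place: adds a _delta_log directory alongside the existing Parquet files.
CONVERT TO DELTA parquet.`abfss://mycontainer@mystorageacct.dfs.core.windows.net/data/sales`;

-- Alternatively, copy to a new location in Delta format,
-- leaving the original Parquet files untouched:
CREATE TABLE sales_delta
USING DELTA
LOCATION 'abfss://mycontainer@mystorageacct.dfs.core.windows.net/delta/sales'
AS SELECT * FROM parquet.`abfss://mycontainer@mystorageacct.dfs.core.windows.net/data/sales`;
```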


2 REPLIES


Braxx
Contributor II

Thanks, very helpful!
