05-27-2022 02:32 AM
I couldn't find this clearly explained anywhere, so I hope somebody here can shed some light on it.
A few questions:
1) Where are Delta tables stored?
The docs say: "Delta Lake uses versioned Parquet files to store your data in your cloud storage"
So where exactly is the data stored? Could it be on any storage I use, for instance blob storage, or is it somewhere on DBFS or on the Databricks cluster?
2) If I already have data saved as Parquet on my Azure blob storage and want to convert it to Delta, would this change be applied on the blob storage itself? Or would the data be copied somewhere else and saved as Delta only in that new location?
TIA
B
- Labels:
  - Delta Lake
  - Delta Tables
Accepted Solutions
05-30-2022 12:55 AM
If you do not define any storage location yourself, the data is stored as managed tables, meaning in the blob storage that comes with the Databricks workspace (which resides on the cloud provider you use).
If you use your own blob storage/data lake, you can (you don't have to, but you can) write your data there as unmanaged (external) tables.
But basically you can store it anywhere you want in the cloud, as long as Databricks can access it.
DBFS is a semantic layer on top of the actual storage that makes working with files easier.
So if you have mounted 3 blob storage containers, for example, you can write to any of those 3; see the sketch below.
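A minimal sketch of the managed vs. unmanaged difference, assuming a Databricks notebook where `spark` is already available and a hypothetical mount point `/mnt/mydatalake`:
```python
from pyspark.sql import SparkSession

# In a Databricks notebook `spark` already exists; this just reuses it.
spark = SparkSession.builder.getOrCreate()

df = spark.range(10)  # tiny example DataFrame

# Managed table: no path given, so Databricks stores the Delta files
# in the workspace's own root blob storage.
df.write.format("delta").saveAsTable("my_managed_table")

# Unmanaged (external) table: an explicit path on storage you control,
# here a hypothetical mount point to your own blob storage / data lake.
(df.write.format("delta")
   .option("path", "/mnt/mydatalake/tables/my_external_table")
   .saveAsTable("my_external_table"))
```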
Converting to Delta:
With CONVERT TO DELTA the conversion happens in place on your blob storage: the existing Parquet files stay where they are and a Delta transaction log is added next to them. But you could also choose to write to another location, so the data is copied and saved there in Delta Lake format.
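A sketch of both options, again using a hypothetical mount point for illustration:
```python
# Option 1: in-place conversion. The existing Parquet files on your blob
# storage stay where they are; CONVERT TO DELTA only adds a _delta_log.
spark.sql("CONVERT TO DELTA parquet.`/mnt/mydatalake/raw/events`")

# Option 2: copy and convert. Read the Parquet data and write it out again
# in Delta format to a different location, leaving the original files as-is.
(spark.read.parquet("/mnt/mydatalake/raw/events")
      .write.format("delta")
      .save("/mnt/mydatalake/delta/events"))
```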
05-30-2022 08:01 AM
Thanks, very helpful!

