I don't have Upsert/Merge use cases. Should I use Delta or can I use Parquet?
06-23-2021 08:31 AM
06-23-2021 09:47 AM
I would recommend using Delta. Delta stores data as Parquet files, so you still get most of the benefits of Parquet. Even though you don't need to merge data, I would assume you will still want to take advantage of Delta's update/delete functionality. Delta also offers better optimization techniques (file pruning, Z-ordering, etc.) so that your data can be queried efficiently.
Converting between Delta and Parquet is easy, so you can always test both and see which you prefer.
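To illustrate the file pruning idea: Delta records per-file column statistics (such as min/max values) in its transaction log, so a query with a filter can skip any file whose range cannot match without opening it. Plain Parquet readers can do something similar with row-group statistics in each file's footer, but only after reading that footer. The sketch below is plain Python, not the Delta API; `FileStats`, `prune_files`, the file names and the stats are all hypothetical and made up for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FileStats:
    """Hypothetical per-file min/max stats for one column (simplified)."""
    path: str
    min_value: int
    max_value: int


def prune_files(files: List[FileStats], lo: int, hi: int) -> List[str]:
    """Keep only files whose [min, max] range can overlap lo <= col <= hi.

    Every other file is skipped without being read.
    """
    return [f.path for f in files if f.max_value >= lo and f.min_value <= hi]


files = [
    FileStats("part-0000.parquet", 1, 100),
    FileStats("part-0001.parquet", 101, 200),
    FileStats("part-0002.parquet", 201, 300),
]

# Filter: WHERE id BETWEEN 150 AND 180 -> only the second file can match.
print(prune_files(files, 150, 180))
```

The fewer files whose ranges overlap a typical filter, the more work is skipped, which is why how the data is laid out across files matters.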
06-23-2021 11:58 AM
Delta has significant value beyond its DML/ACID capabilities. The data organization strategies @Ryan Chynoweth mentions also help read-only use cases that query and join the data. Delta also supports in-place conversion from Parquet. See the documentation for details: https://docs.databricks.com/spark/latest/spark-sql/language-manual/delta-convert-to-delta.html
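As a rough picture of why data organization helps even read-only queries: the idea behind Z-ordering is to sort rows along a Z-order (Morton) curve, which interleaves the bits of several columns so that rows close in all of them land in the same files. The plain-Python sketch below uses hypothetical helper names, a toy 4×4 grid of `(x, y)` values and 4 rows per "file". It compares a layout sorted by `x` with a Z-ordered one and counts how many files a min/max-based filter on `y` would still read. It only illustrates the concept and is not how Databricks implements `OPTIMIZE ... ZORDER BY`.

```python
from itertools import product
from typing import Callable, List, Tuple

Point = Tuple[int, int]


def z_value(x: int, y: int, bits: int = 16) -> int:
    """Morton code: bit i of x goes to position 2i, bit i of y to 2i + 1."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def layout(points: List[Point], key: Callable[[Point], int], rows_per_file: int) -> List[List[Point]]:
    """Sort the rows by `key`, then cut them into fixed-size 'files'."""
    ordered = sorted(points, key=key)
    return [ordered[i:i + rows_per_file] for i in range(0, len(ordered), rows_per_file)]


def files_to_read_for_y(files: List[List[Point]], y: int) -> int:
    """Count files whose min/max range on y could contain the value y."""
    return sum(1 for f in files if min(p[1] for p in f) <= y <= max(p[1] for p in f))


points = list(product(range(4), range(4)))
by_x = layout(points, key=lambda p: p[0] * 4 + p[1], rows_per_file=4)  # linear sort on (x, y)
by_z = layout(points, key=lambda p: z_value(p[0], p[1]), rows_per_file=4)  # Z-order sort

print(files_to_read_for_y(by_x, 0), files_to_read_for_y(by_z, 0))
```

With the `x`-sorted layout every file spans `y = 0..3`, so a filter on `y = 0` reads all 4 files; in the Z-ordered layout each file covers a 2×2 block, so only 2 are read. The trade-off is that a filter on `x = 0` now reads 2 files instead of 1, which is why Z-ordering is applied to the columns you filter on most.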

