09-27-2021 12:47 AM
For what it's worth:
We do all our data processing in Databricks and then copy the curated data to a DWH (for historical reasons), which is where most of our BI runs.
In my opinion this is an anti-pattern, because reporting directly on the data lake (Delta Lake + Parquet) eliminates the data copy. You save time (no more copying), have less to maintain, and end up with a less complex architecture.
Of course, you will have to assess whether your BI tool can consume Delta Lake or Parquet directly. If it can't, you can use the Databricks SQL endpoints (or some other SQL engine) instead.
I want to get rid of our data warehouse as soon as I can.