02-03-2024 11:00 PM
Hey @jcozar
Let's address your questions about storing raw data and implementing a medallion architecture.
Storing Raw Data:
- Delta Lake: Storing raw data as Delta tables in an external location (e.g., ADLS Gen2) is a good practice. Delta supports schema evolution, so the raw layer can absorb new columns, and keeping the full raw history lets you reprocess downstream layers later.
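- cloudFiles format: cloudFiles is the source format used by Auto Loader to read incoming files incrementally; it is not a storage format, so nothing is stored "as cloudFiles" inside Delta Lake. A typical pattern is to read the landing files with cloudFiles and write the result as a Delta table. If your raw data already arrives as Delta, Auto Loader may only add complexity.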
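As a rough illustration of that pattern, here is a minimal sketch of an Auto Loader read that lands files in a Delta table. The function names, the paths, the `_schema` subfolder, and the choice of JSON as the source format are all assumptions for the example, not something from your setup. The streaming function needs a Databricks runtime where `spark` is available; the options helper is plain Python.

```python
def autoloader_options(schema_location, source_format="json"):
    """Build the reader options for Auto Loader (the cloudFiles source).

    source_format is the format of the landing files (assumed JSON here).
    schema_location is where Auto Loader tracks the inferred schema.
    """
    return {
        "cloudFiles.format": source_format,
        "cloudFiles.schemaLocation": schema_location,
        # Let newly seen columns be added to the tracked schema.
        "cloudFiles.schemaEvolutionMode": "addNewColumns",
    }


def ingest_raw_to_bronze(spark, source_path, bronze_path, checkpoint_path):
    """Incrementally load landing files into a Delta table (hypothetical paths)."""
    options = autoloader_options(schema_location=f"{checkpoint_path}/_schema")
    raw_stream = (
        spark.readStream.format("cloudFiles")
        .options(**options)
        .load(source_path)
    )
    return (
        raw_stream.writeStream.format("delta")
        .option("checkpointLocation", checkpoint_path)
        .option("mergeSchema", "true")  # allow the Delta table schema to evolve
        .trigger(availableNow=True)     # process what is there, then stop
        .start(bronze_path)
    )
```

With `availableNow=True` the stream behaves like an incremental batch job, which fits a scheduled workflow; drop the trigger if you want it to run continuously.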
Medallion Architecture Implementation:
Where to store:
- External location: Storing raw, bronze, and silver tables in an external location is common because it separates storage from compute and makes data lifecycle management easier.
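- Unity Catalog table: Unity Catalog is a governance and metadata layer, so the data itself still lives in cloud storage. A table registered in Unity Catalog can be managed (Unity Catalog controls the storage location and file lifecycle) or external (the files stay at a path you control). The choice is mainly about who owns the files, so you can register your external tables in Unity Catalog and still keep the data where it is.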
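To make the external layout concrete, here is a small sketch of one possible path convention on ADLS Gen2, with one folder per layer. The container, storage account, and table names are placeholders, and the `layer/table` folder structure is just an assumed convention.

```python
LAYERS = ("raw", "bronze", "silver")


def layer_path(container, storage_account, layer, table):
    """Return the abfss:// path for a table in a given medallion layer."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer {layer!r}, expected one of {LAYERS}")
    return (
        f"abfss://{container}@{storage_account}.dfs.core.windows.net/"
        f"{layer}/{table}"
    )


# Hypothetical names, for illustration only.
print(layer_path("lake", "mystorageacct", "bronze", "orders"))
```

Keeping each layer under its own prefix makes it easier to apply different retention or access rules per layer.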
Workflow Options:
- Standard workflow: A regular Databricks job (Workflows) that runs notebooks or scripts can create the bronze and silver tables with ordinary Spark and Delta Lake operations. This is simple to develop, but you define the task order and dependencies yourself.
- DLT pipeline: You can define the layers as a Delta Live Tables (DLT) pipeline, which helps with monitoring and with passing data between layers as it loads. DLT also tracks dependencies between tables, so a table is loaded only after the tables it depends on are ready.
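The dependency handling is the main practical difference between the two options. The sketch below is not DLT code; it only illustrates the ordering idea in plain Python, using the standard library's `graphlib`. The table names are made up, and in a real DLT pipeline the graph is inferred from how tables reference each other rather than declared by hand.

```python
from graphlib import TopologicalSorter


def load_order(dependencies):
    """Return table names in an order where every table follows its inputs.

    dependencies maps each table to the set of tables it reads from.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical medallion chain: raw feeds bronze, bronze feeds silver.
deps = {
    "raw_orders": set(),
    "bronze_orders": {"raw_orders"},
    "silver_orders": {"bronze_orders"},
}

print(load_order(deps))
```

In a standard workflow you would encode this order yourself as task dependencies in the job definition.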
Addressing Your Approach:
- Your approach of storing raw data in Delta Lake format is sound. It allows for schema evolution and later reprocessing.
- Consider whether you need Auto Loader (the cloudFiles source) at all. It is an ingestion mechanism for incoming files, not an alternative to Delta Lake's schema features.
- Storing tables in external locations is a common choice and works well with this architecture.
- Using DLT pipelines can save development time, since dependency handling and monitoring come built in.
Leave a like if this helps; follow-ups are appreciated.
Kudos,
Palash