Hey @SDN !
My recommendation is to work with three separate workspaces (dev, preprod, prod). While this approach is more complex in terms of infrastructure, it provides better stability and fewer issues in the long run by ensuring clear separation between development and production environments.
Each workspace should have its own dedicated catalog (dev, pre, prod). That said, I recommend giving dev and preprod read-only access to the prod catalog. This lets developers test against either real production data or non-production data, while making sure nothing they do can modify the production environment.
Data Sharing Between Environments
To move data between environments, you can use:
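• Deep Clone (DEEP CLONE) → Copies the data files and table metadata (schema, partitioning, properties) to the target. Note that the clone starts its own version history; the source table's history is not carried over.
• CTAS (CREATE TABLE AS SELECT) → Creates a new table from a query, so you can filter or sample the data, but does not retain version history.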
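As a rough sketch of the two options, the helpers below only build the SQL strings you would pass to `spark.sql(...)` in a notebook. The table names (`prod.sales.orders`, `dev.sales.orders`) and the function names are made up for illustration, so adapt them to your own catalogs.

```python
from typing import Optional


def deep_clone_sql(source: str, target: str) -> str:
    """Build a DEEP CLONE statement that copies data files and metadata."""
    return f"CREATE OR REPLACE TABLE {target} DEEP CLONE {source}"


def ctas_sql(source: str, target: str, where: Optional[str] = None) -> str:
    """Build a CTAS statement, optionally filtering rows while copying."""
    sql = f"CREATE OR REPLACE TABLE {target} AS SELECT * FROM {source}"
    if where:
        sql += f" WHERE {where}"
    return sql


# Hypothetical three-level names: catalog.schema.table
print(deep_clone_sql("prod.sales.orders", "dev.sales.orders"))
print(ctas_sql("prod.sales.orders", "dev.sales.orders_sample",
               "order_date >= '2024-01-01'"))
```

CTAS is the handier of the two when dev only needs a filtered or sampled copy.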
Feature Store Management
From what I understand, you want to create a centralized Feature Store (CFS). Keep in mind that a Feature Store is essentially a set of tables, so you can simply store it in a dedicated schema within the production catalog.
Since Feature Stores are derived from transformed data, they should be built using the Silver/Gold layer of your Medallion architecture. It is also important to decide whether:
1. Features should be extracted directly from Silver/Gold tables, or
2. A separate pipeline should process and store feature data independently.
If a single Feature Store is shared across all environments, ensure that:
• Feature engineering is performed in dev before promoting features to prod.
• Feature Store updates follow a controlled deployment process (CI/CD).
• Read and write permissions are well-defined to prevent dev/preprod from accidentally overwriting production features.
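One way to keep an eye on the last point is a small check you could run in CI against the grants on the feature schema. This is only a sketch: the principal names, the `grants` dictionary, and the idea of a single allowed writer are assumptions for illustration, and in practice you would load the grants from Unity Catalog rather than hard-code them.

```python
from typing import Dict, List, Set

# Unity Catalog privileges that allow changing table data.
WRITE_PRIVILEGES = {"MODIFY", "ALL PRIVILEGES"}

# Hypothetical principals allowed to write to the production feature schema.
ALLOWED_WRITERS = {"cicd-service-principal"}


def find_unexpected_writers(grants: Dict[str, Set[str]]) -> List[str]:
    """Return principals that hold a write privilege but are not allowed writers."""
    return sorted(
        principal
        for principal, privileges in grants.items()
        if privileges & WRITE_PRIVILEGES and principal not in ALLOWED_WRITERS
    )


# Hypothetical grants on the schema that holds the feature tables.
grants = {
    "cicd-service-principal": {"USE SCHEMA", "SELECT", "MODIFY"},
    "dev-engineers": {"USE SCHEMA", "SELECT"},
    "preprod-testers": {"USE SCHEMA", "SELECT", "MODIFY"},  # should be flagged
}

print(find_unexpected_writers(grants))  # ['preprod-testers']
```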
ETL Pipeline Considerations
The ETL pipeline that processes data from raw to gold should run in the production workspace to ensure a single source of truth. This setup prevents inconsistencies and duplication of processing logic across environments.
However, development and testing should be done in dev/preprod, and only tested pipelines should be deployed to prod.
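To make that promotion painless, it helps if the pipeline code never hard-codes a catalog and instead resolves it from the environment it is deployed to. A minimal sketch, assuming the catalog names mentioned above (dev, pre, prod) and a hypothetical `qualified_name` helper:

```python
# Mapping from deployment environment to catalog name (assumed naming).
CATALOG_BY_ENV = {"dev": "dev", "preprod": "pre", "prod": "prod"}


def qualified_name(env: str, schema: str, table: str) -> str:
    """Resolve a three-level table name for the given environment."""
    try:
        catalog = CATALOG_BY_ENV[env]
    except KeyError:
        raise ValueError(f"Unknown environment: {env!r}") from None
    return f"{catalog}.{schema}.{table}"


# The same pipeline code targets a different catalog per environment.
for env in ("dev", "preprod", "prod"):
    print(qualified_name(env, "gold", "orders"))
```

The environment itself would typically come from a job parameter or deployment config, so the exact same code is what gets tested in dev/preprod and then released to prod.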
Security and Access Control
To enforce controlled access to production data, consider using:
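• Unity Catalog for centralized permission management.
• External Locations to govern access to the underlying cloud storage paths, so data is shared through Unity Catalog permissions rather than direct storage credentials.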
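For the read-only access to prod, the grants could look roughly like the ones generated below. The group name `dev-engineers` and the helper are hypothetical, and granting at catalog level is just one option; you may prefer to grant per schema if dev should only see part of prod.

```python
from typing import List


def read_only_grants(catalog: str, principal: str) -> List[str]:
    """Build Unity Catalog GRANT statements for read-only access to a catalog."""
    return [
        f"GRANT USE CATALOG ON CATALOG {catalog} TO `{principal}`",
        f"GRANT USE SCHEMA ON CATALOG {catalog} TO `{principal}`",
        f"GRANT SELECT ON CATALOG {catalog} TO `{principal}`",
    ]


# Hypothetical group of developers who need to read production data.
for statement in read_only_grants("prod", "dev-engineers"):
    print(statement)  # in a notebook: spark.sql(statement)
```

Since MODIFY is never granted here, the group can query prod tables but not write to them.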
Hope that helps 🙂