I’ve been exploring a metadata-driven approach to data engineering through a project called Data Engineering Copilot.
The idea is to treat Source-to-Target Mapping (STTM) documents as structured metadata rather than static documentation.
Instead of manually translating each STTM into Spark SQL, data quality checks, documentation, and pipelines, the STTM would be loaded into a Canonical Metadata Model, and those artifacts would be generated from that model automatically.
The workflow looks something like this:
STTM
↓
Canonical Metadata Model
↓
Spark SQL Generation
↓
Data Quality Rules
↓
Documentation
↓
Production Pipelines
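To make the idea concrete, here is a minimal Python sketch of the middle steps: one mapping object produces the Spark SQL, the data quality checks, and the documentation. It is only an illustration under assumptions, not the project's actual implementation. The class names (`ColumnMapping`, `TableMapping`), the generator functions, and the sample tables `raw.customers` and `curated.dim_customer` are all hypothetical, and the generated SQL is plain text that would still need to be run and validated on a real Spark or Databricks environment.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ColumnMapping:
    """One STTM row: how a single target column is derived (hypothetical schema)."""
    target_column: str
    source_expression: str  # Spark SQL expression over the source table
    nullable: bool = True
    description: str = ""


@dataclass(frozen=True)
class TableMapping:
    """Canonical model for one source-to-target table mapping (hypothetical)."""
    source_table: str
    target_table: str
    columns: List[ColumnMapping]


def generate_select_sql(mapping: TableMapping) -> str:
    """Render the transformation as a Spark SQL SELECT statement."""
    select_list = ",\n".join(
        f"  {c.source_expression} AS {c.target_column}" for c in mapping.columns
    )
    return f"SELECT\n{select_list}\nFROM {mapping.source_table}"


def generate_not_null_checks(mapping: TableMapping) -> List[str]:
    """Render one violation-count query per column declared non-nullable."""
    return [
        f"SELECT COUNT(*) AS violations FROM {mapping.target_table} "
        f"WHERE {c.target_column} IS NULL"
        for c in mapping.columns
        if not c.nullable
    ]


def generate_doc(mapping: TableMapping) -> str:
    """Render a short plain-text description of the mapping."""
    header = f"{mapping.target_table} (from {mapping.source_table})"
    rows = [
        f"- {c.target_column}: {c.description or 'no description'} "
        f"[{c.source_expression}]"
        for c in mapping.columns
    ]
    return "\n".join([header, *rows])


# Sample mapping; table and column names are made up for illustration.
customer_mapping = TableMapping(
    source_table="raw.customers",
    target_table="curated.dim_customer",
    columns=[
        ColumnMapping(
            "customer_id",
            "CAST(cust_id AS BIGINT)",
            nullable=False,
            description="Business key",
        ),
        ColumnMapping(
            "full_name",
            "CONCAT(first_name, ' ', last_name)",
            description="Display name",
        ),
    ],
)

if __name__ == "__main__":
    print(generate_select_sql(customer_mapping))
    print("\n".join(generate_not_null_checks(customer_mapping)))
    print(generate_doc(customer_mapping))
```

Because all three outputs are derived from the same mapping, a change to a rule in the STTM only has to be made in one place.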
I’m curious:
- How are teams managing STTM documents today?
- Are you using metadata-driven frameworks?
- Has anyone experimented with generating Databricks assets directly from metadata?
Would love to hear how others are approaching this challenge.
Amit Kumar Singh
Lead Data Engineer | AI-Assisted Data Engineering