Re: From STTM to Databricks Pipelines: Can Metadata Become the Source Code of Data Engineering?

A0s01gy · Sunday

I’ve been exploring a metadata-driven approach to data engineering through a project called Data Engineering Copilot.

The idea is to treat Source-to-Target Mapping (STTM) documents as structured metadata rather than static documentation.

Instead of manually translating STTM into Spark SQL, data quality checks, documentation, and pipelines, a Canonical Metadata Model could generate these artifacts automatically.

The workflow looks something like this:

STTM
↓
Canonical Metadata Model
↓
Spark SQL Generation
↓
Data Quality Rules
↓
Documentation
↓
Production Pipelines
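
To make the first two hops concrete, here is a minimal sketch of how flat STTM rows might be folded into a small canonical model and then rendered as Spark SQL. Everything in it is an assumption for illustration only: the row keys (`source_table`, `source_expr`, and so on), the `ColumnMapping` and `TableMapping` classes, and the sample table names are hypothetical, not part of Data Engineering Copilot.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ColumnMapping:
    source_expr: str    # SQL expression over the source table
    target_column: str  # column name in the target table


@dataclass(frozen=True)
class TableMapping:
    source_table: str
    target_table: str
    columns: List[ColumnMapping]


def build_mapping(rows: List[Dict[str, str]]) -> TableMapping:
    """Fold flat STTM rows for one source/target pair into a TableMapping."""
    sources = {r["source_table"] for r in rows}
    targets = {r["target_table"] for r in rows}
    if len(sources) != 1 or len(targets) != 1:
        raise ValueError("expected rows for exactly one source/target pair")
    columns = [ColumnMapping(r["source_expr"], r["target_column"]) for r in rows]
    return TableMapping(sources.pop(), targets.pop(), columns)


def generate_insert_sql(mapping: TableMapping) -> str:
    """Render the mapping as an INSERT ... SELECT statement."""
    select_list = ",\n".join(
        f"  {c.source_expr} AS {c.target_column}" for c in mapping.columns
    )
    target_cols = ", ".join(c.target_column for c in mapping.columns)
    return (
        f"INSERT INTO {mapping.target_table} ({target_cols})\n"
        f"SELECT\n{select_list}\n"
        f"FROM {mapping.source_table}"
    )


# Hypothetical STTM rows, e.g. exported from a spreadsheet.
STTM_ROWS = [
    {"source_table": "raw.customers", "target_table": "silver.dim_customer",
     "source_expr": "cust_id", "target_column": "customer_id"},
    {"source_table": "raw.customers", "target_table": "silver.dim_customer",
     "source_expr": "UPPER(TRIM(cust_name))", "target_column": "customer_name"},
]

if __name__ == "__main__":
    print(generate_insert_sql(build_mapping(STTM_ROWS)))
```

The point of the intermediate `TableMapping` is that the SQL generator never sees the spreadsheet layout, so the same model could in principle feed the later stages as well.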

I’m curious:

How are teams managing STTM today?
Are you using metadata-driven frameworks?
Has anyone experimented with generating Databricks assets directly from metadata?

Would love to hear how others are approaching this challenge.

Amit Kumar Singh
Lead Data Engineer | AI-Assisted Data Engineering

rdokala · Tuesday

This is a good discussion topic. In my experience, teams today use a mix of metadata-driven approaches and, more commonly, traditional Excel-based STTMs.

A few observations:

How most teams manage STTM today

Level 1 (Most Common)

STTM in Excel, Word, or Confluence.
Engineers manually translate mappings into Spark SQL, dbt, Informatica, ADF, etc.
Documentation becomes stale quickly.
Data quality rules are implemented separately from mappings.

Level 2 (Maturing Teams)

STTM stored in structured tables.
Reusable ETL framework reads metadata for:
- Source tables
- Target tables
- Incremental logic
- Column mappings
- Audit columns
Pipeline orchestration becomes metadata-driven.
Still, transformation logic is often manually coded.
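
A rough sketch of what a Level 2 framework might do with one metadata record is below. The dictionary layout, the `watermark_column` key, and the table names are assumptions made up for this example; a real framework would read the record from its own metadata tables.

```python
from typing import Any, Dict, Optional

# Hypothetical metadata record for one pipeline, as it might be read
# from a structured STTM table.
PIPELINE_METADATA: Dict[str, Any] = {
    "source_table": "raw.orders",
    "target_table": "silver.orders",
    "watermark_column": "updated_at",  # drives the incremental logic
    "column_mappings": [
        {"source": "order_id", "target": "order_id"},
        # Transformation logic is still a hand-written expression.
        {"source": "CAST(amount AS DECIMAL(18,2))", "target": "order_amount"},
    ],
    "audit_columns": {"load_ts": "current_timestamp()"},
}


def build_incremental_select(meta: Dict[str, Any],
                             last_watermark: Optional[str]) -> str:
    """Build the SELECT for one load; a None watermark means a full load."""
    select_items = [
        f"{m['source']} AS {m['target']}" for m in meta["column_mappings"]
    ]
    select_items += [
        f"{expr} AS {name}" for name, expr in meta["audit_columns"].items()
    ]
    sql = "SELECT " + ", ".join(select_items) + f" FROM {meta['source_table']}"
    if last_watermark is not None:
        # Interpolated for readability only; production code should bind
        # the watermark as a query parameter instead.
        sql += f" WHERE {meta['watermark_column']} > '{last_watermark}'"
    return sql


if __name__ == "__main__":
    print(build_incremental_select(PIPELINE_METADATA, "2024-01-01 00:00:00"))
```

Note that the `CAST(...)` expression lives inside the metadata as a string, which matches the observation that transformation logic is still written by hand even when orchestration is metadata-driven.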

Level 3 (Advanced Teams)

Metadata repository acts as the single source of truth.
Code generation produces:
- SQL
- ETL pipelines
- DQ rules
- Documentation
- Lineage
Human review before deployment.
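
As an illustration of Level 3, the sketch below derives both DQ checks and a documentation page from the same column metadata. The column attributes (`nullable`, `unique`, `description`) and the convention that each check counts violating rows are assumptions for this example, not a description of any particular tool.

```python
from typing import Any, Dict, List

# Hypothetical column-level metadata held in the repository.
COLUMNS: List[Dict[str, Any]] = [
    {"name": "customer_id", "type": "BIGINT", "nullable": False,
     "unique": True, "description": "Surrogate key for the customer."},
    {"name": "email", "type": "STRING", "nullable": True,
     "unique": False, "description": "Primary contact email."},
]


def generate_dq_checks(table: str,
                       columns: List[Dict[str, Any]]) -> Dict[str, str]:
    """Return {check_name: SQL}; each query counts violating rows."""
    checks: Dict[str, str] = {}
    for col in columns:
        name = col["name"]
        if not col["nullable"]:
            checks[f"{name}_not_null"] = (
                f"SELECT COUNT(*) FROM {table} WHERE {name} IS NULL"
            )
        if col["unique"]:
            checks[f"{name}_unique"] = (
                f"SELECT COUNT(*) FROM (SELECT {name} FROM {table} "
                f"GROUP BY {name} HAVING COUNT(*) > 1) dupes"
            )
    return checks


def generate_markdown_doc(table: str, columns: List[Dict[str, Any]]) -> str:
    """Render a data-dictionary page from the same metadata."""
    lines = [
        f"# {table}",
        "",
        "| Column | Type | Nullable | Description |",
        "| --- | --- | --- | --- |",
    ]
    for col in columns:
        nullable = "yes" if col["nullable"] else "no"
        lines.append(
            f"| {col['name']} | {col['type']} | {nullable} | {col['description']} |"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    for check_name, sql in generate_dq_checks("silver.dim_customer", COLUMNS).items():
        print(check_name, "->", sql)
    print(generate_markdown_doc("silver.dim_customer", COLUMNS))
```

Because both generators read the same records, the documentation cannot drift from the rules without the metadata itself changing, which is the main argument for a single source of truth.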

A0s01gy · Wednesday

Great breakdown. In my experience, many organizations are currently somewhere between Level 1 and Level 2.

One possible next step could be:

Level 4 – AI-Assisted Metadata Engineering

Business Requirements
↓
STTM
↓
Canonical Metadata Model
↓
AI Validation
↓
SQL
PySpark
DQ Rules
Documentation
Lineage
Knowledge Discovery

The interesting shift is that metadata becomes the primary development artifact. Instead of engineers manually translating specifications into code, AI helps validate, enrich, and generate engineering artifacts from a governed metadata model, while humans remain responsible for final outcomes and deployment decisions.
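
To show where a validation stage and the human sign-off could sit, here is a small sketch of a gate in front of generation. It uses plain rule-based checks as a stand-in; an AI validator could emit the same kind of findings, but nothing here calls a model. The `Finding` class, the `source_columns` and `description` keys, and the sample schema are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Set


@dataclass(frozen=True)
class Finding:
    severity: str  # "error" blocks generation, "warning" is advisory
    message: str


def validate_mapping(mapping: Dict[str, Any],
                     source_schema: Set[str]) -> List[Finding]:
    """Rule-based checks on a mapping before any artifact is generated."""
    findings: List[Finding] = []
    seen_targets: Set[str] = set()
    for m in mapping["column_mappings"]:
        target = m["target"]
        if target in seen_targets:
            findings.append(Finding("error", f"duplicate target column: {target}"))
        seen_targets.add(target)
        for col in m.get("source_columns", []):
            if col not in source_schema:
                findings.append(
                    Finding("error", f"unknown source column: {col}")
                )
        if not m.get("description"):
            findings.append(
                Finding("warning", f"missing description for: {target}")
            )
    return findings


def ready_for_generation(findings: List[Finding], human_approved: bool) -> bool:
    """Generation proceeds only with no errors AND an explicit human approval."""
    has_errors = any(f.severity == "error" for f in findings)
    return human_approved and not has_errors


# Hypothetical inputs for illustration.
SOURCE_SCHEMA = {"cust_id", "cust_name"}
MAPPING = {
    "column_mappings": [
        {"target": "customer_id", "source_columns": ["cust_id"],
         "description": "Customer key."},
        {"target": "customer_name", "source_columns": ["cust_nm"]},
    ]
}

if __name__ == "__main__":
    results = validate_mapping(MAPPING, SOURCE_SCHEMA)
    for finding in results:
        print(finding.severity, finding.message)
    print("ready:", ready_for_generation(results, human_approved=True))
```

Keeping the approval flag separate from the findings reflects the split above: automated validation can block a bad mapping, but it cannot approve a deployment on its own.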

