From STTM to Databricks Pipelines: Can Metadata Become the Source Code of Data Engineering?

A0s01gy

I’ve been exploring a metadata-driven approach to data engineering through a project called Data Engineering Copilot.

The idea is to treat Source-to-Target Mapping (STTM) documents as structured metadata rather than static documentation.
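To make "STTM as structured metadata" a bit more concrete, here is a minimal Python sketch that parses STTM rows into typed, validated records. The CSV layout, its column names (`source_table`, `target_type`, `transformation`, `nullable` and so on), and the `ColumnMapping` / `parse_sttm` names are hypothetical; real STTM spreadsheets vary a lot between teams.

```python
import csv
import io
from dataclasses import dataclass

# Hypothetical STTM export: one row per target column. These headers are
# illustrative assumptions, not a standard STTM format.
STTM_CSV = """source_table,source_column,target_table,target_column,target_type,transformation,nullable
raw.customers,cust_id,silver.customer,customer_id,BIGINT,,N
raw.customers,cust_name,silver.customer,customer_name,STRING,TRIM(cust_name),Y
raw.customers,signup_dt,silver.customer,signup_date,DATE,"TO_DATE(signup_dt, 'yyyyMMdd')",Y
"""

REQUIRED = ("source_table", "source_column", "target_table", "target_column", "target_type")


@dataclass(frozen=True)
class ColumnMapping:
    """One STTM row normalized into a typed, validated record."""
    source_table: str
    source_column: str
    target_table: str
    target_column: str
    target_type: str
    transformation: str  # empty string: pass the source column through
    nullable: bool


def parse_sttm(text: str) -> list[ColumnMapping]:
    """Parse STTM CSV text, failing fast on rows with missing required fields."""
    mappings = []
    # Row 1 is the header, so the first data row is reported as row 2.
    for row_no, row in enumerate(csv.DictReader(io.StringIO(text)), start=2):
        # Skip the None key DictReader uses for surplus fields in malformed rows.
        values = {k: (v or "").strip() for k, v in row.items() if k is not None}
        missing = [k for k in REQUIRED if not values.get(k)]
        if missing:
            raise ValueError(f"STTM row {row_no}: missing {', '.join(missing)}")
        mappings.append(ColumnMapping(
            source_table=values["source_table"],
            source_column=values["source_column"],
            target_table=values["target_table"],
            target_column=values["target_column"],
            target_type=values["target_type"].upper(),
            transformation=values.get("transformation", ""),
            # A missing or blank "nullable" cell defaults to nullable.
            nullable=values.get("nullable", "Y").upper() != "N",
        ))
    return mappings


for m in parse_sttm(STTM_CSV):
    print(m.target_table, m.target_column, m.target_type, m.nullable)
```

Validating at parse time means a broken mapping surfaces as an error pointing at an STTM row, rather than as a failing Spark job later on.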

Instead of manually translating STTM into Spark SQL, data quality checks, documentation, and pipelines, teams could generate these artifacts automatically from a Canonical Metadata Model.
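A rough illustration of the Spark SQL generation step, assuming a simplified canonical model where each target column carries its source column, target type, and an optional Spark SQL transformation expression. `TableMapping`, `ColumnMapping`, and `generate_ctas` are hypothetical names. The output is a Databricks `CREATE OR REPLACE TABLE ... AS SELECT` statement; identifier quoting, joins, and filters are deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnMapping:
    source_column: str
    target_column: str
    target_type: str
    transformation: str = ""  # optional Spark SQL expression over source columns


@dataclass(frozen=True)
class TableMapping:
    source_table: str
    target_table: str
    columns: tuple[ColumnMapping, ...]


def column_expr(col: ColumnMapping) -> str:
    # Use the STTM transformation when present, else pass the source column through,
    # and always cast to the declared target type.
    expr = col.transformation or col.source_column
    return f"CAST({expr} AS {col.target_type}) AS {col.target_column}"


def generate_ctas(table: TableMapping) -> str:
    """Render one CREATE OR REPLACE TABLE ... AS SELECT statement for a target table."""
    if not table.columns:
        raise ValueError(f"{table.target_table}: no column mappings")
    select_list = ",\n  ".join(column_expr(c) for c in table.columns)
    return (
        f"CREATE OR REPLACE TABLE {table.target_table} AS\n"
        f"SELECT\n  {select_list}\n"
        f"FROM {table.source_table}"
    )


customer = TableMapping(
    source_table="raw.customers",
    target_table="silver.customer",
    columns=(
        ColumnMapping("cust_id", "customer_id", "BIGINT"),
        ColumnMapping("cust_name", "customer_name", "STRING", "TRIM(cust_name)"),
    ),
)
print(generate_ctas(customer))
```

The generated SQL could then be run in a notebook or job (for example with `spark.sql(...)`); the point is that the SQL is derived from the metadata instead of being hand-written.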

The workflow looks something like this:

STTM
↓
Canonical Metadata Model
↓
Spark SQL Generation
↓
Data Quality Rules
↓
Documentation
↓
Production Pipelines
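The later stages of this workflow can reuse the same metadata. The sketch below, again with hypothetical names and a simplified model, derives data quality rules (a NOT NULL rule for each non-nullable column plus an optional per-column check expression) and a Markdown column reference. The rules are rendered as Delta Lake `ALTER TABLE ... ADD CONSTRAINT ... CHECK (...)` statements; the same name/expression pairs could presumably also feed Delta Live Tables expectations, but that wiring is not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnMapping:
    source_column: str
    target_column: str
    target_type: str
    nullable: bool = True
    check: str = ""        # optional boolean SQL expression, e.g. "customer_id > 0"
    description: str = ""


def dq_rules(columns: list[ColumnMapping]) -> dict[str, str]:
    """Map rule name -> boolean SQL expression, derived only from the metadata."""
    rules = {}
    for c in columns:
        if not c.nullable:
            rules[f"{c.target_column}_not_null"] = f"{c.target_column} IS NOT NULL"
        if c.check:
            rules[f"{c.target_column}_check"] = c.check
    return rules


def constraint_ddl(target_table: str, rules: dict[str, str]) -> list[str]:
    """Render each rule as a Delta Lake CHECK constraint statement."""
    return [
        f"ALTER TABLE {target_table} ADD CONSTRAINT {name} CHECK ({expr})"
        for name, expr in rules.items()
    ]


def column_docs(target_table: str, columns: list[ColumnMapping]) -> str:
    """Render a Markdown table documenting the target columns and their lineage."""
    lines = [
        f"## {target_table}",
        "",
        "| Column | Type | Nullable | Source | Description |",
        "|---|---|---|---|---|",
    ]
    for c in columns:
        nullable = "yes" if c.nullable else "no"
        lines.append(
            f"| {c.target_column} | {c.target_type} | {nullable} | {c.source_column} | {c.description} |"
        )
    return "\n".join(lines)


cols = [
    ColumnMapping("cust_id", "customer_id", "BIGINT", nullable=False,
                  check="customer_id > 0", description="Business key"),
    ColumnMapping("cust_name", "customer_name", "STRING", description="Trimmed display name"),
]
for stmt in constraint_ddl("silver.customer", dq_rules(cols)):
    print(stmt)
print(column_docs("silver.customer", cols))
```

Because every generator is a pure function of the metadata, a change to the STTM regenerates the SQL, the rules, and the documentation together, which keeps them from drifting apart.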

I’m curious:

  1. How are teams managing STTM today?
  2. Are you using metadata-driven frameworks?
  3. Has anyone experimented with generating Databricks assets directly from metadata?

Would love to hear how others are approaching this challenge.

Amit Kumar Singh
Lead Data Engineer | AI-Assisted Data Engineering