<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: From STTM to Databricks Pipelines: Can Metadata Become the Source Code of Data Engineering? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/from-sttm-to-databricks-pipelines-can-metadata-become-the-source/m-p/159407#M54809</link>
    <description>&lt;P&gt;&lt;SPAN&gt;Great breakdown. In my experience, many organizations are currently somewhere between Level 1 and Level 2.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;One possible next step could be:&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Level 4 – AI-Assisted Metadata Engineering&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Business Requirements&lt;BR /&gt;↓&lt;BR /&gt;STTM&lt;BR /&gt;↓&lt;BR /&gt;Canonical Metadata Model&lt;BR /&gt;↓&lt;BR /&gt;AI Validation&lt;BR /&gt;↓&lt;BR /&gt;SQL&lt;BR /&gt;PySpark&lt;BR /&gt;DQ Rules&lt;BR /&gt;Documentation&lt;BR /&gt;Lineage&lt;BR /&gt;Knowledge Discovery&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;The interesting shift is that metadata becomes the primary development artifact. Instead of engineers manually translating specifications into code, AI helps validate, enrich, and generate engineering artifacts from a governed metadata model, while humans remain responsible for final outcomes and deployment decisions.&lt;/SPAN&gt;&lt;/P&gt;</description>
    <pubDate>Wed, 17 Jun 2026 08:11:21 GMT</pubDate>
    <dc:creator>A0s01gy</dc:creator>
    <dc:date>2026-06-17T08:11:21Z</dc:date>
    <item>
      <title>From STTM to Databricks Pipelines: Can Metadata Become the Source Code of Data Engineering?</title>
      <link>https://community.databricks.com/t5/data-engineering/from-sttm-to-databricks-pipelines-can-metadata-become-the-source/m-p/158984#M54786</link>
      <description>&lt;P&gt;&lt;SPAN&gt;I’ve been exploring a metadata-driven approach to data engineering through a project called Data Engineering Copilot.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;The idea is to treat Source-to-Target Mapping (STTM) documents as structured metadata rather than static documentation.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Instead of engineers manually translating each STTM into Spark SQL, data quality checks, documentation, and pipelines, these artifacts could be generated automatically from a Canonical Metadata Model.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;The workflow looks something like this:&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;STTM&lt;BR /&gt;↓&lt;BR /&gt;Canonical Metadata Model&lt;BR /&gt;↓&lt;BR /&gt;Spark SQL Generation&lt;BR /&gt;↓&lt;BR /&gt;Data Quality Rules&lt;BR /&gt;↓&lt;BR /&gt;Documentation&lt;BR /&gt;↓&lt;BR /&gt;Production Pipelines&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;I’m curious:&lt;/SPAN&gt;&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;&lt;SPAN&gt;How are teams managing STTM today?&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;Are you using metadata-driven frameworks?&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;Has anyone experimented with generating Databricks assets directly from metadata?&lt;/SPAN&gt;&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;&lt;SPAN&gt;Would love to hear how others are approaching this challenge.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 15 Jun 2026 06:52:43 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/from-sttm-to-databricks-pipelines-can-metadata-become-the-source/m-p/158984#M54786</guid>
      <dc:creator>A0s01gy</dc:creator>
      <dc:date>2026-06-15T06:52:43Z</dc:date>
    </item>
    <item>
      <title>Re: From STTM to Databricks Pipelines: Can Metadata Become the Source Code of Data Engineering?</title>
      <link>https://community.databricks.com/t5/data-engineering/from-sttm-to-databricks-pipelines-can-metadata-become-the-source/m-p/159262#M54804</link>
      <description>&lt;P&gt;This is a good discussion topic. In my experience, teams today use a mix of metadata-driven frameworks and, more commonly, traditional Excel-based STTMs.&lt;/P&gt;&lt;P&gt;A few observations:&lt;/P&gt;&lt;H3&gt;How most teams manage STTM today&lt;/H3&gt;&lt;P&gt;&lt;STRONG&gt;Level 1 (Most Common)&lt;/STRONG&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;STTM lives in Excel, Word, or Confluence.&lt;/LI&gt;&lt;LI&gt;Engineers manually translate mappings into Spark SQL, dbt, Informatica, ADF, etc.&lt;/LI&gt;&lt;LI&gt;Documentation becomes stale quickly.&lt;/LI&gt;&lt;LI&gt;Data quality rules are implemented separately from mappings.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;STRONG&gt;Level 2 (Maturing Teams)&lt;/STRONG&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;STTM is stored in structured tables.&lt;/LI&gt;&lt;LI&gt;A reusable ETL framework reads metadata for:&lt;UL&gt;&lt;LI&gt;Source tables&lt;/LI&gt;&lt;LI&gt;Target tables&lt;/LI&gt;&lt;LI&gt;Incremental logic&lt;/LI&gt;&lt;LI&gt;Column mappings&lt;/LI&gt;&lt;LI&gt;Audit columns&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;LI&gt;Pipeline orchestration becomes metadata-driven.&lt;/LI&gt;&lt;LI&gt;Still, transformation logic is often manually coded.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;STRONG&gt;Level 3 (Advanced Teams)&lt;/STRONG&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;A metadata repository acts as the single source of truth.&lt;/LI&gt;&lt;LI&gt;Code generation produces:&lt;UL&gt;&lt;LI&gt;SQL&lt;/LI&gt;&lt;LI&gt;ETL pipelines&lt;/LI&gt;&lt;LI&gt;DQ rules&lt;/LI&gt;&lt;LI&gt;Documentation&lt;/LI&gt;&lt;LI&gt;Lineage&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;LI&gt;Human review happens before deployment.&lt;/LI&gt;&lt;/UL&gt;</description>
      <pubDate>Tue, 16 Jun 2026 21:17:38 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/from-sttm-to-databricks-pipelines-can-metadata-become-the-source/m-p/159262#M54804</guid>
      <dc:creator>rdokala</dc:creator>
      <dc:date>2026-06-16T21:17:38Z</dc:date>
    </item>
    <item>
      <title>Re: From STTM to Databricks Pipelines: Can Metadata Become the Source Code of Data Engineering?</title>
      <link>https://community.databricks.com/t5/data-engineering/from-sttm-to-databricks-pipelines-can-metadata-become-the-source/m-p/159407#M54809</link>
      <description>&lt;P&gt;&lt;SPAN&gt;Great breakdown. In my experience, many organizations are currently somewhere between Level 1 and Level 2.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;One possible next step could be:&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Level 4 – AI-Assisted Metadata Engineering&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Business Requirements&lt;BR /&gt;↓&lt;BR /&gt;STTM&lt;BR /&gt;↓&lt;BR /&gt;Canonical Metadata Model&lt;BR /&gt;↓&lt;BR /&gt;AI Validation&lt;BR /&gt;↓&lt;BR /&gt;SQL&lt;BR /&gt;PySpark&lt;BR /&gt;DQ Rules&lt;BR /&gt;Documentation&lt;BR /&gt;Lineage&lt;BR /&gt;Knowledge Discovery&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;The interesting shift is that metadata becomes the primary development artifact. Instead of engineers manually translating specifications into code, AI helps validate, enrich, and generate engineering artifacts from a governed metadata model, while humans remain responsible for final outcomes and deployment decisions.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 17 Jun 2026 08:11:21 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/from-sttm-to-databricks-pipelines-can-metadata-become-the-source/m-p/159407#M54809</guid>
      <dc:creator>A0s01gy</dc:creator>
      <dc:date>2026-06-17T08:11:21Z</dc:date>
    </item>
  </channel>
</rss>

