07-21-2025 09:09 PM
DLT Meta is an open-source framework developed by Databricks Labs that enables the automation of bronze and silver data pipelines through metadata configuration rather than manual code development.
At its core, the framework uses a Dataflowspec - a JSON-based specification file that contains all the metadata needed to define source connections, target schemas, data quality rules, and transformation logic.
A high level process flow is depicted below:
How DLT Meta Works: The framework operates through three key components:
1. Onboarding file: This metadata file defines the source details and source format, plus the bronze, silver, and gold table details along with their storage locations (catalog & schema).
Example:
{
  "tables": [
    {
      "source_format": "cloudFiles",
      "source_details": {
        "source_path": "/path/to/source",
        "source_schema_path": "/path/to/schema"
      },
      "target_format": "delta",
      "target_details": {
        "database": "bronze_db",
        "table": "customer_data"
      }
    }
  ]
}
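To make the intent of this spec concrete, here is a rough sketch of the hand-written DLT code that a bronze entry like this would otherwise replace. This is not dlt-meta's internal implementation; the table name, file format, and schema handling are assumptions for illustration, and dlt-meta generates the real logic from the spec at runtime (spark is provided by the Databricks runtime):

import dlt

@dlt.table(name="customer_data")                    # target_details.table
def customer_data():
    return (
        spark.readStream.format("cloudFiles")       # source_format: cloudFiles (Auto Loader)
        .option("cloudFiles.format", "json")        # assumed file format
        .load("/path/to/source")                    # source_details.source_path
    )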
2. Data quality rules: A separate JSON file defines the quality rules to be applied to the bronze and bronze quarantine tables:
{
  "expect_or_drop": {
    "no_rescued_data": "_rescued_data IS NULL",
    "valid_id": "id IS NOT NULL",
    "valid_operation": "operation IN ('APPEND', 'DELETE', 'UPDATE')"
  },
  "expect_or_quarantine": {
    "quarantine_rule": "_rescued_data IS NOT NULL OR id IS NULL OR operation IS NULL"
  }
}
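These rule maps line up with DLT expectations under the hood. As a rough illustration only (the file path, table name, and source below are made up; dlt-meta wires the rules up for you from the spec):

import dlt
import json

# Load the rules file (path is hypothetical) and apply the drop rules as expectations.
with open("/path/to/bronze_data_quality_expectations.json") as f:
    rules = json.load(f)

@dlt.table(name="customer_data_bronze")            # hypothetical bronze table
@dlt.expect_all_or_drop(rules["expect_or_drop"])   # rows violating any rule are dropped
def customer_data_bronze():
    return spark.readStream.table("raw_customer_data")  # hypothetical source

Rows matching the expect_or_quarantine rule are routed by dlt-meta into the bronze quarantine table instead of being dropped.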
3. Silver transformations: Business logic transformations, defined as SQL select expressions, are applied to the bronze tables to create the silver layer:
[
  {
    "target_table": "customers_silver",
    "select_exp": [
      "address",
      "email",
      "firstname",
      "id",
      "lastname",
      "operation_date",
      "operation",
      "_rescued_data"
    ]
  },
  {
    "target_table": "transactions_silver",
    "select_exp": [
      "id",
      "customer_id",
      "amount",
      "item_count",
      "operation_date",
      "operation",
      "_rescued_data"
    ]
  }
]
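Conceptually, each select_exp list becomes a projection over the corresponding bronze table, roughly like the sketch below. The bronze table name is an assumption for illustration; dlt-meta builds the actual silver flow from the spec tables:

import dlt

silver_select_exp = [
    "address", "email", "firstname", "id",
    "lastname", "operation_date", "operation", "_rescued_data",
]

@dlt.table(name="customers_silver")
def customers_silver():
    # Project the bronze stream down to the columns/expressions listed in select_exp.
    return dlt.read_stream("customers_bronze").selectExpr(*silver_select_exp)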
Once you have created all of the JSONs, you can deploy them to create the spec tables using the onboard dataflow spec script in the src folder. I've created an onboarding job that passes the parameters to the notebook via dbutils widgets.
The notebook would look like below:
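In case the screenshot is hard to follow, here's a minimal sketch of what such an onboarding notebook could look like. The widget names and the OnboardDataflowspec parameter keys are assumptions based on the dlt-meta repo, so check the src folder for the exact signature:

# Read job parameters passed in via dbutils widgets (names are illustrative).
onboarding_params_map = {
    "database": dbutils.widgets.get("database"),
    "onboarding_file_path": dbutils.widgets.get("onboarding_file_path"),
    "bronze_dataflowspec_table": dbutils.widgets.get("bronze_dataflowspec_table"),
    "silver_dataflowspec_table": dbutils.widgets.get("silver_dataflowspec_table"),
    "overwrite": dbutils.widgets.get("overwrite"),
    "env": dbutils.widgets.get("env"),
    "version": dbutils.widgets.get("version"),
    "import_author": dbutils.widgets.get("import_author"),
}

# Hand the parameters to dlt-meta's onboarding class to populate the spec tables.
from src.onboard_dataflowspec import OnboardDataflowspec
OnboardDataflowspec(spark, onboarding_params_map).onboard_dataflow_specs()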
The parameters passed to the onboarding job are as follows:
Once the onboarding job runs successfully, you'll have bronze, silver, and gold spec tables that your DLT pipeline takes as its configuration.
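You can sanity-check the result by querying a spec table directly, for example (the table name is a placeholder):

# Quick check that the onboarding job populated the bronze spec table.
display(spark.table("<bronze_spec_table_details>"))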
The typical process would look like below:
Let's proceed to create the DLT pipeline that executes the medallion flow defined in the onboarding JSON and stored in the spec tables.
The JSON config to create the DLT pipeline is as follows:
{
  "pipeline_type": "WORKSPACE",
  "clusters": [
    {
      "label": "default",
      "node_type_id": "Standard_D3_v2",
      "driver_node_type_id": "Standard_D3_v2",
      "num_workers": 1
    }
  ],
  "development": true,
  "continuous": false,
  "channel": "CURRENT",
  "photon": false,
  "libraries": [
    {
      "notebook": {
        "path": "path/to/the/dlt_meta_notebook"
      }
    }
  ],
  "name": "your_dlt_pipeline_name",
  "edition": "ADVANCED",
  "catalog": "catalog_name",
  "configuration": {
    "layer": "bronze_silver_gold",
    "bronze.dataflowspecTable": "<bronze_spec_table_details>",
    "bronze.group": "<dataflow_group_defined_in_the_onboarding>",
    "silver.dataflowspecTable": "<silver_spec_table_details>",
    "silver.group": "<dataflow_group_defined_in_the_onboarding>",
    "gold.dataflowspecTable": "<gold_spec_table_details>",
    "gold.group": "<dataflow_group_defined_in_the_onboarding>"
  },
  "schema": "<schema_name>"
}
Setting layer to bronze_silver_gold triggers all of the tables across the three layers defined in the spec tables.
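If you'd rather create the pipeline programmatically than paste this JSON into the UI, here's a rough sketch using the Databricks Python SDK. The values mirror the placeholders in the JSON above and are not real, and the call shape is a best-effort sketch, so verify it against the SDK docs:

from databricks.sdk import WorkspaceClient
from databricks.sdk.service import pipelines

w = WorkspaceClient()  # uses your configured Databricks authentication

created = w.pipelines.create(
    name="your_dlt_pipeline_name",
    catalog="catalog_name",
    target="<schema_name>",
    edition="ADVANCED",
    channel="CURRENT",
    development=True,
    continuous=False,
    configuration={
        "layer": "bronze_silver_gold",
        "bronze.dataflowspecTable": "<bronze_spec_table_details>",
        "bronze.group": "<dataflow_group_defined_in_the_onboarding>",
        # silver.* and gold.* keys follow the same pattern as in the JSON above
    },
    libraries=[
        pipelines.PipelineLibrary(
            notebook=pipelines.NotebookLibrary(path="path/to/the/dlt_meta_notebook")
        )
    ],
)
print(created.pipeline_id)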
The dlt_meta_notebook defined in the source code is shown below:
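If the embedded notebook doesn't render, the init notebook described in the dlt-meta docs boils down to something like the sketch below; verify the package name and import path against the repo before relying on it:

# Databricks notebook, cell 1: install the library on the pipeline cluster
# %pip install dlt-meta

# Cell 2: read the layer from the pipeline configuration and launch the generated flows
layer = spark.conf.get("layer", None)

from src.dataflow_pipeline import DataflowPipeline
DataflowPipeline.invoke_dlt_pipeline(spark, layer)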
When you finally start the pipeline, it will request resources from the cloud provider (or from Databricks, if it's serverless) and initiate the DAG for your pipeline.
The DAG for my use case looks like below; it is a combination of both streaming tables and materialized views:
If you want to check this out yourselves, take a look at the Databricks Labs GitHub repo: https://github.com/databrickslabs/dlt-meta
Please let me know if you have any questions. Thank you!
07-22-2025 02:33 AM
Great breakdown of DLT Meta’s architecture and process flow. Thanks for sharing, @RiyazAliM!
07-22-2025 07:19 AM
Thank you @Advika 🙂
07-23-2025 11:38 AM
Great article, Riyaz. Keep sharing more knowledge!
07-29-2025 03:38 AM
Thank you @sridharplv