To dynamically access metadata such as the pipeline name and start time at runtime in common code for Delta Live Tables (DLT) pipelines, leverage the runtime context and the built-in metadata features provided by DLT or the surrounding orchestrator. This lets shared code retrieve values that change with each pipeline execution without hard-coding them.
Recommended Approach
- For Databricks DLT pipelines, metadata such as the pipeline name and start time is typically available from environment variables, the runtime context (for example, the Spark conf), or by querying the pipeline's configuration and run history through the SDK or REST API at runtime, as sketched after this list.
- If you use a declarative, metadata-driven DLT setup (such as Lakeflow Declarative Pipelines via dlt-meta), you can manage pipeline metadata centrally in configuration files (YAML/JSON) that your code reads during execution; see the config-reading sketch after this list.
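For Databricks-hosted pipelines, one common pattern is to define key-value parameters in the pipeline settings' configuration block and read them through the Spark conf, with an environment-variable fallback for runs outside the pipeline. A minimal sketch, assuming a hypothetical configuration key pipeline_name (the key and environment-variable names are illustrative):

import os

def get_runtime_param(spark, key, env_var=None, default=None):
    # Pipeline configuration parameters are exposed via the Spark conf at runtime
    value = spark.conf.get(key, None)
    if value is not None:
        return value
    # Fall back to an environment variable, e.g. when running outside the pipeline
    if env_var is not None:
        return os.environ.get(env_var, default)
    return default

# name = get_runtime_param(spark, "pipeline_name", env_var="PIPELINE_NAME")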
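For the config-driven variant, your common code can load a central YAML file at execution time. A minimal sketch, assuming a hypothetical pipelines.yaml (the file name and schema are illustrative, not a specific dlt-meta format):

import yaml  # PyYAML

def load_pipeline_entry(path, pipeline_name):
    # The central config maps each pipeline name to its metadata
    with open(path) as f:
        config = yaml.safe_load(f)
    return config["pipelines"][pipeline_name]

# pipelines.yaml might look like:
# pipelines:
#   orders_load:
#     source_path: /mnt/raw/orders
#     target_schema: sales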
How to Read Metadata at Runtime
- Pipeline Name: when the pipeline is created, the pipeline_name argument is typically passed and can be read back programmatically (for example, as pipeline.pipeline_name in the open-source dlt library). If it is omitted, dlt derives a name from the currently running script; the effective name can also be retrieved via the pipeline API or configuration object.
- Start Time: the start time is available in the run metadata or in progress-tracking objects, such as the LoadInfo returned by the pipeline's .run() method, whose load IDs are epoch timestamps (see the sketch after this list).
- Common Code Integration: whether you work in notebooks or Python modules, write a helper function or class that loads the pipeline context, fetches values such as pipeline_name and the start time from the environment or via API calls, and passes them through your processing logic.
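For instance, with the open-source dlt library, the load IDs carried in the LoadInfo returned by pipeline.run() are epoch-timestamp strings, so a run's start time can be recovered from them. A minimal sketch, assuming a dlt pipeline with a duckdb destination (the names are illustrative; verify the loads_ids attribute against your dlt version):

from datetime import datetime, timezone

import dlt

pipeline = dlt.pipeline(pipeline_name="orders_load", destination="duckdb")
load_info = pipeline.run([{"id": 1}], table_name="orders")

# Each load ID encodes when its load package was created, as epoch seconds
for load_id in load_info.loads_ids:
    started_at = datetime.fromtimestamp(float(load_id), tz=timezone.utc)
    print(load_id, "->", started_at.isoformat())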
Example
Here's a Python snippet showing how you might retrieve pipeline metadata in common code. It uses the open-source dlt library's pipeline_name and last_trace attributes; the pipeline arguments are illustrative, so confirm the exact attributes against your dlt version:
import dlt

def get_pipeline_metadata(pipeline):
    # pipeline_name is either passed explicitly or derived from the running script
    name = pipeline.pipeline_name
    # last_trace holds the most recent run's trace; it is None before the first run
    trace = pipeline.last_trace
    start_time = trace.started_at if trace is not None else None
    return {"name": name, "start_time": start_time}

# In your common code (the name and destination below are illustrative)
pipeline = dlt.pipeline(pipeline_name="orders_load", destination="duckdb")
metadata = get_pipeline_metadata(pipeline)
print(f"Pipeline: {metadata['name']} started at {metadata['start_time']}")
In advanced setups, you may need to parse the runtime context or query the pipeline APIs for the current pipeline's state and metadata when they are not directly attached to the pipeline instance.
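On Databricks, for example, you could combine the runtime Spark conf with the Databricks SDK to look up the current pipeline's details. A rough sketch, assuming the databricks-sdk package and that the runtime exposes the pipeline ID under the pipelines.id Spark conf key (verify both against your workspace and runtime version):

from databricks.sdk import WorkspaceClient

def get_current_pipeline_details(spark):
    # The DLT runtime sets the pipeline ID in the Spark conf (assumed key)
    pipeline_id = spark.conf.get("pipelines.id")
    # WorkspaceClient picks up credentials from the notebook/cluster context
    w = WorkspaceClient()
    return w.pipelines.get(pipeline_id=pipeline_id)

# details = get_current_pipeline_details(spark)
# print(details.name)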
Notes
- Review the documentation or official SDK for your DLT implementation to confirm the exact runtime context or API calls for metadata, since interfaces vary with how pipelines are configured and started.
- Centralizing metadata in configuration files (with dlt-meta or similar) makes these values easier to manage and access across multiple pipelines and executions.
This approach keeps your common code generic: it picks up pipeline-specific metadata at runtime without hard-coded values and adapts dynamically to each pipeline execution.