Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Which of my Lakeflow pipelines are using the same gateway?

pdiamond
Contributor

Does anyone know of a way to see what Lakeflow pipelines are using the same gateway? We have a gateway connected to a SQL Server that serves multiple individual pipelines but I cannot find a way to see what those are. I've tried system tables. Any insight would be appreciated.

1 ACCEPTED SOLUTION

Accepted Solutions

szymon_dybczak
Esteemed Contributor III

Hi @pdiamond,

You can use the REST API or the Databricks CLI (which makes REST API calls for you under the hood anyway).

Here's the endpoint you're looking for:

Get a pipeline | Pipelines API | REST API reference | Databricks on AWS

Then, in the response payload, look for the spec object:


Inside that object you should find another object called ingestion_definition, which contains the attribute ingestion_gateway_id.
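As a minimal sketch of that lookup, assuming a pipeline detail payload shaped like the one described above (the payload dict and its values here are hypothetical examples, not real IDs):

```python
# Hypothetical response from GET /api/2.0/pipelines/{pipeline_id}, following
# the spec.ingestion_definition.ingestion_gateway_id path described above.
payload = {
    "pipeline_id": "abc-123",
    "spec": {
        "name": "sqlserver_orders_ingest",
        "ingestion_definition": {
            "ingestion_gateway_id": "gw-42",
        },
    },
}

# Chained .get() calls with {} defaults keep this safe for pipelines
# that have no ingestion_definition at all (plain DLT pipelines).
gateway_id = (
    payload.get("spec", {})
    .get("ingestion_definition", {})
    .get("ingestion_gateway_id")
)
print(gateway_id)  # gw-42
```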


2 REPLIES 2


pdiamond
Contributor

Thanks @szymon_dybczak - this worked perfectly. This is what I ended up throwing together to see what I was looking for:

import requests
from databricks.sdk import WorkspaceClient
from pyspark.sql import Row

# Reuse the notebook's workspace auth for raw REST calls
wc = WorkspaceClient()
DATABRICKS_HOST = wc.config.host
headers = wc.config.authenticate()

# List all pipelines in the workspace
# (note: this endpoint paginates via next_page_token in large workspaces)
url = f"{DATABRICKS_HOST}/api/2.0/pipelines"
response = requests.get(url, headers=headers)
payload = response.json()
pipeline_ids = [item["pipeline_id"] for item in payload.get("statuses", []) if "pipeline_id" in item]

# Fetch the full detail payload for each pipeline
pipeline_payloads = []
for pid in pipeline_ids:
    detail_url = f"{DATABRICKS_HOST}/api/2.0/pipelines/{pid}"
    detail_response = requests.get(detail_url, headers=headers)
    pipeline_payloads.append(detail_response.json())

# Pull the gateway id out of spec.ingestion_definition
# (None for pipelines that aren't ingestion pipelines)
rows = [
    Row(
        pipeline_id=p.get("pipeline_id"),
        name=p.get("spec", {}).get("name"),
        ingestion_gateway_id=p.get("spec", {}).get("ingestion_definition", {}).get("ingestion_gateway_id"),
    )
    for p in pipeline_payloads
]

df_pipelines = spark.createDataFrame(rows)
display(df_pipelines)
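To answer the original question directly (which pipelines share a gateway), the rows collected above can be grouped by gateway id; a plain-Python sketch, using hypothetical pipeline names and gateway IDs for illustration:

```python
from collections import defaultdict

# Hypothetical rows, shaped like the ones built from the pipeline payloads above
pipelines = [
    {"pipeline_id": "p1", "name": "orders_ingest", "ingestion_gateway_id": "gw-42"},
    {"pipeline_id": "p2", "name": "customers_ingest", "ingestion_gateway_id": "gw-42"},
    {"pipeline_id": "p3", "name": "events_dlt", "ingestion_gateway_id": None},
]

# Group pipeline names under each gateway, skipping non-ingestion pipelines
by_gateway = defaultdict(list)
for p in pipelines:
    if p["ingestion_gateway_id"]:
        by_gateway[p["ingestion_gateway_id"]].append(p["name"])

print(dict(by_gateway))  # {'gw-42': ['orders_ingest', 'customers_ingest']}
```

The same grouping can of course be done on df_pipelines itself with a groupBy("ingestion_gateway_id") once the DataFrame exists.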