
PipelineSpec object does not seem to show event_log when defining a pipeline with DAB

RikL
New Contributor II

Hi all, I am looking for help on a very specific subject.

I am trying to access the event_log property (EventLogSpec) of the PipelineSpec object that I get by querying the Workspace Client from the Databricks Python SDK:

w.pipelines.get(pipeline_id=created.pipeline_id).spec

When I run this inside a DLT pipeline notebook, however, I cannot find the event_log property, even though I have defined the following in the Asset Bundle pipeline definition:

event_log:
  catalog: catalog_name
  schema: schema_name
  name: event_log_name

Is this a known issue, or am I potentially doing something wrong? I do see that the event log is created at the specified location. Thanks in advance!


https://databricks-sdk-py.readthedocs.io/en/latest/dbdataclasses/pipelines.html#databricks.sdk.servi...


Rik

1 REPLY

mmayorga
Databricks Employee

Hi @RikL,

Thank you for reaching out.

It doesn't seem you are doing anything wrong. Per the documentation, the event_log spec should indeed be returned when you call w.pipelines.get and then read the spec property of the result.

I was able to test and confirm the correct behavior with the following basic code, using SDK version 0.49 with Serverless. The latest version is 0.65 (per "%pip index versions databricks-sdk").

from databricks.sdk import WorkspaceClient

# Initialize the client; with no arguments it uses the default authentication
# available in the environment (for example, the notebook context)
w = WorkspaceClient()

# The pipeline ID you want to fetch
pipeline_id = "<your_pipeline_id>"  # replace with your pipeline ID

# Get pipeline details
pipeline = w.pipelines.get(pipeline_id=pipeline_id)

# Print some key information
print(f"Pipeline Details: {pipeline}")
print(f"Pipeline Event Log Spec: {pipeline.spec.event_log}")

Here is the result of the Event Log Spec:

[Screenshot: mmayorga_0-1758244732370.png]
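
If the attribute might be absent (for example, because the installed SDK release does not define it), a defensive read avoids an AttributeError. This is only a minimal sketch: the helper name event_log_table is hypothetical, and it assumes nothing beyond the spec exposing event_log with the catalog, schema, and name fields used in the bundle definition above.

```python
from types import SimpleNamespace
from typing import Any, Optional


def event_log_table(spec: Any) -> Optional[str]:
    """Return 'catalog.schema.name' for the spec's event log, or None if unset."""
    # getattr guards against a spec object that has no event_log attribute.
    event_log = getattr(spec, "event_log", None)
    if event_log is None:
        return None
    parts = [getattr(event_log, field, None) for field in ("catalog", "schema", "name")]
    # Treat a partially filled event log spec as "not configured".
    if not all(parts):
        return None
    return ".".join(parts)


# Stand-in objects for illustration only; in a notebook, pass
# w.pipelines.get(pipeline_id=...).spec instead.
configured = SimpleNamespace(
    event_log=SimpleNamespace(
        catalog="catalog_name", schema="schema_name", name="event_log_name"
    )
)
missing = SimpleNamespace()

print(event_log_table(configured))
print(event_log_table(missing))
```

A None result from a real spec would suggest the event log configuration is not reaching the pipeline definition that the SDK returns.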

Here are a couple of troubleshooting questions:

  • From the SDK perspective:
    • Are you using a Classic Cluster? Have you tried to reinstall/update your SDK?
    • Have you tried with Serverless?
  • From the DAB perspective:
    • I'm wondering if the DAB is missing a registration between the Pipeline and the Event Log, which might explain why it's not being retrieved when you run "pipeline get" with the SDK. Have you tried reconfiguring the pipeline in the UI, running it again, and then checking the results with the SDK?
  • Workarounds
    • Option 1: Remove the event_log configuration from your DAB. The pipeline will then create a hidden table within your default catalog and schema, which you can query with the "event_log" function using the pipeline ID:
      • SELECT * FROM event_log(<pipelineId>);
        
        
    • Option 2: Query the event log directly at the location you already know: "SELECT * FROM catalog.schema.event_log_table_name"
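
The two workarounds above can be sketched as small query builders. This is an illustration rather than a definitive implementation: the helper names are hypothetical, the pipeline ID is assumed to be passed to event_log as a string literal, and the resulting strings would be run with spark.sql(...) inside a notebook.

```python
import re

# Conservative patterns so that only plain identifiers and UUID-like IDs
# are interpolated into the SQL text.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
_PIPELINE_ID = re.compile(r"[0-9A-Fa-f-]+")


def query_by_pipeline_id(pipeline_id: str) -> str:
    """Option 1: query the event log through the event_log function."""
    if not _PIPELINE_ID.fullmatch(pipeline_id):
        raise ValueError(f"unexpected pipeline ID: {pipeline_id!r}")
    # Assumption: the ID is passed as a quoted string literal.
    return f"SELECT * FROM event_log('{pipeline_id}')"


def query_by_table(catalog: str, schema: str, name: str) -> str:
    """Option 2: query the event log table at its known location."""
    for part in (catalog, schema, name):
        if not _IDENTIFIER.fullmatch(part):
            raise ValueError(f"unexpected identifier: {part!r}")
    return f"SELECT * FROM {catalog}.{schema}.{name}"


# Placeholder values for illustration only.
print(query_by_pipeline_id("0123abcd-0000-0000-0000-000000000000"))
print(query_by_table("catalog_name", "schema_name", "event_log_name"))
```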

I hope this helps!