Spark version errors in "Build an ETL pipeline with Lakeflow Spark Declarative Pipelines"

liquibricks
Contributor

I'm trying to define a job for a pipeline using the Asset Bundle Python SDK. I created the pipeline first (using the SDK) and I'm now trying to add the job. The DAB validates and deploys successfully, but when I run the job I get an error:

 

UNAUTHORIZED_ERROR: User <some-guid> does not have Run permissions on pipeline None.
 
How can I define the job to link to the already existing pipeline (which is already running in Continuous mode)?
 
The DAB code is as follows:
 
# Import paths assumed from the databricks-bundles package
from databricks.bundles.jobs import Job, PipelineTask, Task
from databricks.bundles.pipelines import (
    FileLibrary,
    Pipeline,
    PipelineCluster,
    PipelineLibrary,
)

my_pipeline = Pipeline(
    name="My Pipeline",
    catalog="mycatalog",
    schema="default",
    continuous=True,
    clusters=[
        PipelineCluster(
            ...
        )
    ],
    libraries=[
        PipelineLibrary(
            file=FileLibrary(path="src/my_sdp.py")
        )
    ],
)

my_task = Task(
    task_key="My_pipeline_task",
    pipeline_task=PipelineTask(
        pipeline_id=str(my_pipeline.id)
    ),
)

my_job = Job(
    name="My Pipeline Job",
    tasks=[
        my_task
    ],
)
1 ACCEPTED SOLUTION

ethanop
New Contributor III

The error happens because my_pipeline.id does not exist when the Asset Bundle is defined. Resource IDs are only created after deployment, so your job is effectively created with pipeline_id = None. When the job runs, Databricks tries to run a pipeline with ID None, which results in the “Run permissions on pipeline None” error.

In Databricks Asset Bundles, you must link resources symbolically, not by accessing their IDs directly in Python.

To fix this, reference the pipeline using the bundle resource reference syntax:

 

my_task = Task(
    task_key="My_pipeline_task",
    pipeline_task=PipelineTask(
        pipeline_id="${resources.pipelines.my_pipeline.id}"
    )
)

 

Here, my_pipeline is the name the Pipeline resource is registered under in the bundle (typically the same as the Python variable name). Databricks resolves this reference to the actual pipeline ID at deploy time.

Your job definition can then remain unchanged.
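
For context, resource registration with the experimental databricks-bundles Python support looks roughly like the sketch below; the load_resources entry point and the Resources.add_pipeline/add_job calls are my assumptions about that API, so verify them against your installed version:

# Sketch only: the resource name passed to add_pipeline is what
# ${resources.pipelines.my_pipeline.id} resolves against at deploy time.
# my_pipeline and my_job are the objects defined in the question's snippet.
from databricks.bundles.core import Bundle, Resources

def load_resources(bundle: Bundle) -> Resources:
    resources = Resources()
    resources.add_pipeline("my_pipeline", my_pipeline)
    resources.add_job("my_pipeline_job", my_job)
    return resources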

One important note: because your pipeline is running in continuous mode, triggering it from a job will restart the pipeline each time the job runs. If you don't need scheduled restarts or orchestration with other tasks, you may not need a job at all; just deploying the pipeline is sufficient.
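
If you do keep the job, a common alternative (sketched here reusing the dataclasses from the question; only the continuous flag changes) is to deploy the pipeline in triggered mode, so each job run starts a single pipeline update instead of restarting a continuous pipeline:

# Sketch: triggered-mode variant of the pipeline from the question.
# (Pipeline, PipelineLibrary, FileLibrary as imported in that snippet.)
my_pipeline = Pipeline(
    name="My Pipeline",
    catalog="mycatalog",
    schema="default",
    continuous=False,  # triggered mode: the job starts each update
    libraries=[
        PipelineLibrary(file=FileLibrary(path="src/my_sdp.py"))
    ],
)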

Key takeaway: never use .id directly in Asset Bundle code. Always use ${resources.<type>.<name>.id} to link bundle-managed resources.


3 REPLIES

emma_s
Databricks Employee

Hi, if you've already created the pipeline you don't need to create it again via the DAB; just get the pipeline ID from the UI and pass that into your job. Also, the syntax for the task and the job should be more like this:

resources:
  jobs:
    my_pipeline_job:
      name: my-pipeline-job
      tasks:
        - task_key: my-pipeline-task
          pipeline_task:
            pipeline_id: <ID of the existing pipeline>
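
Since the question uses the Python support rather than YAML, the equivalent there (a sketch; the placeholder ID is illustrative and must be replaced with the real one) would be:

# Sketch: reference a pipeline that already exists in the workspace by
# pasting its ID (visible in the pipeline's URL or settings page).
# (Task and PipelineTask as imported in the question's snippet.)
my_task = Task(
    task_key="My_pipeline_task",
    pipeline_task=PipelineTask(
        pipeline_id="<existing-pipeline-id>"  # paste the real ID here
    ),
)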


mukul1409
New Contributor

This happens because the job is not actually linked to the deployed pipeline, so the pipeline ID is None at runtime. When using Asset Bundles, the pipeline ID is only resolved after deployment, so referencing my_pipeline.id in code does not work; the job must reference the pipeline through the bundle resource reference, not a Python variable. Define the pipeline and the job as bundle resources, and set the pipeline task's pipeline_id to the bundle reference for that pipeline. Also ensure that the job owner has Run permission on the pipeline. Once the job correctly references the deployed pipeline resource and permissions are in place, the unauthorized error will be resolved.
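
To check the permission side, something along these lines should work with the Databricks Python SDK (a sketch: the permissions API call shapes are assumed from databricks-sdk, and pipeline_id is a hypothetical placeholder):

# Sketch: list who holds which permission levels on a pipeline, using the
# databricks-sdk permissions API (verify call shapes against your version).
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
pipeline_id = "<existing-pipeline-id>"  # hypothetical placeholder

perms = w.permissions.get(
    request_object_type="pipelines",
    request_object_id=pipeline_id,
)
for acl in perms.access_control_list or []:
    principal = acl.user_name or acl.group_name or acl.service_principal_name
    levels = [p.permission_level for p in (acl.all_permissions or [])]
    print(principal, levels)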

Mukul Chauhan
