Databricks Community

jeremy98 · Thursday

Hello, community!

I have a question about deploying workflows in a production environment. Specifically, how can we deploy a group of workflows to production so that they are created only once and cannot be duplicated by others?

Currently, if someone deploys a GitHub repository containing DABs definitions, it creates new workflows that are accessible only to the person who deployed them. However, in a production scenario, workflows should be deployed just once, and no one should be able to create duplicates.

Is there a specific command or configuration in DABs to prevent this issue?

Additionally, is it possible to assign a group of people permissions to start and stop the workflows created in production?

Thanks, as always!

Walter_C · Thursday

Hello Jeremy, many thanks for reaching out. So the intention is that new users just trigger the existing workflow instead of creating a new one via DABs, correct?

Alberto_Umana · Thursday

Hi @jeremy98,

You can explore the run_as setting in DABs. With run_as, the production jobs run under one fixed identity (for example, a service principal) instead of the identity of whoever deployed them. The deployment modes page covers this together with `mode: production`:

https://docs.databricks.com/en/dev-tools/bundles/deployment-modes.html

jeremy98 · yesterday

So, I don't understand whether it is possible to overwrite the same workflow. It could become a mess if someone changes a cluster configuration, and I want to be sure that only one workflow is active, with the new configuration. I'm asking because one workflow had already been deployed to production, and when it was deployed again with a new cluster configuration it was duplicated instead of the existing one being overwritten. Why does it create a new workflow? This is my prod target:

```yaml
targets:
  prod:
    workspace:
      host: <host_url>
      root_path: /Workspace/Users/${workspace.current_user.userName}/.bundle/${bundle.name}/${bundle.target}

    mode: production

    # permissions:
    #  - user_name: ${workspace.current_user.userName}
    #    level: CAN_MANAGE

    run_as:
      service_principal_name: <sp_id>

    sync:
      exclude:
        - ./notebook/stg/*.*

    resources:
      jobs:
        sync_delta_and_db:
          name: sync_delta_and_db_${bundle.target}

          schedule: # runs the job every day at 03:00 UTC
            quartz_cron_expression: "0 0 3 * * ?"
            timezone_id: "UTC"

          tasks:
            - task_key: sync_delta_${bundle.target}
              job_cluster_key: sync_delta_${bundle.target}_cluster
              notebook_task:
                notebook_path: ./notebook/${bundle.target}/db_sync_initial_wip.ipynb
                source: WORKSPACE
              libraries:
                - whl: ${workspace.root_path}/files/dist/<lib>-0.0.1-py3-none-any.whl

          job_clusters: # TODO: this needs to be resized once we understand how to handle massive data properly
            - job_cluster_key: sync_delta_${bundle.target}_cluster
              new_cluster:
                spark_version: 15.4.x-scala2.12
                node_type_id: Standard_DS3_v2
                runtime_engine: PHOTON
                num_workers: 0
                spark_conf:
                  spark.databricks.cluster.profile: singleNode
                  spark.master: local[*]
                custom_tags:
                  ResourceClass: SingleNode
```
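A likely cause of the duplicate job is the root_path in this target: `${workspace.current_user.userName}` resolves to whoever runs `databricks bundle deploy`, and by default the bundle keeps its deployment state under `${workspace.root_path}/state`. Your colleague's deployment and yours therefore track separate state, so each creates its own job. A minimal sketch of pinning the path to one fixed location, assuming a shared folder such as `/Workspace/Shared/.bundle/...` that all deployers and the service principal can access (the folder choice is an assumption; adjust it to your workspace):

```yaml
targets:
  prod:
    mode: production
    workspace:
      host: <host_url>
      # Fixed path instead of ${workspace.current_user.userName}: every deployer
      # shares the same files and the same deployment state, so a redeploy
      # updates the existing job rather than creating a second one.
      # Assumption: this shared folder is readable by the run_as service principal.
      root_path: /Workspace/Shared/.bundle/${bundle.name}/${bundle.target}
    run_as:
      service_principal_name: <sp_id>
```

Jobs already deployed from the old per-user paths are not picked up by the new state. They would need to be deleted by hand, or attached to the bundle with `databricks bundle deployment bind` if your CLI version has that command.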

Alberto_Umana · Thursday

About your second question: you can use the UI to grant a group the CAN_MANAGE permission on the job. If the group only needs to start and stop runs, CAN_MANAGE_RUN is sufficient.

https://docs.databricks.com/en/jobs/privileges.html

https://kb.databricks.com/en_US/security/bulk-update-workflow-permissions-for-a-group
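The same permission can also be declared in the bundle, so it is reapplied on every deploy instead of being set by hand in the UI. A minimal sketch, assuming a workspace group named `prod-operators` (hypothetical). `CAN_MANAGE_RUN` lets the group trigger and cancel runs without being able to edit the job definition:

```yaml
targets:
  prod:
    resources:
      jobs:
        sync_delta_and_db:
          permissions:
            # Hypothetical group: members can start and cancel runs,
            # but cannot change the job's settings
            - group_name: prod-operators
              level: CAN_MANAGE_RUN
```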

jeremy98 · yesterday

Thanks, everyone! I'm going to try these suggestions.

Walter_C · yesterday

Did the name of the workflow remain the same, or was the job name changed? If it is exactly the same name, does the UI show the duplicate name?

jeremy98 · yesterday

Hi Walter, the name of the workflow is the same. The only thing I changed is the compute configuration, which I switched to use Photon (it did not use Photon before). The creator is also different: the first workflow was created by my colleague, and the new one by me. I thought my deployment would overwrite the existing workflow, but that is not what happened. How can I solve this? Is it possible to have only one workflow? :(
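One way to keep a single production copy is to let only CI deploy the prod target, authenticated as the service principal, so every deployment uses the same identity and the same bundle state. A minimal GitHub Actions sketch, assuming OAuth machine-to-machine credentials for the service principal stored as repository secrets named `DATABRICKS_HOST`, `DATABRICKS_CLIENT_ID` and `DATABRICKS_CLIENT_SECRET`, and a `main` branch (all assumptions):

```yaml
# .github/workflows/deploy-prod.yml (hypothetical file name)
name: deploy-prod

on:
  push:
    branches: [main]

# Never run two production deployments at the same time
concurrency: deploy-prod

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Installs the Databricks CLI on the runner
      - uses: databricks/setup-cli@main
      # The CLI reads these environment variables and authenticates
      # as the service principal, not as an individual user
      - run: databricks bundle deploy -t prod
        env:
          DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
          DATABRICKS_CLIENT_ID: ${{ secrets.DATABRICKS_CLIENT_ID }}
          DATABRICKS_CLIENT_SECRET: ${{ secrets.DATABRICKS_CLIENT_SECRET }}
```

If the wheel referenced under `files/dist` is currently built on a developer machine, the CI job would also need a step that builds it before `bundle deploy`.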

jeremy98 · 5 hours ago

Last night I ran into another issue:

run failed with error message Unable to access the notebook "/Workspace/Users/<user email>/.bundle/rnc_data_pipelines/prod/files/notebook/prod/db_sync_initial_wip". Either it does not exist, or the identity used to run this job, sp-prod-databricks (<id of sp>), lacks the required permissions.
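The path in this error sits in the home folder of the user who deployed, which a service principal typically cannot read unless it is granted access. Besides moving `root_path` out of a personal folder, one option is to list the service principal in the bundle's top-level `permissions`. A minimal sketch, assuming your Databricks CLI version also applies top-level permissions to the bundle's workspace folder (verify this on your version; otherwise share the folder with the service principal from the workspace UI):

```yaml
targets:
  prod:
    permissions:
      # The run_as service principal; CAN_VIEW is meant to give it read access
      # to the deployed notebooks (assumption: also applied to the bundle folder)
      - service_principal_name: <sp_id>
        level: CAN_VIEW
      - user_name: ${workspace.current_user.userName}
        level: CAN_MANAGE
```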

How to deploy unique workflows that run in production