Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How to deploy unique workflows that run in production

jeremy98
New Contributor III

Hello, community!

I have a question about deploying workflows in a production environment. Specifically, how can we deploy a group of workflows to production so that they are created only once and cannot be duplicated by others?

Currently, if someone deploys a GitHub repository containing DABs definitions, it creates new workflows that are accessible only to the person who deployed them. However, in a production scenario, workflows should be deployed just once, and no one should be able to create duplicates.

Is there a specific command or configuration in DABs to prevent this issue?

Additionally, is it possible to assign a group of people permissions to start and stop the workflows created in production?

Thanks, as always!

8 replies

Walter_C
Databricks Employee

Hello Jeremy, many thanks for reaching out. Is the intention that new users just trigger the existing workflow instead of creating a new one via DABs?

 

Alberto_Umana
Databricks Employee

Hi @jeremy98,

You can explore the run_as option in DABs by adding a run_as configuration to your bundle target. It makes the production workflows run under a single identity, such as a service principal, instead of as the user who deployed them.

https://docs.databricks.com/en/dev-tools/bundles/deployment-modes.html

So I don't understand whether it is possible to overwrite the same workflow. It could become a mess if someone changes a cluster configuration, and I want to be sure that only one workflow is active, with the new configuration. I'm asking because a workflow was already deployed in production, but deploying it again with a new cluster configuration created a replica instead of overwriting the existing one. Why does it create a new workflow?

 

  prod:
    workspace:
      host: <host_url>
      root_path: /Workspace/Users/${workspace.current_user.userName}/.bundle/${bundle.name}/${bundle.target}
    
    mode: production

    # permissions:
    #  - user_name: ${workspace.current_user.userName}
    #    level: CAN_MANAGE

    run_as:
      service_principal_name: <sp_id>

    sync:
      exclude: 
        - ./notebook/stg/*.*

    resources:
      jobs:
        sync_delta_and_db:
          name: sync_delta_and_db_${bundle.target}

          schedule: # runs the job every day at 3AM
            quartz_cron_expression: "0 0 3 * * ?"
            timezone_id: "UTC"

          tasks:
            - task_key: sync_delta_${bundle.target}
              job_cluster_key: sync_delta_${bundle.target}_cluster
              notebook_task:
                notebook_path: ./notebook/${bundle.target}/db_sync_initial_wip.ipynb
                source: WORKSPACE
              libraries:
                - whl: ${workspace.root_path}/files/dist/<lib>-0.0.1-py3-none-any.whl

          job_clusters: # TODO: this needs to be resized once we understand how to handle massive data properly
            - job_cluster_key: sync_delta_${bundle.target}_cluster
              new_cluster:
                spark_version: 15.4.x-scala2.12
                node_type_id: Standard_DS3_v2
                runtime_engine: PHOTON
                num_workers: 0
                spark_conf:
                  spark.databricks.cluster.profile: singleNode
                  spark.master: local[*]
                custom_tags:
                  ResourceClass: SingleNode
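A likely cause of the duplicate job, offered as a reading of the config above rather than a confirmed diagnosis: DABs keeps its deployment state under `${workspace.root_path}` (in a `state` subfolder by default), and this target's `root_path` contains `${workspace.current_user.userName}`. Each person who runs `databricks bundle deploy -t prod` therefore works against a separate state, so the CLI does not know about the job a colleague already created and creates a new one with the same name. A minimal sketch of the change, assuming a shared folder under `/Workspace/Shared` is acceptable in your workspace and that `<sp_id>` is the service principal's application ID:

```yaml
targets:
  prod:
    mode: production
    workspace:
      host: <host_url>
      # Assumption: a path that does not depend on who deploys, so every
      # deployment reads and writes the same bundle state and updates the same job.
      root_path: /Workspace/Shared/.bundle/${bundle.name}/${bundle.target}
    run_as:
      # The job runs as this service principal regardless of who deployed it.
      service_principal_name: <sp_id>
    # sync and resources stay as in the original target.
```

Jobs that were already created from per-user states are probably not merged automatically; the extra copies would need to be deleted, or, if your CLI version supports it, `databricks bundle deployment bind` can attach a bundle resource to an existing job.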

 

Alberto_Umana
Databricks Employee

About your second question: you can use the UI to grant a group the Can Manage permission on the workflow job.

https://docs.databricks.com/en/jobs/privileges.html

https://kb.databricks.com/en_US/security/bulk-update-workflow-permissions-for-a-group
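The same permission can also be declared in the bundle instead of being set by hand in the UI. A minimal sketch, assuming a hypothetical workspace group named `prod-operators` and the `sync_delta_and_db` job from the config earlier in the thread: `CAN_MANAGE_RUN` lets members trigger and cancel runs without editing the job, whereas `CAN_MANAGE` would also let them change its settings.

```yaml
targets:
  prod:
    resources:
      jobs:
        sync_delta_and_db:
          permissions:
            # Hypothetical group name; members can start and cancel runs of this job.
            - group_name: prod-operators
              level: CAN_MANAGE_RUN
```

Because the permission lives in the bundle, it is reapplied on each deploy, so a grant added only through the UI may be overwritten by the next deployment.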

jeremy98
New Contributor III

Thanks, guys, I'm going to try these suggestions!

Walter_C
Databricks Employee

Did the name of the workflow remain the same, or was the job name changed? If it is exactly the same name, does the UI show the duplicate name?

jeremy98
New Contributor III

Hi Walter, the name of the workflow is the same. The only thing I changed is the compute configuration, which I switched to Photon (the previous configuration did not use it). The creator is also different: the first job was created by my colleague, while the new one was created by me. I thought my deployment would overwrite the existing job, but that is not what happened. How can I solve this problem? Is it possible to have only one workflow :(?

jeremy98
New Contributor III

Last night I ran into another issue:

run failed with error message Unable to access the notebook "/Workspace/Users/<user email>/.bundle/rnc_data_pipelines/prod/files/notebook/prod/db_sync_initial_wip". Either it does not exist, or the identity used to run this job, sp-prod-databricks (<id of sp>), lacks the required permissions.
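The notebook path in this error sits under `/Workspace/Users/<user email>/.bundle/...`, the deployer's home folder, which the service principal set in `run_as` generally cannot read. One way to avoid both this error and the per-user duplicates is to run the production deployment as the service principal itself, for example from CI, so that `${workspace.current_user.userName}` resolves to the service principal and every deploy reuses the same bundle state. A hedged sketch of a GitHub Actions workflow, assuming OAuth machine-to-machine credentials for the service principal are stored as repository secrets under the hypothetical names below and that deployments happen from `main`:

```yaml
name: deploy-prod-bundle
on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Installs the Databricks CLI on the runner.
      - uses: databricks/setup-cli@main
      # Deploys the prod target authenticated as the service principal.
      # Secret names are hypothetical placeholders.
      - run: databricks bundle deploy -t prod
        env:
          DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
          DATABRICKS_CLIENT_ID: ${{ secrets.SP_CLIENT_ID }}
          DATABRICKS_CLIENT_SECRET: ${{ secrets.SP_CLIENT_SECRET }}
```

If the wheel referenced in the job is built as part of the deployment, the workflow would also need a Python setup step before `databricks bundle deploy`.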
