12-19-2024 11:48 AM
Hello, community!
I have a question about deploying workflows in a production environment. Specifically, how can we deploy a group of workflows to production so that they are created only once and cannot be duplicated by others?
Currently, if someone deploys a GitHub repository containing DABs definitions, it creates new workflows that are accessible only to the person who deployed them. However, in a production scenario, workflows should be deployed just once, and no one should be able to create duplicates.
Is there a specific command or configuration in DABs to prevent this issue?
Additionally, is it possible to assign a group of people permissions to start and stop the workflows created in production?
Thanks, as always!
12-19-2024 12:23 PM
Hello Jeremy, many thanks for reaching out. Just to confirm: the intention is that new users trigger the existing workflow instead of creating a new one via DABs, correct?
12-19-2024 12:24 PM
Hi @jeremy98,
You can look into the run_as configuration in your DABs. This configuration sets the identity the workflows run as, so they are not tied to whoever deployed them. Note that it does not by itself prevent a second deployment from creating duplicates.
12-20-2024 09:34 AM - edited 12-20-2024 09:38 AM
So, I don't understand whether it is possible to overwrite the same workflow. It could become a mess if someone changes a cluster configuration, and I want to be sure that only one workflow is active, with the new configuration. I'm asking because one workflow was already deployed in production, and it was then replicated with a new cluster configuration when it should have overwritten the existing one. Why does it create a new workflow?
targets:
  <target>:
    workspace:
      host: <host_url>
      root_path: /Workspace/Users/${workspace.current_user.userName}/.bundle/${bundle.name}/${bundle.target}
    mode: production
    # permissions:
    #   - user_name: ${workspace.current_user.userName}
    #     level: CAN_MANAGE
    run_as:
      service_principal_name: <sp_id>
    sync:
      include:
        - ./notebook/stg/*.*
resources:
  jobs:
    <job_key>:
      name: sync_delta_and_db_${bundle.target}
      schedule: # runs the job every day at 3AM
        quartz_cron_expression: "0 0 3 * * ?"
        timezone_id: "UTC"
      tasks:
        - task_key: sync_delta_${bundle.target}
          job_cluster_key: sync_delta_${bundle.target}_cluster
          notebook_task:
            notebook_path: ./notebook/${bundle.target}/db_sync_initial_wip.ipynb
          libraries:
            - whl: ${workspace.root_path}/files/dist/<lib>-0.0.1-py3-none-any.whl
      job_clusters: # TODO: this needs to be resized once we understand how to handle massive data properly
        - job_cluster_key: sync_delta_${bundle.target}_cluster
          new_cluster:
            spark_version: 15.4.x-scala2.12
            node_type_id: Standard_DS3_v2
            runtime_engine: PHOTON
            num_workers: 0
            spark_conf:
              spark.databricks.cluster.profile: singleNode
              spark.master: local[*]
            custom_tags:
              ResourceClass: SingleNode
12-19-2024 12:26 PM
About your second question: you can use the UI to grant a group the CAN_MANAGE permission on the workflow job.
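If you would rather keep this in the bundle than set it in the UI, a permissions block on the job is one option. A minimal sketch, assuming a job resource key `sync_delta_and_db` and a workspace group named `prod-operators` (both hypothetical names). CAN_MANAGE_RUN is the job permission level that allows starting and cancelling runs without allowing edits to the job itself:

```yaml
# databricks.yml (fragment); resource key and group name are hypothetical
resources:
  jobs:
    sync_delta_and_db:
      name: sync_delta_and_db_${bundle.target}
      permissions:
        # Members of this group can trigger and cancel runs, but not edit the job
        - group_name: prod-operators
          level: CAN_MANAGE_RUN
```

Use CAN_MANAGE instead if the group should also be able to edit the job settings.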
12-20-2024 02:40 AM
Thanks guys, I'm going to try these suggestions!
12-20-2024 12:04 PM
Did the name of the workflow remain the same, or was the job name changed? If it is exactly the same name, does the UI show a duplicate name?
12-20-2024 03:55 PM
Hi Walter, the name of the workflow is the same. The only thing I changed is the compute configuration, which I switched to the PHOTON configuration without actually using it. The creator of the workflow is also different: the first deployment was made by my colleague, while I made the new one, thinking it would overwrite the existing workflow, but it didn't. How can I solve this problem? Is it possible to have only one workflow? :(
12-21-2024 01:47 AM
Last night I ran into another issue:
run failed with error message Unable to access the notebook "/Workspace/Users/<user email>/.bundle/rnc_data_pipelines/prod/files/notebook/prod/db_sync_initial_wip". Either it does not exist, or the identity used to run this job, sp-prod-databricks (<id of sp>), lacks the required permissions.
12-23-2024 12:32 AM
Hi guys, any news?
12-23-2024 11:43 AM
I got some information from my internal team:
The main thing to help here is deploying as a service principal and setting mode: production on the target. This is best done by setting up automation, such as GitHub Actions or an Azure DevOps pipeline. You may choose a different service principal as the run_as user, but you would then need to set permissions for whoever will run the workflows. You can set permissions at a few levels in DABs, so if you decide that a service principal will be the owner every time you deploy, you just set the appropriate run permissions for the various groups or SPs that need access.
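As a rough sketch of that setup (not a definitive configuration): the bundle name below is taken from the error path earlier in the thread, while the target name `prod`, the shared root_path, and the group `prod-operators` are assumptions for illustration. The key point is that root_path contains no user name, so every deployment reads and updates the same bundle state instead of creating a fresh copy of the jobs per deployer:

```yaml
# databricks.yml (fragment); target name, root_path and group name are assumptions
bundle:
  name: rnc_data_pipelines

targets:
  prod:
    mode: production
    workspace:
      host: <host_url>
      # Fixed location without ${workspace.current_user.userName}:
      # all deployments share one state, so existing jobs are updated in place
      root_path: /Workspace/Shared/.bundle/${bundle.name}/${bundle.target}
    run_as:
      service_principal_name: <sp_id>
    permissions:
      # Applied to the resources defined in the bundle
      - group_name: prod-operators
        level: CAN_RUN
```

The CI pipeline, authenticated as the service principal, would then run `databricks bundle validate -t prod` followed by `databricks bundle deploy -t prod`. The deploying identity needs write access to the chosen root_path, and the run_as identity needs read access to the files under it.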
4 weeks ago
I would like to revisit this topic as I now have a clearer understanding of my needs and the issue at hand.
Let's assume we have only one active Databricks workspace, which currently serves as both development and production. This dual role adds complexity to managing deployments effectively.
My question is:
Since we are working with a single workspace (i.e., one target acting as both dev and prod), it's crucial to prevent redundancy and maintain consistency.
Thank you for your assistance!
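One possible way to keep dev and prod apart inside a single workspace is to define two targets that point at the same host. This is only a sketch under assumptions: the target names `dev` and `prod` and the shared root_path are illustrative. In development mode, deployed resources are prefixed with the deploying user's name and schedules are paused, so personal copies are clearly marked; the production target uses one fixed path and one run identity:

```yaml
# databricks.yml (fragment); target names and root_path are assumptions
targets:
  dev:
    # Per-user copies: resource names get a "[dev <user>]" prefix, schedules are paused
    mode: development
    default: true
    workspace:
      host: <host_url>
  prod:
    mode: production
    workspace:
      host: <host_url>
      # Single shared location so production is deployed exactly once
      root_path: /Workspace/Shared/.bundle/${bundle.name}/${bundle.target}
    run_as:
      service_principal_name: <sp_id>
```

With this layout, `databricks bundle deploy` (default target `dev`) is for individual testing, and `databricks bundle deploy -t prod` is reserved for the automated pipeline.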