Data Engineering

In what circumstances are both UAT/DEV and PROD environments actually necessary?

Oliver_Angelil
Valued Contributor II

I wanted to ask this question yesterday in the Q&A session with Mohan Mathews, but didn't get around to it (@Kaniz Fatma, do you know his handle here so I can tag him?).

We (and most development teams) have two environments: UAT/DEV and PROD. For those that don't know: UAT/DEV is for developing and testing software, and PROD is where code would be deployed and used by the customers/business.

I have the impression that this setup is considered industry standard, which most teams will accept without properly thinking "why". Is the added overhead of configuring a whole additional workspace and dealing with fiddly deployments worth it?

I once worked at a company that had very sensitive data, and this setup then made sense because the data on UAT would be mirrored from PROD but obfuscated, enabling more external contractors to be involved with the development of the application, without the added risk of data leaks.
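As a rough illustration of that kind of obfuscated mirror, here is a minimal Python sketch. It is not what that company actually ran: the column names, the key, and the sample rows are all hypothetical, and a keyed hash is just one possible way to pseudonymize values.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would come from a secret store, not source code.
SECRET_KEY = b"replace-me"

# Hypothetical names of the columns considered sensitive.
SENSITIVE_COLUMNS = {"name", "email"}


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Return a stable keyed hash: equal inputs map to equal outputs,
    so joins still work, but the raw value is not exposed."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def obfuscate_row(row: dict) -> dict:
    """Copy a row, replacing sensitive fields with their pseudonyms."""
    return {
        col: pseudonymize(str(val)) if col in SENSITIVE_COLUMNS else val
        for col, val in row.items()
    }


# Made-up PROD rows, mirrored into an obfuscated UAT copy.
prod_rows = [
    {"id": 1, "name": "Alice", "email": "alice@example.com", "amount": 42.0},
    {"id": 2, "name": "Bob", "email": "bob@example.com", "amount": 13.5},
]
uat_rows = [obfuscate_row(r) for r in prod_rows]
```

Non-sensitive columns such as `id` and `amount` pass through unchanged, so contractors can still develop and test against realistic shapes and volumes.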

But what if the data are not sensitive? Why not simply have a Git branch called "prod", so that "deploying" is just a matter of merging the "dev" branch into "prod"? I get the "what if the stack fails and business-critical applications drop" argument... but:

1) How often will a Databricks workspace fail?

2) That argument only holds if the applications being built and delivered to customers are truly critical to the business (so that even a few minutes of downtime would be very costly), and I assume that for most companies that use Databricks this is not the case.

3) Does the idea of a workspace failing even make sense? In Databricks we can define multiple compute clusters. One could create a compute cluster called "PROD" to be used only for production applications...

Thank you very much in advance,

Oliver

6 REPLIES

@Retired_mod could you please reach out to Mohan Mathews. Thank you

-werners-
Esteemed Contributor III

I am in exactly that situation: a single workspace with a prod branch and a dev branch (and feature branches, of course).

And we are in the process of setting up a development environment.

Why? Because it takes too much time and effort to make sure you do not break anything in prod.

Also, all of our source systems have DEV/PROD, and sometimes we need to extract from dev systems into our production data lake... You see where it gets nasty?

If it were purely the notebook code, a separate environment is not necessary, but there is more to it than that.

My ideal situation however would be to have split environments for code but not for the data itself.

lakeFS, for example, would be a nice-to-have (Git for data!).

Thanks @Werner Stinckens.

"Why? Because it takes too much time and effort to make sure you do not break anything in prod."

That's the bit I'm not getting... if you have a standalone compute cluster dedicated to PROD (and only PROD), and you don't touch the code in the PROD branch outside of releases, then how would the chance of breaking something be higher than in a completely separate PROD workspace?

-werners-
Esteemed Contributor III

That's what I mentioned. If you look solely at the notebook code, a separate env is not necessary.

We are still running like that.

The reason to split environments is when you need links to dev systems, unit tests, config files that might be stored in different places, and so on.
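To make the config-file point concrete, here is a minimal Python sketch of per-environment settings. Everything in it is an assumption for illustration (the `EnvConfig` fields, the URLs, the paths, and the cluster names); it is not our actual setup, just one way such a mapping could look.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvConfig:
    """Settings that typically differ between environments (hypothetical fields)."""
    source_system_url: str
    lake_root: str
    cluster_name: str


# All values below are made-up placeholders.
CONFIGS = {
    "dev": EnvConfig("https://dev-source.example.com", "/mnt/lake/dev", "DEV"),
    "prod": EnvConfig("https://prod-source.example.com", "/mnt/lake/prod", "PROD"),
}


def load_config(env: str) -> EnvConfig:
    """Look up the settings for one environment, failing loudly on typos."""
    try:
        return CONFIGS[env]
    except KeyError:
        raise ValueError(f"unknown environment: {env!r}") from None


def table_path(env: str, table: str) -> str:
    """Build the storage path for a table inside the given environment's lake."""
    return f"{load_config(env).lake_root}/{table}"
```

With a single workspace, every notebook has to pick the right entry itself, for example via `table_path("dev", "sales")`. With split environments, each workspace would carry only its own entry.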

Anonymous
Not applicable

Hi @Oliver Angelil

Hope all is well! Just wanted to check in: were you able to resolve your issue, and if so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.

We'd love to hear from you.

Thanks!

@Vidula Khanna there is no issue to resolve; see my original post. I'm still hoping for a response from Mohan Mathews. @Kaniz Fatma, do you know if he got back to you?
