Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Managing values that change between development and production

turagittech
Contributor

Hi all, when moving from development to testing and production, one often needs to handle values that change, such as the blob store or database server being different.

I have seen that widgets can be a useful way to have updateable values for notebooks, and there are a variety of ways of handling it for Python code. What I want to know is what people are using to store those values. Do you use Databricks tables? Obviously secrets go to something like Key Vault. I also need to track load values, as Azure Blob Storage without HNS is a pain to get data from, so I need to track last-load timestamps.

Happy to hear what other people have found works. Tables seem obvious; did I miss something?
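
For context, here is a minimal sketch of the widget approach mentioned above, assuming a notebook parameterised per environment; the parameter, table, and column names (storage_account, config_table, last_load_ts) are illustrative, not an established convention:

    from pyspark.sql import functions as F

    # Declare environment-specific parameters as widgets; a job or an ADF
    # pipeline can override the defaults when it runs the notebook.
    dbutils.widgets.text("storage_account", "devstorageaccount")
    dbutils.widgets.text("config_table", "dev_catalog.config.pipeline_settings")

    storage_account = dbutils.widgets.get("storage_account")
    config_table = dbutils.widgets.get("config_table")

    # One way to track last-load timestamps: keep them in a small Delta table
    # and read the latest value for the pipeline before listing blobs.
    last_load = (
        spark.table(config_table)
        .where(F.col("pipeline") == "blob_ingest")
        .agg(F.max("last_load_ts"))
        .first()[0]
    )
    print(f"Loading from {storage_account}, files newer than {last_load}")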

1 ACCEPTED SOLUTION


Paul_Mc
New Contributor III

We have handled this in a variety of ways, and each has pros and cons (a short sketch of a few of these follows below the list):
1. Set up the values in Key Vault / secret scopes. As we have an AKV aligned to each environment's workspace, this also allows us to pass in those environment-specific values. The downside is that they are all redacted, so they can be a pain during debugging sometimes.

2. Store them in a table. As we have a default catalog for each workspace, we can vary the values in the table per environment, although because they are stored as Delta tables they are not necessarily the quickest to read.

3. Store them in a JSON file and read it into Python. The file can be stored in a volume or a path on the data lake, etc.

4. When using ADF, pass in the values through widgets and make use of the ADF environment variables functionality.

Hope that helps
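
To make options 1-3 above concrete, here is a minimal sketch of how each might look from a notebook; the secret scope, table name, and file path are assumptions for illustration only:

    import json

    # Option 1: secret scope backed by the environment's Azure Key Vault
    # (scope and key names are illustrative). Values are redacted when printed.
    jdbc_url = dbutils.secrets.get(scope="env-secrets", key="sql-jdbc-url")

    # Option 2: a small config table in the workspace's default catalog; the
    # same table name resolves to different values in each environment.
    config_df = spark.table("config.environment_settings")
    settings = {row["key"]: row["value"] for row in config_df.collect()}
    blob_container = settings["blob_container"]

    # Option 3: a JSON config file kept in a volume or data lake path
    # (path is illustrative); one file per environment.
    with open("/Volumes/main/config/files/settings.json") as f:
        file_settings = json.load(f)
    database_server = file_settings["database_server"]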


turagittech
Contributor

Great, thanks. Speed in this case isn't critical, as it's not processing massive amounts of data (well, I hope not massive amounts at this time). It'll be some batch processes that can't use DLT.