Data Engineering

How to handle configuration for different environments (e.g. DEV, PROD)?

EricOX
New Contributor

May I know a suggested way to handle different environment variables for the same code base? For example, the mount point of the Data Lake differs between DEV, UAT, and PROD. Any recommendations or best practices? Also, how should this be handled in Azure DevOps?

1 ACCEPTED SOLUTION


-werners-
Esteemed Contributor III

@Eric Yeung, you can put all your configuration parameters in a file (JSON, CONF, YAML, whatever you like) and read that file at the beginning of each program.

I like to use ConfigFactory (from the Typesafe Config library) in Scala, for example.

You only have to make sure the file can be read (e.g. if you put it on your data lake, but the file contains the path to the data lake, you are in trouble).
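For illustration, here is a minimal sketch of that pattern using Typesafe Config's ConfigFactory in Scala. The APP_ENV variable name, the per-environment file names (dev.conf, uat.conf, prod.conf), and the datalake.mountPoint key are my own assumptions for the example, not something prescribed by Databricks:

import com.typesafe.config.{Config, ConfigFactory}

// Assumed layout: one HOCON file per environment on the classpath,
// e.g. dev.conf containing:  datalake { mountPoint = "/mnt/datalake-dev" }
object AppConfig {
  // Pick the environment from a variable set on the cluster or job (assumption)
  private val env: String = sys.env.getOrElse("APP_ENV", "dev")

  // Read <env>.conf from the classpath and resolve any substitutions
  val config: Config = ConfigFactory.parseResources(s"$env.conf").resolve()

  // The DEV/UAT/PROD mount point now comes from the file, not the code
  val dataLakeMount: String = config.getString("datalake.mountPoint")
}

The same code base then picks up the right mount point simply because APP_ENV differs per environment.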

How to handle DevOps? That is not an easy one. It can range from something as simple as using Databricks Repos to a fully automated deployment pipeline with automated tests, etc.

Your question is perhaps a tad too general to answer.

The Databricks docs have some information on CI/CD (if that is what you mean by Azure DevOps); a minimal pipeline sketch follows below.
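As a rough sketch of the fully automated end of that spectrum, an Azure DevOps pipeline definition (azure-pipelines.yml) could install the Databricks CLI and import the repo's notebooks into the workspace. The variable group values (databricksHost, databricksToken) and the /Shared/my-project path are assumptions for illustration:

# Assumed: databricksHost/databricksToken come from a per-environment
# Azure DevOps variable group
trigger:
  branches:
    include:
      - main

pool:
  vmImage: ubuntu-latest

steps:
  - script: pip install databricks-cli
    displayName: Install Databricks CLI

  # Import the repo's notebooks folder into the workspace (path is illustrative)
  - script: databricks workspace import_dir notebooks /Shared/my-project --overwrite
    displayName: Deploy notebooks
    env:
      DATABRICKS_HOST: $(databricksHost)
      DATABRICKS_TOKEN: $(databricksToken)

Running the same pipeline against different variable groups is one common way to target DEV, UAT, and PROD without changing the code.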

Besides all that: if you use notebooks, use the Repos functionality in Databricks.


3 REPLIES

Kaniz
Community Manager

Hi @EricOX! My name is Kaniz, and I'm the technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question first; otherwise, I will follow up with my team and get back to you soon. Thanks.


Kaniz
Community Manager

Hi @Eric Yeung, just a friendly follow-up. Do you still need help, or did the above responses help you find the solution? Please let us know.
