Data Engineering

Extract Snowflake data based on environment

ndw
New Contributor II

Hi all,

 

In the development workspace, I need to extract data from a table/view in the Snowflake development environment. An example table is VD_DWH.SALES.SALES_DETAIL.

When we deploy the code to production, it needs to extract data from a table/view in the Snowflake production environment. An example table is VP_DWH.SALES.SALES_DETAIL.

In brief: the database name is VD_DWH for dev and VP_DWH for production.

How can I implement this in Databricks?

 

4 REPLIES

nayan_wylde
Esteemed Contributor

@ndw Can you please share more details on the request? Are you trying to extract data from Snowflake into Databricks?

ndw
New Contributor II

There is a migration from the legacy Snowflake data warehouse to the Databricks lakehouse. Until the migration is complete, I need to transfer Snowflake data into Databricks, but the dev and prod Snowflake databases have different names.

In the development environment, the data migration source is VD_DWH.SALES.SALES_DETAIL; in the production environment, the source is VP_DWH.SALES.SALES_DETAIL.

 

Depending on the environment, I need to replace the source table in the SELECT statement.

nayan_wylde
Esteemed Contributor

Option 1 — Notebook widgets (simple & reliable)

Use a Databricks widget to pass the environment (e.g., via a Jobs/Workflows parameter), then build the table name dynamically.

# Databricks notebook
dbutils.widgets.dropdown("env", "dev", ["dev", "prod"], "Environment")
env = dbutils.widgets.get("env")

# Map env → Snowflake database
db_map = {"dev": "VD_DWH", "prod": "VP_DWH"}
source_db = db_map.get(env, "VD_DWH")  # default to dev

source_table_fqn = f"{source_db}.SALES.SALES_DETAIL"

# Snowflake connection options (store secrets in Databricks Secrets)
sfOptions = {
    "sfUrl": dbutils.secrets.get("scope", "sfUrl"),
    "sfUser": dbutils.secrets.get("scope", "sfUser"),
    "sfPassword": dbutils.secrets.get("scope", "sfPassword"),
    "sfWarehouse": dbutils.secrets.get("scope", "sfWarehouse"),
    "sfRole": dbutils.secrets.get("scope", "sfRole"),
}

# Read from Snowflake table into Spark DataFrame
df = (spark.read
      .format("snowflake")
      .options(**sfOptions)
      .option("dbtable", source_table_fqn)
      .load())

# Do your transforms, then write to Delta/UC
df.write.mode("overwrite").saveAsTable("lakehouse.sales_detail")

 

Option 2 — Jobs/Workflows parameters (no widgets in code)

If you’re scheduling via Databricks Jobs (or an orchestrator such as ADF/Airflow), pass env as a parameter and read it with dbutils.widgets.get, as in the sketch below. You can also pass the database name directly as a parameter to avoid the mapping.
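A rough sketch of the notebook side, assuming job parameters named env or source_db (both names are illustrative); a job parameter whose name matches a widget overrides the widget's default at run time:

# Defaults let the notebook run interactively; a job parameter named "env"
# (or "source_db") overrides the default when the notebook runs as a job task.
dbutils.widgets.text("env", "dev")
env = dbutils.widgets.get("env")

# Or pass the Snowflake database name directly and skip the env-to-database mapping:
dbutils.widgets.text("source_db", "VD_DWH")
source_db = dbutils.widgets.get("source_db")

source_table_fqn = f"{source_db}.SALES.SALES_DETAIL"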

Option 3 — Configuration-driven (robust for many tables)

Centralize environment settings (e.g., in a UC Volume, a Delta table, or notebook-scoped JSON) and look them up at runtime:

 

import json

config_json = """
{
  "dev": {"source_db": "VD_DWH"},
  "prod": {"source_db": "VP_DWH"}
}
"""
cfg = json.loads(config_json)
dbutils.widgets.text("env", "dev")  # define the widget with a dev default for interactive runs
env = dbutils.widgets.get("env")
source_db = cfg[env]["source_db"]

source_table_fqn = f"{source_db}.SALES.SALES_DETAIL"
# ... proceed to read from Snowflake

ndw
New Contributor II

Thank you

I will schedule this via a Databricks Job. I need to pass env as dev in development and prod in production. How can I pass different values to the same job in different environments? Should I use DABs (Databricks Asset Bundles) to pass env?
