<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: referencing external locations in python notebooks in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/referencing-external-locations-in-python-notebooks/m-p/104663#M41835</link>
    <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/104610"&gt;@ashraf1395&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;Referencing external locations in a Databricks Python notebook, particularly for environments like Azure DevOps with different paths for development (dev) and production (prod), can be effectively managed using parameterized variables. Here’s a detailed explanation and recommended approach:&lt;/P&gt;&lt;H3&gt;Referencing External Locations in a Python Notebook&lt;/H3&gt;&lt;P&gt;In Databricks Python notebooks, you can reference external locations (such as Azure Data Lake Storage or other cloud storage) by passing the storage path directly or using environment-specific parameters. Below is a step-by-step explanation:&lt;/P&gt;&lt;H4&gt;1. &lt;STRONG&gt;Direct Reference with Path&lt;/STRONG&gt;&lt;/H4&gt;&lt;P&gt;If you want to directly reference an ADLS path, you can use it as a string in the Python notebook:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;path = "abfss://container@storageaccount.dfs.core.windows.net/folder"
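# If this path is covered by a Unity Catalog external location, access is
# authorized through that location's grants (READ FILES / WRITE FILES) -
# no storage keys or credential settings are needed in the notebook itself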
df = spark.read.format("parquet").load(path)
df.show()&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;H4&gt;2. &lt;STRONG&gt;Using Parameters for Environment Handling&lt;/STRONG&gt;&lt;/H4&gt;&lt;P&gt;For managing different environments (e.g., dev, prod), using parameterized variables is the best practice. This ensures flexibility and maintainability. You can set these parameters dynamically based on the environment being executed.&lt;/P&gt;&lt;P&gt;Example:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Define the environment (e.g., dev or prod) in Azure DevOps pipeline parameters or notebook widgets.&lt;/LI&gt;&lt;LI&gt;Use the environment variable to construct the storage path.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;# Define environment-specific parameters
env = dbutils.widgets.get("env")  # Set this widget value via Azure DevOps or manually
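# Optional guard (illustrative): fail fast on an unexpected widget value
# instead of silently falling through to the prod account below
if env not in ("dev", "prod"):
    raise ValueError("Unexpected env value: " + repr(env))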
storage_account = "devstorage" if env == "dev" else "prodstorage"
container = "mycontainer"

# Construct the path dynamically
path = f"abfss://{container}@{storage_account}.dfs.core.windows.net/folder"

# Use the path
df = spark.read.format("parquet").load(path)
df.show()&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;H3&gt;Steps to Handle Environment-Specific Paths with Azure DevOps&lt;/H3&gt;&lt;P&gt;To handle dev and prod storage paths dynamically in Azure DevOps:&lt;/P&gt;&lt;H4&gt;1. &lt;STRONG&gt;Pass Environment as a Parameter&lt;/STRONG&gt;&lt;/H4&gt;&lt;UL&gt;&lt;LI&gt;In your Azure DevOps pipeline, pass the environment as a parameter (env: dev or env: prod).&lt;/LI&gt;&lt;LI&gt;Inject the parameter into your notebook using the Databricks CLI or REST API when running the notebook.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="yaml"&gt;parameters:
  - name: env
    type: string
    default: dev

steps:
  # the exact task name depends on the Databricks extension installed in your Azure DevOps organization
  - task: DatabricksRunNotebook@2
    inputs:
      notebookPath: /path/to/notebook
      parameters: '{"env": "$(env)"}'&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;H4&gt;2. &lt;STRONG&gt;Use Environment Variables&lt;/STRONG&gt;&lt;/H4&gt;&lt;P&gt;In your Python notebook, use the passed env parameter to decide the storage account dynamically, as shown in the Python example above.&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;H3&gt;Using a Single Variable for Storage Accounts&lt;/H3&gt;&lt;P&gt;You can use a structured approach where the storage account name is a function of the environment.&lt;/P&gt;&lt;P&gt;For example:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;# Define environment and construct the path
env = dbutils.widgets.get("env")  # 'dev' or 'prod'
storage_accounts = {
    "dev": "devstorageaccount",
    "prod": "prodstorageaccount"
}
container = "mycontainer"

# Get storage account based on the environment
storage_account = storage_accounts.get(env, "defaultstorageaccount")
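# NOTE (illustrative): a silent default can mask misconfiguration - a stricter
# alternative is storage_accounts[env], which raises KeyError on an unknown env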
path = f"abfss://{container}@{storage_account}.dfs.core.windows.net/folder"

# Load data
df = spark.read.format("parquet").load(path)
df.show()&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;H3&gt;Best Practices for Managing External Location References&lt;/H3&gt;&lt;OL&gt;&lt;LI&gt;&lt;STRONG&gt;Parameterize the Environment&lt;/STRONG&gt;: Always use parameters to pass environment-specific values.&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Environment Mapping&lt;/STRONG&gt;: Maintain a mapping of environments to storage accounts and paths in a configuration file or dictionary in the notebook.&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Secure Configuration&lt;/STRONG&gt;: Use Azure Key Vault for storing sensitive information like storage account keys or connection strings.&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Test Across Environments&lt;/STRONG&gt;: Validate that both dev and prod configurations work seamlessly in the pipeline.&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;For more detailed information, refer to the official &lt;A href="https://docs.databricks.com/en/sql/language-manual/sql-ref-external-locations.html" target="_new" rel="noopener"&gt;&lt;SPAN&gt;Databricks&lt;/SPAN&gt;&lt;SPAN&gt; External&lt;/SPAN&gt;&lt;SPAN&gt; Locations&lt;/SPAN&gt;&lt;SPAN&gt; Documentation&lt;/SPAN&gt;&lt;/A&gt;.&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;</description>
    <pubDate>Wed, 08 Jan 2025 09:21:02 GMT</pubDate>
    <dc:creator>fmadeiro</dc:creator>
    <dc:date>2025-01-08T09:21:02Z</dc:date>
    <item>
      <title>referencing external locations in python notebooks</title>
      <link>https://community.databricks.com/t5/data-engineering/referencing-external-locations-in-python-notebooks/m-p/104652#M41830</link>
      <description>&lt;P&gt;How can I reference external locations in a python notebook?&lt;BR /&gt;&lt;BR /&gt;I found the docs for referencing them in SQL:&amp;nbsp;&lt;A href="https://docs.databricks.com/en/sql/language-manual/sql-ref-external-locations.html" target="_blank"&gt;https://docs.databricks.com/en/sql/language-manual/sql-ref-external-locations.html&lt;/A&gt;.&lt;BR /&gt;&lt;BR /&gt;But how do I do it in python? Do we have to pass the adls:// path directly in the python notebook, or is there another way?&lt;BR /&gt;&lt;BR /&gt;One more question: all our python notebooks are handled by Azure DevOps across multiple dev and prod environments, so the storage container paths differ between dev and prod.&lt;BR /&gt;&lt;BR /&gt;If I have to pass adls:// paths to reference the external locations, then in dev I have to pass the dev storage account and in prod the prod storage account. We are using a single Azure DevOps setup with multiple parameters. Would the best method be to create a variable that resolves the storage account per environment - i.e. the parameter holds the prod storage account path for prod and the dev storage account path for dev - so that I reference it like adls://path/{storage_container}?&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 08 Jan 2025 07:45:54 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/referencing-external-locations-in-python-notebooks/m-p/104652#M41830</guid>
      <dc:creator>ashraf1395</dc:creator>
      <dc:date>2025-01-08T07:45:54Z</dc:date>
    </item>
    <item>
      <title>Re: referencing external locations in python notebooks</title>
      <link>https://community.databricks.com/t5/data-engineering/referencing-external-locations-in-python-notebooks/m-p/104663#M41835</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/104610"&gt;@ashraf1395&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;Referencing external locations in a Databricks Python notebook, particularly for environments like Azure DevOps with different paths for development (dev) and production (prod), can be effectively managed using parameterized variables. Here’s a detailed explanation and recommended approach:&lt;/P&gt;&lt;H3&gt;Referencing External Locations in a Python Notebook&lt;/H3&gt;&lt;P&gt;In Databricks Python notebooks, you can reference external locations (such as Azure Data Lake Storage or other cloud storage) by passing the storage path directly or using environment-specific parameters. Below is a step-by-step explanation:&lt;/P&gt;&lt;H4&gt;1. &lt;STRONG&gt;Direct Reference with Path&lt;/STRONG&gt;&lt;/H4&gt;&lt;P&gt;If you want to directly reference an ADLS path, you can use it as a string in the Python notebook:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;path = "abfss://container@storageaccount.dfs.core.windows.net/folder"
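# If this path is covered by a Unity Catalog external location, access is
# authorized through that location's grants (READ FILES / WRITE FILES) -
# no storage keys or credential settings are needed in the notebook itself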
df = spark.read.format("parquet").load(path)
df.show()&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;H4&gt;2. &lt;STRONG&gt;Using Parameters for Environment Handling&lt;/STRONG&gt;&lt;/H4&gt;&lt;P&gt;For managing different environments (e.g., dev, prod), using parameterized variables is the best practice. This ensures flexibility and maintainability. You can set these parameters dynamically based on the environment being executed.&lt;/P&gt;&lt;P&gt;Example:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Define the environment (e.g., dev or prod) in Azure DevOps pipeline parameters or notebook widgets.&lt;/LI&gt;&lt;LI&gt;Use the environment variable to construct the storage path.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;# Define environment-specific parameters
env = dbutils.widgets.get("env")  # Set this widget value via Azure DevOps or manually
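# Optional guard (illustrative): fail fast on an unexpected widget value
# instead of silently falling through to the prod account below
if env not in ("dev", "prod"):
    raise ValueError("Unexpected env value: " + repr(env))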
storage_account = "devstorage" if env == "dev" else "prodstorage"
container = "mycontainer"

# Construct the path dynamically
path = f"abfss://{container}@{storage_account}.dfs.core.windows.net/folder"

# Use the path
df = spark.read.format("parquet").load(path)
df.show()&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;H3&gt;Steps to Handle Environment-Specific Paths with Azure DevOps&lt;/H3&gt;&lt;P&gt;To handle dev and prod storage paths dynamically in Azure DevOps:&lt;/P&gt;&lt;H4&gt;1. &lt;STRONG&gt;Pass Environment as a Parameter&lt;/STRONG&gt;&lt;/H4&gt;&lt;UL&gt;&lt;LI&gt;In your Azure DevOps pipeline, pass the environment as a parameter (env: dev or env: prod).&lt;/LI&gt;&lt;LI&gt;Inject the parameter into your notebook using the Databricks CLI or REST API when running the notebook.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="yaml"&gt;parameters:
  - name: env
    type: string
    default: dev

steps:
  # the exact task name depends on the Databricks extension installed in your Azure DevOps organization
  - task: DatabricksRunNotebook@2
    inputs:
      notebookPath: /path/to/notebook
      parameters: '{"env": "$(env)"}'&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;H4&gt;2. &lt;STRONG&gt;Use Environment Variables&lt;/STRONG&gt;&lt;/H4&gt;&lt;P&gt;In your Python notebook, use the passed env parameter to decide the storage account dynamically, as shown in the Python example above.&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;H3&gt;Using a Single Variable for Storage Accounts&lt;/H3&gt;&lt;P&gt;You can use a structured approach where the storage account name is a function of the environment.&lt;/P&gt;&lt;P&gt;For example:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;# Define environment and construct the path
env = dbutils.widgets.get("env")  # 'dev' or 'prod'
storage_accounts = {
    "dev": "devstorageaccount",
    "prod": "prodstorageaccount"
}
container = "mycontainer"

# Get storage account based on the environment
storage_account = storage_accounts.get(env, "defaultstorageaccount")
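# NOTE (illustrative): a silent default can mask misconfiguration - a stricter
# alternative is storage_accounts[env], which raises KeyError on an unknown env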
path = f"abfss://{container}@{storage_account}.dfs.core.windows.net/folder"

# Load data
df = spark.read.format("parquet").load(path)
df.show()&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;H3&gt;Best Practices for Managing External Location References&lt;/H3&gt;&lt;OL&gt;&lt;LI&gt;&lt;STRONG&gt;Parameterize the Environment&lt;/STRONG&gt;: Always use parameters to pass environment-specific values.&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Environment Mapping&lt;/STRONG&gt;: Maintain a mapping of environments to storage accounts and paths in a configuration file or dictionary in the notebook.&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Secure Configuration&lt;/STRONG&gt;: Use Azure Key Vault for storing sensitive information like storage account keys or connection strings.&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Test Across Environments&lt;/STRONG&gt;: Validate that both dev and prod configurations work seamlessly in the pipeline.&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;For more detailed information, refer to the official &lt;A href="https://docs.databricks.com/en/sql/language-manual/sql-ref-external-locations.html" target="_new" rel="noopener"&gt;&lt;SPAN&gt;Databricks&lt;/SPAN&gt;&lt;SPAN&gt; External&lt;/SPAN&gt;&lt;SPAN&gt; Locations&lt;/SPAN&gt;&lt;SPAN&gt; Documentation&lt;/SPAN&gt;&lt;/A&gt;.&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 08 Jan 2025 09:21:02 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/referencing-external-locations-in-python-notebooks/m-p/104663#M41835</guid>
      <dc:creator>fmadeiro</dc:creator>
      <dc:date>2025-01-08T09:21:02Z</dc:date>
    </item>
  </channel>
</rss>