<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Establishing a Connection between ADLS Gen2, Databricks and ADF In Microsoft Azure in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/establishing-a-connection-between-adls-gen2-databricks-and-adf/m-p/141176#M51649</link>
    <description>&lt;P&gt;For a production environment (ADF as orchestrator, ADLS Gen2 as storage, Databricks for PySpark transformations), follow Microsoft-recommended best practices:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;STRONG&gt;Databricks → ADLS Gen2&lt;/STRONG&gt;: Use Unity Catalog with Azure Managed Identity (via Access Connector) for direct, secure access without secrets or mounts. Avoid mounting in production (it's legacy and less secure/governable). If not using Unity Catalog yet, fall back to Service Principal + OAuth with secrets from Azure Key Vault.&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;ADF → Databricks&lt;/STRONG&gt;: Create a Databricks linked service using a Personal Access Token (PAT) stored in Azure Key Vault.&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;ADF → ADLS Gen2&lt;/STRONG&gt;: Use System-assigned Managed Identity or Service Principal (secrets in Key Vault).&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Key Vault&lt;/STRONG&gt;: Yes, route secrets through Key Vault – it's essential for production security (never hardcode credentials).&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Below are detailed, step-by-step instructions for a fully secure setup.&lt;/P&gt;&lt;H4&gt;1. Prerequisites&lt;/H4&gt;&lt;UL&gt;&lt;LI&gt;Azure subscription with Contributor/Owner access.&lt;/LI&gt;&lt;LI&gt;Create an Azure Key Vault.&lt;/LI&gt;&lt;LI&gt;Enable Unity Catalog on your Databricks workspace (recommended for production governance). If not possible yet, see the Service Principal fallback in section 2.&lt;/LI&gt;&lt;/UL&gt;&lt;H4&gt;2. 
Databricks to ADLS Gen2 Access (Recommended: Unity Catalog + Managed Identity)&lt;/H4&gt;&lt;P&gt;This is the modern, secretless approach (no Key Vault needed for storage access).&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Step 1: Create an Azure Databricks Access Connector&lt;/STRONG&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;In Azure Portal → Search for "Databricks Access Connector" → Create.&lt;/LI&gt;&lt;LI&gt;Note the &lt;STRONG&gt;Resource ID&lt;/STRONG&gt; (e.g.,&lt;/LI&gt;&lt;/UL&gt;&lt;LI-CODE lang="python"&gt;/subscriptions/.../resourceGroups/.../providers/Microsoft.Databricks/accessConnectors/my-connector​&lt;/LI-CODE&gt;&lt;P&gt;&lt;STRONG&gt;Step 2: Grant the Access Connector permission on ADLS Gen2&lt;/STRONG&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Go to your ADLS Gen2 storage account → Access Control (IAM) → Add role assignment.&lt;/LI&gt;&lt;LI&gt;Role: &lt;STRONG&gt;Storage Blob Data Contributor&lt;/STRONG&gt; (or finer-grained if needed).&lt;/LI&gt;&lt;LI&gt;Assign to: The Access Connector (search by name or use its Managed Identity Application ID).&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;STRONG&gt;Step 3: In Databricks, create a Storage Credential (Unity Catalog)&lt;/STRONG&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;In Databricks workspace → Catalog → Add → Storage credential.&lt;/LI&gt;&lt;LI&gt;Type: Managed identity.&lt;/LI&gt;&lt;LI&gt;Paste the Access Connector's Resource ID.&lt;/LI&gt;&lt;LI&gt;Test the connection.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;STRONG&gt;Step 4: Create an External Location (points to ADLS containers)&lt;/STRONG&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Catalog → Add → External location.&lt;/LI&gt;&lt;LI&gt;Select the Storage Credential above.&lt;/LI&gt;&lt;LI&gt;Path&amp;nbsp;&lt;/LI&gt;&lt;/UL&gt;&lt;LI-CODE lang="python"&gt;abfss://&amp;lt;container&amp;gt;@&amp;lt;storage-account&amp;gt;.dfs.core.windows.net/&amp;lt;optional-folder&amp;gt;​&lt;/LI-CODE&gt;&lt;UL&gt;&lt;LI&gt;Grant READ/WRITE permissions to users/groups as 
needed.&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;Step 5: In PySpark notebooks&lt;/STRONG&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;No mounts or configs needed.&lt;/LI&gt;&lt;LI&gt;Read/write directly:&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;LI-CODE lang="python"&gt;df = spark.read.parquet("abfss://&amp;lt;container&amp;gt;@&amp;lt;storage-account&amp;gt;.dfs.core.windows.net/path/to/data")
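# (Optional) transformations go between the read above and the Delta write below.
# Illustrative sketch only; the column names here are assumptions, not from this post:
# from pyspark.sql import functions as F
# df = df.filter(F.col("status") == "active").withColumn("load_date", F.current_date())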
df.write.format("delta").save("abfss://&amp;lt;container&amp;gt;@&amp;lt;storage-account&amp;gt;.dfs.core.windows.net/output")&lt;/LI-CODE&gt;&lt;UL&gt;&lt;LI&gt;Unity Catalog enforces governance (auditing, access controls).&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;STRONG&gt;Fallback if no Unity Catalog: Service Principal + Key Vault&lt;/STRONG&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Register an App in Microsoft Entra ID → Note Client ID, Tenant ID, generate Client Secret.&lt;/LI&gt;&lt;LI&gt;Grant the Service Principal &lt;STRONG&gt;Storage Blob Data Contributor&lt;/STRONG&gt; on ADLS Gen2.&lt;/LI&gt;&lt;LI&gt;Store Client ID, Secret, Tenant ID as secrets in Key Vault.&lt;/LI&gt;&lt;LI&gt;In Databricks: Create a Key Vault-backed secret scope (URL: &lt;SPAN&gt;https://&amp;lt;databricks-instance&amp;gt;#secrets/createScope&lt;/SPAN&gt;, i.e. your workspace URL followed by &lt;SPAN&gt;#secrets/createScope&lt;/SPAN&gt;).&lt;/LI&gt;&lt;LI&gt;In notebooks, set Spark configs (no mount needed for production):&lt;/LI&gt;&lt;/UL&gt;&lt;LI-CODE lang="python"&gt;spark.conf.set("fs.azure.account.auth.type.&amp;lt;storage-account&amp;gt;.dfs.core.windows.net", "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type.&amp;lt;storage-account&amp;gt;.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id.&amp;lt;storage-account&amp;gt;.dfs.core.windows.net", dbutils.secrets.get(scope="&amp;lt;scope&amp;gt;", key="client-id"))
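# dbutils.secrets.get reads the values from the Key Vault-backed secret scope created above,
# so the client ID/secret are never hard-coded (Databricks redacts secret values in notebook output).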
spark.conf.set("fs.azure.account.oauth2.client.secret.&amp;lt;storage-account&amp;gt;.dfs.core.windows.net", dbutils.secrets.get(scope="&amp;lt;scope&amp;gt;", key="client-secret"))
spark.conf.set("fs.azure.account.oauth2.client.endpoint.&amp;lt;storage-account&amp;gt;.dfs.core.windows.net", "https://login.microsoftonline.com/&amp;lt;tenant-id&amp;gt;/oauth2/token")&lt;/LI-CODE&gt;&lt;H4&gt;3. ADF to Databricks Linked Service (Secure with Key Vault)&lt;/H4&gt;&lt;P&gt;&lt;STRONG&gt;Step 1: Generate Databricks Personal Access Token (PAT)&lt;/STRONG&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;In Databricks → User Settings → Developer → Access tokens → Generate new token (for production, set an expiration and rotate the token regularly).&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;STRONG&gt;Step 2: Store PAT in Key Vault&lt;/STRONG&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Key Vault → Secrets → Generate/Import → Name: e.g., &lt;SPAN&gt;databricks-pat&lt;/SPAN&gt;.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;STRONG&gt;Step 3: Grant ADF access to Key Vault&lt;/STRONG&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Enable System-assigned Managed Identity on your ADF (Properties tab).&lt;/LI&gt;&lt;LI&gt;Key Vault → Access policies → Add → Principal: Your ADF's Managed Identity → Permissions: Get (secrets).&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;STRONG&gt;Step 4: Create Key Vault Linked Service in ADF&lt;/STRONG&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;ADF → Manage → Linked services → New → Azure Key Vault → Select your Key Vault.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;STRONG&gt;Step 5: Create Databricks Linked Service&lt;/STRONG&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Linked services → New → Azure Databricks.&lt;/LI&gt;&lt;LI&gt;Workspace URL: e.g., &lt;SPAN&gt;&lt;A href="https://adb-xxx.azuredatabricks.net" target="_blank" rel="noopener"&gt;https://adb-xxx.azuredatabricks.net&lt;/A&gt;&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;Authentication: Access token.&lt;/LI&gt;&lt;LI&gt;For the token: Select "Azure Key Vault" → Choose the Key Vault linked service → Secret name: &lt;SPAN&gt;databricks-pat&lt;/SPAN&gt;.&lt;/LI&gt;&lt;LI&gt;Cluster: Use a new or an existing job cluster (for production, prefer job clusters or serverless).&lt;/LI&gt;&lt;/UL&gt;&lt;H4&gt;4. 
ADF to ADLS Gen2 Linked Service (Secure)&lt;/H4&gt;&lt;UL&gt;&lt;LI&gt;Linked services → New → Azure Data Lake Storage Gen2.&lt;/LI&gt;&lt;LI&gt;Authentication: System-assigned Managed Identity (recommended, secretless) or Service Principal (store ID/Secret in Key Vault as above).&lt;/LI&gt;&lt;LI&gt;Test connection.&lt;/LI&gt;&lt;/UL&gt;&lt;H4&gt;5. Orchestrate with ADF Pipeline&lt;/H4&gt;&lt;UL&gt;&lt;LI&gt;Create a pipeline.&lt;/LI&gt;&lt;LI&gt;Add &lt;STRONG&gt;Databricks Notebook&lt;/STRONG&gt; activity.&lt;/LI&gt;&lt;LI&gt;Linked service: The one from step 3.&lt;/LI&gt;&lt;LI&gt;Notebook path: e.g., &lt;SPAN&gt;/Users/yourname/my-notebook&lt;/SPAN&gt;.&lt;/LI&gt;&lt;LI&gt;Pass parameters if needed (e.g., file paths in ADLS).&lt;/LI&gt;&lt;LI&gt;For input/output: Use ADLS linked datasets (abfss:// paths).&lt;/LI&gt;&lt;LI&gt;Trigger: Schedule, Tumbling window, or Event-based (on new files in ADLS).&lt;/LI&gt;&lt;/UL&gt;&lt;H4&gt;Production Tips&lt;/H4&gt;&lt;UL&gt;&lt;LI&gt;Use Job clusters (not interactive) for cost/reliability.&lt;/LI&gt;&lt;LI&gt;Enable ADF monitoring, alerts, and Git integration.&lt;/LI&gt;&lt;LI&gt;Rotate secrets/PATs regularly.&lt;/LI&gt;&lt;LI&gt;Network security: VNet-integrate Databricks and use Private Endpoints for ADLS/Key Vault if needed.&lt;/LI&gt;&lt;LI&gt;If using Delta Lake tables on ADLS, register them in Unity Catalog for governance.&lt;/LI&gt;&lt;/UL&gt;</description>
    <pubDate>Thu, 04 Dec 2025 15:51:21 GMT</pubDate>
    <dc:creator>nayan_wylde</dc:creator>
    <dc:date>2025-12-04T15:51:21Z</dc:date>
    <item>
      <title>Establishing a Connection between ADLS Gen2, Databricks and ADF In Microsoft Azure</title>
      <link>https://community.databricks.com/t5/data-engineering/establishing-a-connection-between-adls-gen2-databricks-and-adf/m-p/141166#M51641</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;Could someone please help me with establishing a connection between ADLS Gen2, Databricks, and ADF, with full steps if possible? Do I need to route through Key Vault? This is my first time doing this in production.&lt;/P&gt;&lt;P&gt;Could somebody please share detailed steps for implementing this in production?&lt;/P&gt;&lt;P&gt;ADF - Orchestrator&lt;/P&gt;&lt;P&gt;ADLS Gen2 - Storage&lt;/P&gt;&lt;P&gt;Databricks - Processing data, transformations using PySpark&lt;/P&gt;&lt;P&gt;Thanks a lot&lt;/P&gt;</description>
      <pubDate>Thu, 04 Dec 2025 14:28:03 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/establishing-a-connection-between-adls-gen2-databricks-and-adf/m-p/141166#M51641</guid>
      <dc:creator>Pratikmsbsvm</dc:creator>
      <dc:date>2025-12-04T14:28:03Z</dc:date>
    </item>
    <item>
      <title>Re: Establishing a Connection between ADLS Gen2, Databricks and ADF In Microsoft Azure</title>
      <link>https://community.databricks.com/t5/data-engineering/establishing-a-connection-between-adls-gen2-databricks-and-adf/m-p/141173#M51646</link>
      <description>&lt;P&gt;Hi!&lt;/P&gt;&lt;P class=""&gt;I assume that ADF is just the trigger and that Databricks accesses ADLS directly to process the data.&lt;/P&gt;&lt;P&gt;&lt;U&gt;ADLS Access:&lt;/U&gt;&lt;/P&gt;&lt;P class=""&gt;You create an&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;External Location&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;in your Databricks workspace that acts as a bridge to ADLS. This is done through Catalog Explorer.&lt;/P&gt;&lt;P class=""&gt;To set it up:&lt;/P&gt;&lt;UL class=""&gt;&lt;LI&gt;&lt;P class=""&gt;Create a&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;Storage Credential&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;using a Managed Identity (or Service Principal) that has permissions to your ADLS&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P class=""&gt;Create an&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;External Location&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;that links this credential to your specific ADLS path&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P class=""&gt;You can assign granular permissions at the workspace or catalog level&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P class=""&gt;That's it. 
Now Databricks can read and write to that ADLS path directly.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="juan_maedo_1-1764861010907.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/22120iCBE715FCC1FB277D/image-size/medium?v=v2&amp;amp;px=400" role="button" title="juan_maedo_1-1764861010907.png" alt="juan_maedo_1-1764861010907.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P class=""&gt;Reference:&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;A class="" href="https://learn.microsoft.com/en-us/azure/databricks/connect/unity-catalog/cloud-storage/external-locations" target="_blank" rel="nofollow noopener"&gt;&lt;SPAN class=""&gt;https://learn.microsoft.com/en-us/azure/databricks/connect/unity-catalog/cloud-storage/external-locations&lt;/SPAN&gt;&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;U&gt;Calling Databricks Job&lt;/U&gt;&lt;/P&gt;&lt;P class=""&gt;To trigger a Databricks job from ADF, you need:&lt;/P&gt;&lt;UL class=""&gt;&lt;LI&gt;&lt;P class=""&gt;&lt;STRONG&gt;Job ID&amp;nbsp;&lt;/STRONG&gt;- the ID of your Databricks job&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P class=""&gt;&lt;STRONG&gt;Linked Service&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;- an ADF connection to your Databricks workspace (using a Service Principal)&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P class=""&gt;That's the minimum. Everything else is optional, like a warehouse/cluster ID if you don't need serverless, job parameters, etc.&lt;/P&gt;&lt;P class=""&gt;Reference:&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;A class="" href="https://learn.microsoft.com/en-us/azure/data-factory/transform-data-databricks-job" target="_blank" rel="nofollow noopener"&gt;&lt;SPAN class=""&gt;https://learn.microsoft.com/en-us/azure/data-factory/transform-data-databricks-job&lt;/SPAN&gt;&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 04 Dec 2025 15:10:21 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/establishing-a-connection-between-adls-gen2-databricks-and-adf/m-p/141173#M51646</guid>
      <dc:creator>juan_maedo</dc:creator>
      <dc:date>2025-12-04T15:10:21Z</dc:date>
    </item>
    <item>
      <title>Re: Establishing a Connection between ADLS Gen2, Databricks and ADF In Microsoft Azure</title>
      <link>https://community.databricks.com/t5/data-engineering/establishing-a-connection-between-adls-gen2-databricks-and-adf/m-p/141176#M51649</link>
      <description>&lt;P&gt;For a production environment (ADF as orchestrator, ADLS Gen2 as storage, Databricks for PySpark transformations), follow Microsoft-recommended best practices:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;STRONG&gt;Databricks → ADLS Gen2&lt;/STRONG&gt;: Use Unity Catalog with Azure Managed Identity (via Access Connector) for direct, secure access without secrets or mounts. Avoid mounting in production (it's legacy and less secure/governable). If not using Unity Catalog yet, fall back to Service Principal + OAuth with secrets from Azure Key Vault.&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;ADF → Databricks&lt;/STRONG&gt;: Create a Databricks linked service using a Personal Access Token (PAT) stored in Azure Key Vault.&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;ADF → ADLS Gen2&lt;/STRONG&gt;: Use System-assigned Managed Identity or Service Principal (secrets in Key Vault).&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Key Vault&lt;/STRONG&gt;: Yes, route secrets through Key Vault – it's essential for production security (never hardcode credentials).&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Below are detailed, step-by-step instructions for a fully secure setup.&lt;/P&gt;&lt;H4&gt;1. Prerequisites&lt;/H4&gt;&lt;UL&gt;&lt;LI&gt;Azure subscription with Contributor/Owner access.&lt;/LI&gt;&lt;LI&gt;Create an Azure Key Vault.&lt;/LI&gt;&lt;LI&gt;Enable Unity Catalog on your Databricks workspace (recommended for production governance). If not possible yet, see the Service Principal fallback in section 2.&lt;/LI&gt;&lt;/UL&gt;&lt;H4&gt;2. 
Databricks to ADLS Gen2 Access (Recommended: Unity Catalog + Managed Identity)&lt;/H4&gt;&lt;P&gt;This is the modern, secretless approach (no Key Vault needed for storage access).&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Step 1: Create an Azure Databricks Access Connector&lt;/STRONG&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;In Azure Portal → Search for "Databricks Access Connector" → Create.&lt;/LI&gt;&lt;LI&gt;Note the &lt;STRONG&gt;Resource ID&lt;/STRONG&gt; (e.g.,&lt;/LI&gt;&lt;/UL&gt;&lt;LI-CODE lang="python"&gt;/subscriptions/.../resourceGroups/.../providers/Microsoft.Databricks/accessConnectors/my-connector​&lt;/LI-CODE&gt;&lt;P&gt;&lt;STRONG&gt;Step 2: Grant the Access Connector permission on ADLS Gen2&lt;/STRONG&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Go to your ADLS Gen2 storage account → Access Control (IAM) → Add role assignment.&lt;/LI&gt;&lt;LI&gt;Role: &lt;STRONG&gt;Storage Blob Data Contributor&lt;/STRONG&gt; (or finer-grained if needed).&lt;/LI&gt;&lt;LI&gt;Assign to: The Access Connector (search by name or use its Managed Identity Application ID).&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;STRONG&gt;Step 3: In Databricks, create a Storage Credential (Unity Catalog)&lt;/STRONG&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;In Databricks workspace → Catalog → Add → Storage credential.&lt;/LI&gt;&lt;LI&gt;Type: Managed identity.&lt;/LI&gt;&lt;LI&gt;Paste the Access Connector's Resource ID.&lt;/LI&gt;&lt;LI&gt;Test the connection.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;STRONG&gt;Step 4: Create an External Location (points to ADLS containers)&lt;/STRONG&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Catalog → Add → External location.&lt;/LI&gt;&lt;LI&gt;Select the Storage Credential above.&lt;/LI&gt;&lt;LI&gt;Path&amp;nbsp;&lt;/LI&gt;&lt;/UL&gt;&lt;LI-CODE lang="python"&gt;abfss://&amp;lt;container&amp;gt;@&amp;lt;storage-account&amp;gt;.dfs.core.windows.net/&amp;lt;optional-folder&amp;gt;​&lt;/LI-CODE&gt;&lt;UL&gt;&lt;LI&gt;Grant READ/WRITE permissions to users/groups as 
needed.&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;Step 5: In PySpark notebooks&lt;/STRONG&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;No mounts or configs needed.&lt;/LI&gt;&lt;LI&gt;Read/write directly:&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;LI-CODE lang="python"&gt;df = spark.read.parquet("abfss://&amp;lt;container&amp;gt;@&amp;lt;storage-account&amp;gt;.dfs.core.windows.net/path/to/data")
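# (Optional) transformations go between the read above and the Delta write below.
# Illustrative sketch only; the column names here are assumptions, not from this post:
# from pyspark.sql import functions as F
# df = df.filter(F.col("status") == "active").withColumn("load_date", F.current_date())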
df.write.format("delta").save("abfss://&amp;lt;container&amp;gt;@&amp;lt;storage-account&amp;gt;.dfs.core.windows.net/output")&lt;/LI-CODE&gt;&lt;UL&gt;&lt;LI&gt;Unity Catalog enforces governance (auditing, access controls).&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;STRONG&gt;Fallback if no Unity Catalog: Service Principal + Key Vault&lt;/STRONG&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Register an App in Microsoft Entra ID → Note Client ID, Tenant ID, generate Client Secret.&lt;/LI&gt;&lt;LI&gt;Grant the Service Principal &lt;STRONG&gt;Storage Blob Data Contributor&lt;/STRONG&gt; on ADLS Gen2.&lt;/LI&gt;&lt;LI&gt;Store Client ID, Secret, Tenant ID as secrets in Key Vault.&lt;/LI&gt;&lt;LI&gt;In Databricks: Create a Key Vault-backed secret scope (URL: &lt;SPAN&gt;https://&amp;lt;databricks-instance&amp;gt;#secrets/createScope&lt;/SPAN&gt;, i.e. your workspace URL followed by &lt;SPAN&gt;#secrets/createScope&lt;/SPAN&gt;).&lt;/LI&gt;&lt;LI&gt;In notebooks, set Spark configs (no mount needed for production):&lt;/LI&gt;&lt;/UL&gt;&lt;LI-CODE lang="python"&gt;spark.conf.set("fs.azure.account.auth.type.&amp;lt;storage-account&amp;gt;.dfs.core.windows.net", "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type.&amp;lt;storage-account&amp;gt;.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id.&amp;lt;storage-account&amp;gt;.dfs.core.windows.net", dbutils.secrets.get(scope="&amp;lt;scope&amp;gt;", key="client-id"))
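# dbutils.secrets.get reads the values from the Key Vault-backed secret scope created above,
# so the client ID/secret are never hard-coded (Databricks redacts secret values in notebook output).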
spark.conf.set("fs.azure.account.oauth2.client.secret.&amp;lt;storage-account&amp;gt;.dfs.core.windows.net", dbutils.secrets.get(scope="&amp;lt;scope&amp;gt;", key="client-secret"))
spark.conf.set("fs.azure.account.oauth2.client.endpoint.&amp;lt;storage-account&amp;gt;.dfs.core.windows.net", "https://login.microsoftonline.com/&amp;lt;tenant-id&amp;gt;/oauth2/token")&lt;/LI-CODE&gt;&lt;H4&gt;3. ADF to Databricks Linked Service (Secure with Key Vault)&lt;/H4&gt;&lt;P&gt;&lt;STRONG&gt;Step 1: Generate Databricks Personal Access Token (PAT)&lt;/STRONG&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;In Databricks → User Settings → Developer → Access tokens → Generate new token (for production, set an expiration and rotate the token regularly).&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;STRONG&gt;Step 2: Store PAT in Key Vault&lt;/STRONG&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Key Vault → Secrets → Generate/Import → Name: e.g., &lt;SPAN&gt;databricks-pat&lt;/SPAN&gt;.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;STRONG&gt;Step 3: Grant ADF access to Key Vault&lt;/STRONG&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Enable System-assigned Managed Identity on your ADF (Properties tab).&lt;/LI&gt;&lt;LI&gt;Key Vault → Access policies → Add → Principal: Your ADF's Managed Identity → Permissions: Get (secrets).&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;STRONG&gt;Step 4: Create Key Vault Linked Service in ADF&lt;/STRONG&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;ADF → Manage → Linked services → New → Azure Key Vault → Select your Key Vault.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;STRONG&gt;Step 5: Create Databricks Linked Service&lt;/STRONG&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Linked services → New → Azure Databricks.&lt;/LI&gt;&lt;LI&gt;Workspace URL: e.g., &lt;SPAN&gt;&lt;A href="https://adb-xxx.azuredatabricks.net" target="_blank" rel="noopener"&gt;https://adb-xxx.azuredatabricks.net&lt;/A&gt;&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;Authentication: Access token.&lt;/LI&gt;&lt;LI&gt;For the token: Select "Azure Key Vault" → Choose the Key Vault linked service → Secret name: &lt;SPAN&gt;databricks-pat&lt;/SPAN&gt;.&lt;/LI&gt;&lt;LI&gt;Cluster: Use a new or an existing job cluster (for production, prefer job clusters or serverless).&lt;/LI&gt;&lt;/UL&gt;&lt;H4&gt;4. 
ADF to ADLS Gen2 Linked Service (Secure)&lt;/H4&gt;&lt;UL&gt;&lt;LI&gt;Linked services → New → Azure Data Lake Storage Gen2.&lt;/LI&gt;&lt;LI&gt;Authentication: System-assigned Managed Identity (recommended, secretless) or Service Principal (store ID/Secret in Key Vault as above).&lt;/LI&gt;&lt;LI&gt;Test connection.&lt;/LI&gt;&lt;/UL&gt;&lt;H4&gt;5. Orchestrate with ADF Pipeline&lt;/H4&gt;&lt;UL&gt;&lt;LI&gt;Create a pipeline.&lt;/LI&gt;&lt;LI&gt;Add &lt;STRONG&gt;Databricks Notebook&lt;/STRONG&gt; activity.&lt;/LI&gt;&lt;LI&gt;Linked service: The one from step 3.&lt;/LI&gt;&lt;LI&gt;Notebook path: e.g., &lt;SPAN&gt;/Users/yourname/my-notebook&lt;/SPAN&gt;.&lt;/LI&gt;&lt;LI&gt;Pass parameters if needed (e.g., file paths in ADLS).&lt;/LI&gt;&lt;LI&gt;For input/output: Use ADLS linked datasets (abfss:// paths).&lt;/LI&gt;&lt;LI&gt;Trigger: Schedule, Tumbling window, or Event-based (on new files in ADLS).&lt;/LI&gt;&lt;/UL&gt;&lt;H4&gt;Production Tips&lt;/H4&gt;&lt;UL&gt;&lt;LI&gt;Use Job clusters (not interactive) for cost/reliability.&lt;/LI&gt;&lt;LI&gt;Enable ADF monitoring, alerts, and Git integration.&lt;/LI&gt;&lt;LI&gt;Rotate secrets/PATs regularly.&lt;/LI&gt;&lt;LI&gt;Network security: VNet-integrate Databricks and use Private Endpoints for ADLS/Key Vault if needed.&lt;/LI&gt;&lt;LI&gt;If using Delta Lake tables on ADLS, register them in Unity Catalog for governance.&lt;/LI&gt;&lt;/UL&gt;</description>
      <pubDate>Thu, 04 Dec 2025 15:51:21 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/establishing-a-connection-between-adls-gen2-databricks-and-adf/m-p/141176#M51649</guid>
      <dc:creator>nayan_wylde</dc:creator>
      <dc:date>2025-12-04T15:51:21Z</dc:date>
    </item>
  </channel>
</rss>

