Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

[Azure Databricks] Create an External Location to Microsoft Fabric Lakehouse

stefanberreiter
New Contributor III

Hi,

I want to create an external location from Azure Databricks to a Microsoft Fabric Lakehouse, but it seems I am missing something.

What I did:

  • I created an "Access Connector for Azure Databricks" in the Azure Portal
  • I created a storage credential for the access connector
  • I granted the access connector access to the Microsoft Fabric workspace (tried both the Viewer and Contributor roles)

Now I want to create an external location in Azure Databricks with the OneLake path, but get an error:

 

 

 

Failed to access cloud storage: [AbfsRestOperationException]

 

 

 

The paths I tried follow these patterns:

 

 

 

abfss://{workspace_name}@onelake.dfs.fabric.microsoft.com/
abfss://{workspace_name}@onelake.dfs.fabric.microsoft.com/{lakehouse_name}
abfss://{workspace_name}@onelake.dfs.fabric.microsoft.com/{lakehouse_name}.Lakehouse
abfss://{workspace_name}@onelake.dfs.fabric.microsoft.com/{lakehouse_name}.Lakehouse/

 

 

 

As I have struggled so far to find documentation for this use case (connecting from Databricks to Fabric, not the other way round), I may also be on the wrong path.

Any tips on what the issue might be?

Best, Stefan

PS: The Fabric Lakehouse has an abfss:// path, which I have already validated by reading data from it (within a Fabric notebook).

 

 

 

 

import pandas as pd

pd.read_parquet(f"abfss://{workspace_name}@onelake.dfs.fabric.microsoft.com/{lakehouse_name}.Lakehouse/Tables/{table_name}")
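For anyone reproducing this, here is a minimal sketch of a helper that assembles the path used in the read above. The pattern (workspace name, the onelake.dfs.fabric.microsoft.com host, the ".Lakehouse" suffix, then "Tables/<table>") is taken from this post; the function name `onelake_uri` and the sample names are hypothetical, and names containing spaces or special characters are not handled.

```python
# Host taken from the paths quoted in this post.
ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"


def onelake_uri(workspace: str, lakehouse: str, *parts: str) -> str:
    """Build an abfss URI for a location inside a Fabric Lakehouse.

    Hypothetical helper: it only formats the string and does not
    check that the workspace, lakehouse, or path exists.
    """
    base = f"abfss://{workspace}@{ONELAKE_HOST}/{lakehouse}.Lakehouse"
    # Drop empty segments and stray slashes so the result has single separators.
    suffix = "/".join(p.strip("/") for p in parts if p.strip("/"))
    return f"{base}/{suffix}" if suffix else base


# Sample (made-up) names, matching the read_parquet pattern above.
print(onelake_uri("my_workspace", "my_lakehouse", "Tables", "my_table"))
```

The result can be passed to pd.read_parquet inside a Fabric notebook as shown above; whether the same URI is accepted as a Unity Catalog external location is exactly the open question of this thread.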

 

 

 

Sources:
[1] Advancing Spark - External Tables with Unity Catalog - YouTube (I tried this approach to make it work with granting Fabric workspace access instead of ADLS Gen2 access in Azure Portal)

 

7 REPLIES

szymon_dybczak
Esteemed Contributor III

Hi @stefanberreiter ,

You need to grant access to the storage account used by your Microsoft Fabric instance, not to the Fabric workspace itself.

You also need to assign the following role to the access connector:

Storage Blob Data Contributor

So the Viewer and Contributor roles are not correct.

Use Azure managed identities in Unity Catalog to access storage - Azure Databricks | Microsoft Learn

Hi @szymon_dybczak,

thanks for helping out. 
It seems (at least from this blog post) that you can grant access directly from within a Fabric workspace, once you have enabled the OneLake tenant setting "Users can access data stored in OneLake with apps external to Fabric".


From the source: "The second setting can be found a bit further down under OneLake settings. This setting allows you to use non-Fabric applications like a Python SDK, Databricks, and more to read and write to the OneLake."

Do you know what else one would need to configure (and where) to add the managed identity? Could you explain a bit more what you meant by "You need to grant an access for Access Connector for Azure Databricks to the storage account your Fabric instance use"? As far as I understand, that storage is managed automatically in OneLake ("OneLake comes automatically with every Microsoft Fabric tenant").

 

How to use service principal authentication to access Microsoft Fabric's OneLake (dataroots.io)

Hi @stefanberreiter ,

OK, so it looks like you need to enable Azure Data Lake Storage credential passthrough to make it work. Did you do this step?


 

Below are the step-by-step instructions from the documentation:

Integrate OneLake with Azure Databricks - Microsoft Fabric | Microsoft Learn

You can also take a look at the video below:

Leverage OneLake with Azure Databricks (youtube.com)

Hi @szymon_dybczak ,

thanks for replying. I've looked into it a bit (thanks for the sources) - now a few more questions are popping up.

It seems that credential passthrough will be deprecated, and it only works in conjunction with a cluster, while I am looking for a way to have an external table. The idea would be to point at the storage of the Lakehouse data in Fabric, not to read the data and then copy it to Databricks (which I believe is the use case of the video).


 

szymon_dybczak
Esteemed Contributor III

And if you want to use service principal authentication (assuming you already have a service principal), you need to add that service principal to the Fabric workspace, as described in the URL you sent: How to use service principal authentication to access Microsoft Fabric's OneLake (dataroots.io).
You can then use service principal authentication in Databricks as follows:

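# Placeholders: replace with your own values (ideally read the secret from a secret scope rather than hard-coding it)
storage_account = "<storage_account>"
tenant_id = "<tenant_id>"
service_principal_id = "<service_principal_id>"
service_principal_password = "<service_principal_password>"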

spark.conf.set(f"fs.azure.account.auth.type.{storage_account}.dfs.core.windows.net", "OAuth")
spark.conf.set(f"fs.azure.account.oauth.provider.type.{storage_account}.dfs.core.windows.net",  "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(f"fs.azure.account.oauth2.client.id.{storage_account}.dfs.core.windows.net", service_principal_id)
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{storage_account}.dfs.core.windows.net", service_principal_password)
spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{storage_account}.dfs.core.windows.net", f"https://login.microsoftonline.com/{tenant_id}/oauth2/token")

# read with the service principal (spn)
df = spark.read.format("parquet").load(f"abfss://default@{storage_account}.dfs.core.windows.net/data/unmanaged/t_unmanag_parquet")
df.show(10)

# write
df.write.format("delta").mode("overwrite").save(f"abfss://default@{storage_account}.dfs.core.windows.net/data/unmanaged/fab_unmanag_delta_spn")
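The snippet above targets a regular ADLS Gen2 account ({storage_account}.dfs.core.windows.net). As a rough sketch of how the same five settings might be generated for a different host, the helper below builds them as a dictionary. The function name `oauth_conf` is hypothetical, and using the OneLake host onelake.dfs.fabric.microsoft.com in these keys is an assumption based on the paths in the original question, not something verified in this thread.

```python
def oauth_conf(host: str, tenant_id: str, client_id: str, client_secret: str) -> dict:
    """Return the ABFS OAuth client-credentials settings for one storage host.

    Hypothetical helper: it only builds the key/value pairs used in the
    snippet above; it does not contact Azure or validate the credentials.
    """
    prefix = "fs.azure.account"
    return {
        f"{prefix}.auth.type.{host}": "OAuth",
        f"{prefix}.oauth.provider.type.{host}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"{prefix}.oauth2.client.id.{host}": client_id,
        f"{prefix}.oauth2.client.secret.{host}": client_secret,
        f"{prefix}.oauth2.client.endpoint.{host}":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }


# Assumption: the OneLake host replaces "<storage_account>.dfs.core.windows.net".
conf = oauth_conf(
    "onelake.dfs.fabric.microsoft.com",
    tenant_id="<tenant_id>",
    client_id="<service_principal_id>",
    client_secret="<service_principal_password>",
)
for key in conf:
    print(key)  # print keys only, so the secret is never echoed
```

In a Databricks notebook the dictionary could then be applied with a loop such as `for k, v in conf.items(): spark.conf.set(k, v)`. Note that this configures a cluster session only; it does not create a Unity Catalog external location.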

stefanberreiter
New Contributor III

I'm now looking into Lakehouse Federation for the SQL endpoint of the Fabric Lakehouse, which I guess comes closest to the experience of an external table.
Running Federated Queries from Unity Catalog on Microsoft Fabric SQL Endpoint | by Aitor Murguzur | ...

Yeah, that seems like a good option, though it also uses a service principal to authenticate. I think that in the future they will add the ability to use the Databricks access connector (MSI) as a valid authentication option for OneLake.
