Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Read files from ADLS in Databricks

SuMiT1
New Contributor II

I have a Unity Catalog access connector, but it isn't enabled. I only have admin access, so I don't have access to the admin portal to enable it, since that needs global admin permissions.

I am trying to read ADLS JSON data in Databricks using a service principal, but I am unable to read it. It throws the error below:

Status code: -1 error code: null error message: Cannot resolve hostname: stas.dfs.core.windows.net
java.net.UnknownHostException: stas.dfs.core.windows.net

5 REPLIES

Vasireddy
Contributor II

Hi @SuMiT1 ,

The error you are seeing:
Cannot resolve hostname: stas.dfs.core.windows.net
java.net.UnknownHostException

indicates that your Databricks cluster is unable to resolve or reach the ADLS Gen2 endpoint.
This is a network/DNS connectivity issue, not related to Unity Catalog or service principal permissions.

Root Cause:
This typically happens due to one of the following reasons:

1. Incorrect storage account name
--> Make sure your endpoint is correct.
The format should be:
<storage-account-name>.dfs.core.windows.net
--> A small typo can trigger this error.

2. Private Endpoint or Firewall restriction
--> If your ADLS Gen2 account is behind a private endpoint, your Databricks workspace must be deployed in the same VNet or have Private DNS Zone mapping for `dfs.core.windows.net`.
Also, verify that your storage firewall allows connections from your Databricks-managed subnets.

3. DNS resolution failure from Databricks
--> If the cluster cannot resolve external DNS, it won't be able to connect to the ADLS endpoint.


How to Validate

--> Try running the following in a Databricks notebook:
import socket
socket.gethostbyname("<storage-account-name>.dfs.core.windows.net")
--> If it fails, it confirms a DNS or networking issue.

Also, go to the Azure Portal → Storage Account → Networking, and for testing, temporarily set the network access to "Allow access from all networks".
If it works afterward, you'll need to properly whitelist your Databricks subnets or private endpoints.
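If the name does resolve, it can also help to check which IP it resolves to: a private address (for example 10.x.x.x) usually means a private endpoint is in play, while a public IP means traffic goes over the public endpoint. A minimal sketch you can run in a notebook (the storage account name is a placeholder):

import socket

host = "<storage-account-name>.dfs.core.windows.net"
try:
    # Collect every IP address the endpoint resolves to from this cluster.
    addrs = {info[4][0] for info in socket.getaddrinfo(host, 443)}
    print(f"{host} resolves to {sorted(addrs)}")
except socket.gaierror as err:
    print(f"DNS resolution failed for {host}: {err}")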

Once Connectivity Works

Then configure your service principal authentication as shown below:

--> spark.conf.set("fs.azure.account.auth.type.<storage>.dfs.core.windows.net", "OAuth")
--> spark.conf.set("fs.azure.account.oauth.provider.type.<storage>.dfs.core.windows.net",
"org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
--> spark.conf.set("fs.azure.account.oauth2.client.id.<storage>.dfs.core.windows.net", "<client_id>")
--> spark.conf.set("fs.azure.account.oauth2.client.secret.<storage>.dfs.core.windows.net", "<client_secret>")
--> spark.conf.set("fs.azure.account.oauth2.client.endpoint.<storage>.dfs.core.windows.net",
"https://login.microsoftonline.com/<tenant_id>/oauth2/token")


This issue is not related to Unity Catalog or permissions, but to network connectivity / DNS resolution between Databricks and your ADLS Gen2 account.
Once your cluster can resolve the endpoint, your read operation should work fine.

harisankar

szymon_dybczak
Esteemed Contributor III

Hi @Vasireddy ,

Could you describe your environment? Do you have a VNet-injected workspace? Also, you don't have to use Unity Catalog to connect to the storage account. You can configure the connection the old way:

configs = {"fs.azure.account.auth.type": "OAuth",
          "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
          "fs.azure.account.oauth2.client.id": "<application-id>",
          "fs.azure.account.oauth2.client.secret": "<your-secret>",
          "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<directory-id>/oauth2/token"}

# Optionally, you can add <directory-name> to the source URI of your mount point.
dbutils.fs.mount(
  source = "abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/",
  mount_point = "/mnt/<mount-name>",
  extra_configs = configs)
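Once mounted, the files can be read through the mount point like any DBFS path; a quick usage sketch (the mount name and path are placeholders):

# Read the JSON files through the mount point created above.
df = spark.read.json("/mnt/<mount-name>/<path-to-json>")
df.printSchema()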

Vasireddy
Contributor II

Hi @szymon_dybczak ,

Thanks for sharing the OAuth mount example, that is exactly the right config.

From the error message (`UnknownHostException: <account>.dfs.core.windows.net`), it looks like the cluster can't even find the storage account's address.
That often happens in VNet-injected workspaces if DNS isn't set up for the storage endpoint.

So my suggestion was simply to fix the network/DNS part first. Once the hostname resolves, your OAuth mount code will work as-is.

--> First make sure the cluster can resolve and reach the storage URL, then use your OAuth config to mount. Both pieces together solve the issue.

harisankar

szymon_dybczak
Esteemed Contributor III

Yep, I also suspect a networking issue. That's why I asked @SuMiT1 if he has a VNet-injected workspace 🙂

saurabh18cs
Honored Contributor II

Hi @SuMiT1, once the networking issue is resolved, also make sure your service principal has at least the Storage Blob Data Reader role on the storage account/container.
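A quick way to confirm the role assignment has taken effect once networking is fine is to try listing the container; with networking fixed but no data-plane role, this typically fails with a 403-style authorization error instead. A small sketch (container and account names are placeholders):

# Listing the container exercises the data plane, so it fails with an
# authorization error if the service principal lacks Storage Blob Data Reader.
dbutils.fs.ls("abfss://<container>@<storage-account>.dfs.core.windows.net/")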
