Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Azure Data Lake Config Issue: No value for dfs.adls.oauth2.access.token.provider found in conf file.

microamp
New Contributor II

Hi,

I have files hosted on an Azure Data Lake Store which I can connect to from Azure Databricks, configured as per the instructions here.

I can read JSON files fine; however, I get the following error when I try to read an Avro file:

spark.read.format("com.databricks.spark.avro").load("adl://blah.azuredatalakestore.net/blah/blah.avro")
IllegalArgumentException: u'No value for dfs.adls.oauth2.access.token.provider found in conf file.'

I made sure that the file existed by running

dbutils.fs.ls("adl://blah.azuredatalakestore.net/blah/blah.avro")

Please note that the error refers to

dfs.adls.oauth2.access.token.provider

not

dfs.adls.oauth2.access.token.provider.type

mentioned in the documentation above. Even after I set it to something, it would still throw the same error.
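
For illustration, setting it from the notebook looked roughly like this (a minimal sketch; the provider class name here is just the one from the Hadoop ADLS connector, and it made no difference):

# hypothetical sketch of what was tried; values are placeholders
spark.conf.set("dfs.adls.oauth2.access.token.provider.type", "ClientCredential")
spark.conf.set("dfs.adls.oauth2.access.token.provider",
               "org.apache.hadoop.fs.adls.oauth2.ConfCredentialBasedAccessTokenProvider")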

Has anyone experienced this issue before? Please let me know what else I should try to further troubleshoot. Thanks.

12 REPLIES

AshitabhKumar
New Contributor II

Just found a workaround for the Avro read issue; it seems the configuration for dfs.adls.oauth2.access.token.provider is not properly set up for the reader. If the ADLS folder is mounted in the Databricks notebook, it works. Please try the following steps:

1. Mount the ADLS folder

val configs = Map(
  "dfs.adls.oauth2.access.token.provider.type" -> "ClientCredential",
  "dfs.adls.oauth2.client.id" -> "XXX",
  "dfs.adls.oauth2.credential" -> "YYY",
  "dfs.adls.oauth2.refresh.url" -> "https://login.microsoftonline.com/ZZZ/oauth2/token",
  "dfs.adls.oauth2.access.token.provider"->"org.apache.hadoop.fs.adls.oauth2.ConfCredentialBasedAccessTokenProvider") 
dbutils.fs.mount(
  source = "adl://XYZ.azuredatalakestore.net/myfolder/demo/",
  mountPoint = "/mnt/mymount",
  extraConfigs = configs)

2. Verify your file is visible on the mount

dbutils.fs.ls("dbfs:/mnt/ashitabh3")
import com.databricks.spark.avro._

spark.read.avro("dbfs:/mnt/mymount/mydata.avro").show

I can see the records now.

Thanks for the workaround.

I had a similar issue, unrelated to Avro, when saving a Spark ML model to ADLS. Even setting the property manually:

dfs.adls.oauth2.access.token.provider org.apache.hadoop.fs.adls.oauth2.ConfCredentialBasedAccessTokenProvider

when setting up the Spark cluster would still result in this error message when trying to save to ADLS directly:

IllegalArgumentException: u'No value for dfs.adls.oauth2.access.token.provider found in conf file.'

After mounting the ADLS folder, saving works properly.
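
For reference, a minimal Python sketch of the save via the mount (the StringIndexer model and the path under /mnt/mymount are only placeholders for illustration):

from pyspark.sql import SparkSession
from pyspark.ml.feature import StringIndexer

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("a",), ("b",), ("a",)], ["label"])
# fit any small model just to have something to persist
model = StringIndexer(inputCol="label", outputCol="label_idx").fit(df)
# saving through the DBFS mount works; saving to adl://... directly hit the
# "No value for dfs.adls.oauth2.access.token.provider" error
model.write().overwrite().save("dbfs:/mnt/mymount/models/indexer_demo")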

Hi Michael,

Did you find any other way? I am trying to write TF Records into ADLS and getting the same error even after setting this config.

traindf.repartition(32).write.format('tfrecords').mode('overwrite').option('recordType', 'Example').save("ADLS_URL/my/path")

I can also confirm your workaround works, but it takes a long time to mount. The main question is why this workaround is needed in the first place. Hopefully an official response from Databricks will be provided.

adina
New Contributor II

Any chance you found a solution for this by now?

microamp
New Contributor II

I have not, unfortunately. I can load the Avro file as JSON (the data comes back corrupted, as expected), but at least that proves the file is accessible. I don't know what's causing the above error.
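
For what it's worth, the check is roughly this (same path as in the original post):

spark.read.json("adl://blah.azuredatalakestore.net/blah/blah.avro").show()

The rows come back garbled (Avro binary parsed as JSON), but the read itself succeeds, so the file is reachable over the same adl:// path.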

microamp
New Contributor II

You may want to try mounting your Data Lake Store to DBFS and accessing your files through the mounted path.

https://docs.databricks.com/spark/latest/data-sources/azure/azure-datalake.html#mounting-azure-data-...

I have not tried it yet. You might find the following thread helpful however.

https://forums.databricks.com/questions/13266/azure-db-mount-on-python-unexpected-keyword-argume.htm...
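
In Python, the mount would look roughly like this (a minimal sketch; the client id, credential, and tenant are placeholders, and on recent runtimes the Python keywords are mount_point and extra_configs rather than the Scala names, which seems to be what the second link is about):

# hypothetical Python mount sketch; replace the placeholder credentials
configs = {
  "dfs.adls.oauth2.access.token.provider.type": "ClientCredential",
  "dfs.adls.oauth2.client.id": "XXX",
  "dfs.adls.oauth2.credential": "YYY",
  "dfs.adls.oauth2.refresh.url": "https://login.microsoftonline.com/ZZZ/oauth2/token",
  "dfs.adls.oauth2.access.token.provider": "org.apache.hadoop.fs.adls.oauth2.ConfCredentialBasedAccessTokenProvider"}

dbutils.fs.mount(
  source = "adl://blah.azuredatalakestore.net/blah/",
  mount_point = "/mnt/blah",
  extra_configs = configs)

# then read through the mount instead of the adl:// path
spark.read.format("com.databricks.spark.avro").load("dbfs:/mnt/blah/blah.avro").show()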

PirrALuis_Simoe
New Contributor II

Any solutions for this? I can read CSV files but not GeoJSON files because I get this exception.

I am getting the same error for CSV. Did you solve it?

TarasChaikovsky
New Contributor II

I had the same issue when using dynamic partitioning in ADLS with Databricks Spark SQL.

You need to pass ADLS configs as Spark configs during cluster creation:

dfs.adls.oauth2.client.id ***
dfs.adls.oauth2.refresh.url https://login.microsoftonline.com/**/oauth2/token
dfs.adls.oauth2.credential **
dfs.adls.oauth2.access.token.provider.type ClientCredential
dfs.adls.oauth2.access.token.provider org.apache.hadoop.fs.adls.oauth2.ConfCredentialBasedAccessTokenProvider

You also need to set hadoopConfiguration for RDD-related functionality:

spark.sparkContext.hadoopConfiguration.set("dfs.adls.oauth2.access.token.provider.type", spark.conf.get("dfs.adls.oauth2.access.token.provider.type"))

spark.sparkContext.hadoopConfiguration.set("dfs.adls.oauth2.client.id", spark.conf.get("dfs.adls.oauth2.client.id"))

spark.sparkContext.hadoopConfiguration.set("dfs.adls.oauth2.credential", spark.conf.get("dfs.adls.oauth2.credential"))

spark.sparkContext.hadoopConfiguration.set("dfs.adls.oauth2.refresh.url", spark.conf.get("dfs.adls.oauth2.refresh.url"))

Those two measures fixed the issue for me.

/Taras

Like Taras said, after adding the spark.sparkContext.hadoopConfiguration.set calls there is no need to mount the ADLS folder.

User16301467523
New Contributor II

Taras's answer is correct. Because spark-avro is based on the RDD APIs, the properties must be set in the hadoopConfiguration options.

Please note these docs for configuration using the RDD API: https://docs.azuredatabricks.net/spark/latest/data-sources/azure/azure-datalake.html#access-azure-da...

In Python, you can use

sc._jsc.hadoopConfiguration().set()
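
Putting that together, a minimal Python sketch of the same settings Taras listed (the service principal values are placeholders):

# set the ADLS OAuth properties on the underlying Hadoop configuration,
# which is what the RDD-based spark-avro code path reads from
hconf = sc._jsc.hadoopConfiguration()
hconf.set("dfs.adls.oauth2.access.token.provider.type", "ClientCredential")
hconf.set("dfs.adls.oauth2.access.token.provider",
          "org.apache.hadoop.fs.adls.oauth2.ConfCredentialBasedAccessTokenProvider")
hconf.set("dfs.adls.oauth2.client.id", "XXX")
hconf.set("dfs.adls.oauth2.credential", "YYY")
hconf.set("dfs.adls.oauth2.refresh.url", "https://login.microsoftonline.com/ZZZ/oauth2/token")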
