Azure Data Lake Config Issue: No value for dfs.adls.oauth2.access.token.provider found in conf file.
01-26-2018 02:52 AM
Hi,
I have files hosted on an Azure Data Lake Store, which I can connect to from Azure Databricks configured as per the instructions here.
I can read JSON files fine; however, I get the following error when I try to read an Avro file:
spark.read.format("com.databricks.spark.avro").load("adl://blah.azuredatalakestore.net/blah/blah.avro")
IllegalArgumentException: u'No value for dfs.adls.oauth2.access.token.provider found in conf file.'
I made sure the file exists by running:
dbutils.fs.ls("adl://blah.azuredatalakestore.net/blah/blah.avro")
Please note that the error refers to
dfs.adls.oauth2.access.token.provider
not
dfs.adls.oauth2.access.token.provider.type
which is mentioned in the documentation above. Even after I set it explicitly, it still throws the same error.
Has anyone experienced this issue before? Please let me know what else I should try to further troubleshoot. Thanks.
- Labels: Avro, Azure, Azure Data Lake, Azure Databricks
02-05-2018 01:22 AM
I just found a workaround for the Avro read issue; it looks like the dfs.adls.oauth2.access.token.provider configuration is not being picked up. If the ADL folder is mounted in the Databricks notebook, the read works. Please try the following steps:
1. Mount the ADL folder:
val configs = Map(
  "dfs.adls.oauth2.access.token.provider.type" -> "ClientCredential",
  "dfs.adls.oauth2.client.id" -> "XXX",
  "dfs.adls.oauth2.credential" -> "YYY",
  "dfs.adls.oauth2.refresh.url" -> "https://login.microsoftonline.com/ZZZ/oauth2/token",
  "dfs.adls.oauth2.access.token.provider" -> "org.apache.hadoop.fs.adls.oauth2.ConfCredentialBasedAccessTokenProvider")

dbutils.fs.mount(
  source = "adl://XYZ.azuredatalakestore.net/myfolder/demo/",
  mountPoint = "/mnt/mymount",
  extraConfigs = configs)
2. Verify your file is visible on the mount:
dbutils.fs.ls("dbfs:/mnt/mymount")
3. Read the Avro file through the mount:
import com.databricks.spark.avro._
spark.read.avro("dbfs:/mnt/mymount/mydata.avro").show
I can see the records now.
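For anyone doing this from a Python notebook instead of Scala, here is a minimal sketch of the same mount (untested; the XXX/YYY/ZZZ/XYZ values are placeholders for your own service principal and account, exactly as above):
configs = {
  "dfs.adls.oauth2.access.token.provider.type": "ClientCredential",
  "dfs.adls.oauth2.client.id": "XXX",  # placeholder: application (client) id
  "dfs.adls.oauth2.credential": "YYY",  # placeholder: client secret
  "dfs.adls.oauth2.refresh.url": "https://login.microsoftonline.com/ZZZ/oauth2/token",  # placeholder tenant
  "dfs.adls.oauth2.access.token.provider": "org.apache.hadoop.fs.adls.oauth2.ConfCredentialBasedAccessTokenProvider"
}
dbutils.fs.mount(
  source="adl://XYZ.azuredatalakestore.net/myfolder/demo/",
  mount_point="/mnt/mymount",
  extra_configs=configs)
# read the Avro file through the mount
spark.read.format("com.databricks.spark.avro").load("dbfs:/mnt/mymount/mydata.avro").show()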
04-15-2018 06:59 PM
Thanks for the workaround.
I had a similar issue, unrelated to Avro, when saving a Spark ML model to ADLS. Even setting the property manually:
dfs.adls.oauth2.access.token.provider org.apache.hadoop.fs.adls.oauth2.ConfCredentialBasedAccessTokenProvider
when setting up the Spark cluster would still result in this error when trying to save to ADL directly:
IllegalArgumentException: u'No value for dfs.adls.oauth2.access.token.provider found in conf file.'
After mounting adl folder, saving works properly.
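For illustration, a minimal Python sketch of saving through the mount (the pipeline and paths are hypothetical; any Spark ML object with a .write() behaves the same way):
from pyspark.ml import Pipeline
from pyspark.ml.feature import Tokenizer

# hypothetical pipeline standing in for the trained model
pipeline = Pipeline(stages=[Tokenizer(inputCol="text", outputCol="words")])

# saving to the mounted path works, whereas saving to adl://... directly raised the error above
pipeline.write().overwrite().save("dbfs:/mnt/mymount/models/demo_pipeline")

# load it back from the same mounted path
loaded = Pipeline.load("dbfs:/mnt/mymount/models/demo_pipeline")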
03-12-2019 03:28 PM
Hi Michael,
Did you find any other way? I am trying to write TF Records into ADLS and getting the same error even after setting this config.
traindf.repartition(32).write.format('tfrecords').mode('overwrite').option('recordType', 'Example').save("ADLS_URL/my/path")
05-09-2018 01:34 PM
I can also confirm your workaround works, but it takes a long time to mount. The main question is why this workaround is needed in the first place. Hopefully Databricks will provide an official response.
02-06-2018 03:10 PM
Any chance you found a solution for this by now?
02-06-2018 03:34 PM
I have not, unfortunately. I can load the Avro file as JSON, although I get corrupted data as expected, but at least that proves the file is accessible. I don't know what's causing the above error.
02-06-2018 04:49 PM
You may want to try mounting your Data Lake Store to DBFS and accessing your files through the mounted path.
I have not tried it yet, but you might find the following thread helpful.
02-12-2018 03:18 PM
Any solutions for this? I can read CSV files but not GeoJSON files because I am getting this exception.
10-27-2018 11:29 PM
I am getting the same error for CSV. Did you solve it?
04-05-2018 01:29 AM
I had the same issue when using dynamic partitioning in ADLS with Databricks Spark SQL.
You need to pass the ADLS configs as Spark configs during cluster creation:
dfs.adls.oauth2.client.id ***
dfs.adls.oauth2.refresh.url https://login.microsoftonline.com/**/oauth2/token
dfs.adls.oauth2.credential **
dfs.adls.oauth2.access.token.provider.type ClientCredential
dfs.adls.oauth2.access.token.provider org.apache.hadoop.fs.adls.oauth2.ConfCredentialBasedAccessTokenProvider
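If you cannot edit the cluster's Spark config, the same keys can, as far as I know, also be set from the notebook session before reading or writing; a Python sketch with placeholder values:
spark.conf.set("dfs.adls.oauth2.access.token.provider.type", "ClientCredential")
spark.conf.set("dfs.adls.oauth2.client.id", "***")  # placeholder: application (client) id
spark.conf.set("dfs.adls.oauth2.credential", "***")  # placeholder: client secret
spark.conf.set("dfs.adls.oauth2.refresh.url", "https://login.microsoftonline.com/***/oauth2/token")  # placeholder tenant
spark.conf.set("dfs.adls.oauth2.access.token.provider",
               "org.apache.hadoop.fs.adls.oauth2.ConfCredentialBasedAccessTokenProvider")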
You also need to set hadoopConfiguration for RDD related functionality:
spark.sparkContext.hadoopConfiguration.set("dfs.adls.oauth2.access.token.provider.type", spark.conf.get("dfs.adls.oauth2.access.token.provider.type"))
spark.sparkContext.hadoopConfiguration.set("dfs.adls.oauth2.client.id", spark.conf.get("dfs.adls.oauth2.client.id"))
spark.sparkContext.hadoopConfiguration.set("dfs.adls.oauth2.credential", spark.conf.get("dfs.adls.oauth2.credential"))
spark.sparkContext.hadoopConfiguration.set("dfs.adls.oauth2.refresh.url", spark.conf.get("dfs.adls.oauth2.refresh.url"))
Those two measures fixed the issue for me.
/Taras
10-18-2018 06:32 AM
Like Taras said, after adding the spark.sparkContext.hadoopConfiguration.set calls there is no need to mount the ADL folder.
06-11-2018 03:46 PM
Taras's answer is correct. Because spark-avro is based on the RDD APIs, the properties must be set in the hadoopConfiguration options.
Please note these docs for configuration using the RDD API: https://docs.azuredatabricks.net/spark/latest/data-sources/azure/azure-datalake.html#access-azure-da...
In Python, you can use
sc._jsc.hadoopConfiguration().set()
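For example, a PySpark sketch mirroring Taras's Scala snippet (it assumes all of the dfs.adls.oauth2.* keys listed earlier in the thread are already present in the Spark conf):
# copy the ADLS OAuth settings from the Spark conf into the Hadoop conf
for key in [
    "dfs.adls.oauth2.access.token.provider.type",
    "dfs.adls.oauth2.access.token.provider",
    "dfs.adls.oauth2.client.id",
    "dfs.adls.oauth2.credential",
    "dfs.adls.oauth2.refresh.url",
]:
    sc._jsc.hadoopConfiguration().set(key, spark.conf.get(key))
# RDD-based readers such as spark-avro should now find the provider
spark.read.format("com.databricks.spark.avro").load("adl://blah.azuredatalakestore.net/blah/blah.avro")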

