Databricks

Deepak_Kandpal · ‎09-27-2022

I have setup my Databricks notebook to use Service Principal to access ADLS using below configuration.

service_credential = dbutils.secrets.get(scope="<scope>",key="<service-credential-key>")
 
spark.conf.set("fs.azure.account.auth.type.<storage-account>.dfs.core.windows.net", "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type.<storage-account>.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id.<storage-account>.dfs.core.windows.net", "<application-id>")
spark.conf.set("fs.azure.account.oauth2.client.secret.<storage-account>.dfs.core.windows.net", service_credential)
spark.conf.set("fs.azure.account.oauth2.client.endpoint.<storage-account>.dfs.core.windows.net", "https://login.microsoftonline.com/<directory-id>/oauth2/token")

I am able to read csv file from ADLS however getting Invalid configuration value detected for fs.azure.account.key with excel file. Below is the code to read excel file.

#libaray used com.crealytics:spark-excel_2.12:3.2.2_0.18.0
 
df = spark.read.format("com.crealytics.spark.excel") \
    .option("header", "true") \
    .option("dataAddress", "'Sheet1'!A1:BA100000")\
    .option("delimiter", ",") \
    .option("inferSchema", "true") \
    .option("multiline", "true") \
    .load(file_path_full)

Stack Trace

Failure to initialize configurationInvalid configuration value detected for fs.azure.account.key
    at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.services.SimpleKeyProvider.getStorageAccountKey(SimpleKeyProvider.java:51)
    at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.AbfsConfiguration.getStorageAccountKey(AbfsConfiguration.java:577)
    at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.initializeClient(AzureBlobFileSystemStore.java:1832)
    at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.<init>(AzureBlobFileSystemStore.java:224)
    at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.initialize(AzureBlobFileSystem.java:142)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3469)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:537)
    at com.crealytics.spark.excel.WorkbookReader$.readFromHadoop$1(WorkbookReader.scala:60)
    at com.crealytics.spark.excel.WorkbookReader$.$anonfun$apply$4(WorkbookReader.scala:79)
    at com.crealytics.spark.excel.DefaultWorkbookReader.$anonfun$openWorkbook$3(WorkbookReader.scala:102)
    at scala.Option.fold(Option.scala:251)
    at com.crealytics.spark.excel.DefaultWorkbookReader.openWorkbook(WorkbookReader.scala:102)
    at com.crealytics.spark.excel.WorkbookReader.withWorkbook(WorkbookReader.scala:33)
    at com.crealytics.spark.excel.WorkbookReader.withWorkbook$(WorkbookReader.scala:32)
    at com.crealytics.spark.excel.DefaultWorkbookReader.withWorkbook(WorkbookReader.scala:87)
    at com.crealytics.spark.excel.ExcelRelation.excerpt$lzycompute(ExcelRelation.scala:48)
    at com.crealytics.spark.excel.ExcelRelation.excerpt(ExcelRelation.scala:48)
    at com.crealytics.spark.excel.ExcelRelation.headerColumns$lzycompute(ExcelRelation.scala:121)
    at com.crealytics.spark.excel.ExcelRelation.headerColumns(ExcelRelation.scala:120)
    at com.crealytics.spark.excel.ExcelRelation.$anonfun$inferSchema$1(ExcelRelation.scala:189)
    at scala.Option.getOrElse(Option.scala:189)
    at com.crealytics.spark.excel.ExcelRelation.inferSchema(ExcelRelation.scala:188)
    at com.crealytics.spark.excel.ExcelRelation.<init>(ExcelRelation.scala:52)
    at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:52)
    at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:29)
    at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:24)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:385)
    at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:356)
    at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:323)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:323)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:236)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
    at py4j.Gateway.invoke(Gateway.java:295)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:251)
    at java.lang.Thread.run(Thread.java:748)
Caused by: Invalid configuration value detected for fs.azure.account.key
    at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.diagnostics.ConfigurationBasicValidator.validate(ConfigurationBasicValidator.java:49)
    at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.diagnostics.Base64StringConfigurationBasicValidator.validate(Base64StringConfigurationBasicValidator.java:40)
    at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.services.SimpleKeyProvider.validateStorageAccountKey(SimpleKeyProvider.java:70)
    at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.services.SimpleKeyProvider.getStorageAccountKey(SimpleKeyProvider.java:49)
    ... 42 more

Deepak_Kandpal · ‎09-27-2022

found the solution, need one additional configuration.

spark._jsc.hadoopConfiguration().set("fs.azure.account.key.<account name>.dfs.core.windows.net",'<account key>')

View solution in original post

Deepak_Kandpal · ‎09-27-2022

found the solution, need one additional configuration.

spark._jsc.hadoopConfiguration().set("fs.azure.account.key.<account name>.dfs.core.windows.net",'<account key>')

Kaniz · ‎09-29-2022

Hi @Deepak Kandpal, Thank you for sharing the solution with our community.

Mahesh_Das · ‎01-11-2024

But why to pass the account key when already Service credentials are passed. As per this post, default will be access key authentication when no other provider type is specified.

Harsha_Dbrs · 2 weeks ago

Below is the implementation of same code in scala:

spark.sparkContext.hadoopConfiguration.set("fs.azure.account.key.<accountName>.dfs.core.windows.net",<accountKey>)

Databricks

Invalid configuration value detected for fs.azure.account.key with com.crealytics:spark-excel

Building DBRX-class Custom LLMs with Mosaic AI Training

Accurate, Safe and Governed: How to Move GenAI from POC to Production

Exciting Announcement: Introducing our Learning Library!

Databricks Community Social, May 2024 - Speaker session around Training offerings

🔔 Attention Databricks Academy Users: SSO Implementation Incoming! Secure Your Account Today!