<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: RDD not picking up spark configuration for azure storage account access in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/rdd-not-picking-up-spark-configuration-for-azure-storage-account/m-p/31699#M23084</link>
    <description>&lt;P&gt;I decided to load the files into a DataFrame with a single column and then do the processing before splitting it into separate columns, and this works just fine.&lt;/P&gt;&lt;P&gt;@Hyper Guy​&amp;nbsp;thanks for the link; I didn't try it, but it seems like it would resolve the issue.&lt;/P&gt;</description>
    <pubDate>Wed, 28 Sep 2022 07:27:23 GMT</pubDate>
    <dc:creator>Leo_138525</dc:creator>
    <dc:date>2022-09-28T07:27:23Z</dc:date>
    <item>
      <title>RDD not picking up spark configuration for azure storage account access</title>
      <link>https://community.databricks.com/t5/data-engineering/rdd-not-picking-up-spark-configuration-for-azure-storage-account/m-p/31696#M23081</link>
      <description>&lt;P&gt;I want to open some CSV files as an RDD, do some processing and then load it as a DataFrame. Since the files are stored in an Azure blob storage account I need to configure the access accordingly, which for some reason does not work when using an RDD. So I configure the access this way:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;spark.conf.set("fs.azure.account.auth.type.&amp;lt;storage-account&amp;gt;.dfs.core.windows.net", "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type.&amp;lt;storage-account&amp;gt;.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id.&amp;lt;storage-account&amp;gt;.dfs.core.windows.net", "&amp;lt;application-id&amp;gt;")
spark.conf.set("fs.azure.account.oauth2.client.secret.&amp;lt;storage-account&amp;gt;.dfs.core.windows.net", service_credential)
spark.conf.set("fs.azure.account.oauth2.client.endpoint.&amp;lt;storage-account&amp;gt;.dfs.core.windows.net", "https://login.microsoftonline.com/&amp;lt;directory-id&amp;gt;/oauth2/token")&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;This works when loading the files directly as a DataFrame, but not when using the RDD API:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;# This works with the previously set configuration
df = spark.read.format('csv').load('abfss://some/path/file.csv')
&amp;nbsp;
# This does not work and an error is thrown
rdd = spark.sparkContext.textFile('abfss://some/path/file.csv')
df = rdd.filter(filter_func).map(map_func).toDF()&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;The error I get is:&lt;/P&gt;&lt;P&gt;&lt;I&gt;Failure to initialize configuration: Invalid configuration value detected for fs.azure.account.key&lt;/I&gt;&lt;/P&gt;&lt;P&gt;Why does the access configuration work when loading the files directly but not via an RDD? And how do I solve this problem?&lt;/P&gt;</description>
      <pubDate>Thu, 15 Sep 2022 07:38:20 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/rdd-not-picking-up-spark-configuration-for-azure-storage-account/m-p/31696#M23081</guid>
      <dc:creator>Leo_138525</dc:creator>
      <dc:date>2022-09-15T07:38:20Z</dc:date>
    </item>
    <item>
      <title>Re: RDD not picking up spark configuration for azure storage account access</title>
      <link>https://community.databricks.com/t5/data-engineering/rdd-not-picking-up-spark-configuration-for-azure-storage-account/m-p/31697#M23082</link>
      <description>&lt;P&gt;Hello!&lt;/P&gt;&lt;P&gt;I got the same error a few days ago and resolved it with this post that I found:&lt;/P&gt;&lt;P&gt;&lt;A href="https://www.data-engineering.wiki/docs/spark/accessing-adls-gen-2-with-rdd/" alt="https://www.data-engineering.wiki/docs/spark/accessing-adls-gen-2-with-rdd/" target="_blank"&gt;&lt;B&gt;Accessing ADLS Gen 2 with RDD | Data Engineering (data-engineering.wiki)&lt;/B&gt;&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Basically, the key is to set the properties on the Hadoop configuration using &lt;B&gt;spark.sparkContext.hadoopConfiguration.set(...)&lt;/B&gt;&lt;/P&gt;&lt;P&gt;I hope you solve your problem!&lt;/P&gt;</description>
      <pubDate>Thu, 22 Sep 2022 13:28:36 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/rdd-not-picking-up-spark-configuration-for-azure-storage-account/m-p/31697#M23082</guid>
      <dc:creator>data-guy</dc:creator>
      <dc:date>2022-09-22T13:28:36Z</dc:date>
    </item>
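The fix in the reply above is to set the access properties on the Hadoop configuration, which the RDD API reads, rather than only on the Spark session via `spark.conf.set`. A minimal PySpark sketch of that idea; the helper name is hypothetical, and the arguments stand in for the `<storage-account>`, `<application-id>`, and `<directory-id>` placeholders from the question:

```python
# Sketch of the fix from the reply above. This helper just builds the ABFS
# OAuth property names for one storage account; the values you pass in are
# placeholders, not real credentials.

def abfs_oauth_properties(storage_account, client_id, client_secret, directory_id):
    """Return the Hadoop properties for OAuth access to an ADLS Gen 2 account."""
    suffix = f"{storage_account}.dfs.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{suffix}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{suffix}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"fs.azure.account.oauth2.client.id.{suffix}": client_id,
        f"fs.azure.account.oauth2.client.secret.{suffix}": client_secret,
        f"fs.azure.account.oauth2.client.endpoint.{suffix}":
            f"https://login.microsoftonline.com/{directory_id}/oauth2/token",
    }

# The point of the linked post: set these on the SparkContext's Hadoop
# configuration, not only via spark.conf.set.
# In Scala:  spark.sparkContext.hadoopConfiguration.set(key, value)
# In PySpark the Hadoop configuration is reached through an internal handle:
#
# for key, value in abfs_oauth_properties(...).items():
#     spark.sparkContext._jsc.hadoopConfiguration().set(key, value)
```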
    <item>
      <title>Re: RDD not picking up spark configuration for azure storage account access</title>
      <link>https://community.databricks.com/t5/data-engineering/rdd-not-picking-up-spark-configuration-for-azure-storage-account/m-p/31698#M23083</link>
      <description>&lt;P&gt;@Leo Baudrexel​&amp;nbsp;- could you please check whether the service principal has the correct permissions to access the storage account?&lt;/P&gt;&lt;P&gt;Please make sure the service principal has the "Contributor" or "Storage Blob Data Contributor" role on the storage account.&lt;/P&gt;</description>
      <pubDate>Tue, 27 Sep 2022 16:23:12 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/rdd-not-picking-up-spark-configuration-for-azure-storage-account/m-p/31698#M23083</guid>
      <dc:creator>shan_chandra</dc:creator>
      <dc:date>2022-09-27T16:23:12Z</dc:date>
    </item>
    <item>
      <title>Re: RDD not picking up spark configuration for azure storage account access</title>
      <link>https://community.databricks.com/t5/data-engineering/rdd-not-picking-up-spark-configuration-for-azure-storage-account/m-p/31699#M23084</link>
      <description>&lt;P&gt;I decided to load the files into a DataFrame with a single column and then do the processing before splitting it into separate columns, and this works just fine.&lt;/P&gt;&lt;P&gt;@Hyper Guy​&amp;nbsp;thanks for the link; I didn't try it, but it seems like it would resolve the issue.&lt;/P&gt;</description>
      <pubDate>Wed, 28 Sep 2022 07:27:23 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/rdd-not-picking-up-spark-configuration-for-azure-storage-account/m-p/31699#M23084</guid>
      <dc:creator>Leo_138525</dc:creator>
      <dc:date>2022-09-28T07:27:23Z</dc:date>
    </item>
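The workaround the author settled on, reading each line as a single string column and splitting it afterwards, could look roughly like the sketch below. It is untested against the thread's data; `split_line`, the comma delimiter, and the column names are hypothetical stand-ins:

```python
# Rough sketch of the single-column workaround described above: read raw
# lines via the DataFrame reader (which honors the spark.conf.set access
# settings), do the per-row processing, then split into columns.

def split_line(line, sep=","):
    """Per-row processing used in the sketch: split one raw line into fields."""
    return [field.strip() for field in line.split(sep)]

# In a Databricks notebook this could look like (untested sketch,
# hypothetical column names):
#
# from pyspark.sql import functions as F
#
# raw = spark.read.text("abfss://some/path/file.csv")  # one column: "value"
# parts = F.split(F.col("value"), ",")
# df = raw.select(
#     parts.getItem(0).alias("col_a"),
#     parts.getItem(1).alias("col_b"),
# )
```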
    <item>
      <title>Re: RDD not picking up spark configuration for azure storage account access</title>
      <link>https://community.databricks.com/t5/data-engineering/rdd-not-picking-up-spark-configuration-for-azure-storage-account/m-p/46331#M28045</link>
      <description>&lt;P&gt;Is there an &lt;EM&gt;explanation&lt;/EM&gt; for &lt;EM&gt;why&lt;/EM&gt; this behavior has changed?&lt;/P&gt;&lt;P&gt;In the past on Azure Databricks, one could add to the&amp;nbsp;&lt;EM&gt;Spark config&lt;/EM&gt; in the&amp;nbsp;&lt;EM&gt;Advanced options&lt;/EM&gt; of a cluster's&amp;nbsp;&lt;EM&gt;Configuration&lt;/EM&gt; tab a parameter like:&lt;/P&gt;&lt;P class="lia-indent-padding-left-30px"&gt;&lt;SPAN&gt;fs.azure.account.key.BLOB_CONTAINER_NAME.dfs.core.windows.net&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;with the value of a suitable ADLS Gen 2 account key, and RDDs would just work, without one having to call configuration-setting methods on the SparkContext of the Spark session in a job or notebook.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 26 Sep 2023 20:41:50 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/rdd-not-picking-up-spark-configuration-for-azure-storage-account/m-p/46331#M28045</guid>
      <dc:creator>JerryK</dc:creator>
      <dc:date>2023-09-26T20:41:50Z</dc:date>
    </item>
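On the question above: a commonly documented way to make such a key visible to the RDD API at the cluster level is to prefix the Hadoop property with `spark.hadoop.` in the cluster's Spark config, which copies it into the Hadoop configuration that RDDs read. A sketch of such an Advanced options entry, keeping the thread's placeholder names (not a verified fix for this specific report):

```text
# Cluster > Configuration > Advanced options > Spark config
# The spark.hadoop. prefix propagates the property into the Hadoop
# configuration used by the RDD API; ACCOUNT_KEY is a placeholder.
spark.hadoop.fs.azure.account.key.BLOB_CONTAINER_NAME.dfs.core.windows.net ACCOUNT_KEY
```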
  </channel>
</rss>

