<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Extracting data from excel in datalake storage using openpyxl in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/extracting-data-from-excel-in-datalake-storage-using-openpyxl/m-p/2999#M198</link>
    <description>&lt;P&gt;I am trying to extract some data into Databricks but tripping all over openpyxl; I am a newish user of Databricks.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;PRE&gt;from openpyxl import load_workbook

directory_id = "hidden"
scope = "hidden"
client_id = "hidden"
service_credential_key = "hidden"
container_name = "hidden"
storage_account = "hidden"

application_id = dbutils.secrets.get(scope=scope, key=client_id)
service_credential = dbutils.secrets.get(scope=scope, key=service_credential_key)

adls_path = f"abfss://{container_name}@{storage_account}.dfs.core.windows.net/"
spark.sql("set spark.databricks.delta.optimizeWrite.enabled = True")

project = "hidden/hidden/"
path = f"abfss://{container_name}@{storage_account}.dfs.core.windows.net/"
raw_path = f"{path}raw/{project}"
raw_schema_path = f"{raw_path}/schema/"
bronze_path = f"{path}bronze/{project}"
silver_path = f"{path}silver/{project}"
gold_path = f"{path}gold/{project}"
temp_path = f"{path}{project}"

print(temp_path)
dbutils.fs.ls(temp_path)
user = load_workbook(f"{temp_path}staff.xlsx")&lt;/PRE&gt;&lt;P&gt;FileNotFoundError: [Errno 2] No such file or directory: 'hidden'&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The file definitely exists and the path is correct, but I am not quite understanding why it is not being picked up. If I convert it to CSV and use&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;PRE&gt;user = spark.read.format("csv").options(**{"sep": ",", "header": True, "inferSchema": True}).load(f"{temp_path}staff.csv")&lt;/PRE&gt;&lt;P&gt;it works fine, but I am trying to avoid the extra hop. Working with Excel seems a bit painful, however. I am fairly new on Databricks, so I might be missing something obvious.&lt;/P&gt;</description>
    <pubDate>Thu, 15 Jun 2023 17:49:09 GMT</pubDate>
    <dc:creator>vanessafvg</dc:creator>
    <dc:date>2023-06-15T17:49:09Z</dc:date>
    <item>
      <title>Extracting data from excel in datalake storage using openpyxl</title>
      <link>https://community.databricks.com/t5/data-engineering/extracting-data-from-excel-in-datalake-storage-using-openpyxl/m-p/2999#M198</link>
      <description>&lt;P&gt;I am trying to extract some data into Databricks but tripping all over openpyxl; I am a newish user of Databricks.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;PRE&gt;from openpyxl import load_workbook

directory_id = "hidden"
scope = "hidden"
client_id = "hidden"
service_credential_key = "hidden"
container_name = "hidden"
storage_account = "hidden"

application_id = dbutils.secrets.get(scope=scope, key=client_id)
service_credential = dbutils.secrets.get(scope=scope, key=service_credential_key)

adls_path = f"abfss://{container_name}@{storage_account}.dfs.core.windows.net/"
spark.sql("set spark.databricks.delta.optimizeWrite.enabled = True")

project = "hidden/hidden/"
path = f"abfss://{container_name}@{storage_account}.dfs.core.windows.net/"
raw_path = f"{path}raw/{project}"
raw_schema_path = f"{raw_path}/schema/"
bronze_path = f"{path}bronze/{project}"
silver_path = f"{path}silver/{project}"
gold_path = f"{path}gold/{project}"
temp_path = f"{path}{project}"

print(temp_path)
dbutils.fs.ls(temp_path)
user = load_workbook(f"{temp_path}staff.xlsx")&lt;/PRE&gt;&lt;P&gt;FileNotFoundError: [Errno 2] No such file or directory: 'hidden'&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The file definitely exists and the path is correct, but I am not quite understanding why it is not being picked up. If I convert it to CSV and use&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;PRE&gt;user = spark.read.format("csv").options(**{"sep": ",", "header": True, "inferSchema": True}).load(f"{temp_path}staff.csv")&lt;/PRE&gt;&lt;P&gt;it works fine, but I am trying to avoid the extra hop. Working with Excel seems a bit painful, however. I am fairly new on Databricks, so I might be missing something obvious.&lt;/P&gt;</description>
      <pubDate>Thu, 15 Jun 2023 17:49:09 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/extracting-data-from-excel-in-datalake-storage-using-openpyxl/m-p/2999#M198</guid>
      <dc:creator>vanessafvg</dc:creator>
      <dc:date>2023-06-15T17:49:09Z</dc:date>
    </item>
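A note on the question above: `load_workbook` opens its argument with ordinary local-file I/O, so an `abfss://` URL is passed straight to `open()` and fails with `FileNotFoundError` — unlike `spark.read`, openpyxl has no ADLS driver. However, `load_workbook` also accepts a file-like object, so one workaround is to fetch the workbook's bytes (e.g. with the Azure SDK) and wrap them in `io.BytesIO`. A minimal sketch of that pattern — the workbook here is built in memory purely to keep the example self-contained, standing in for bytes downloaded from the lake:

```python
import io
from openpyxl import Workbook, load_workbook

# Stand-in for staff.xlsx: build a tiny workbook and serialize it to bytes,
# as if those bytes had just been downloaded from ADLS.
wb = Workbook()
ws = wb.active
ws.append(["name", "role"])
ws.append(["alice", "engineer"])

buf = io.BytesIO()
wb.save(buf)   # openpyxl can write to any file-like object
buf.seek(0)

# load_workbook accepts a file-like object, so no local path is needed.
loaded = load_workbook(buf)
rows = [[cell.value for cell in row] for row in loaded.active.iter_rows()]
print(rows)  # [['name', 'role'], ['alice', 'engineer']]
```

On Databricks itself, an alternative that avoids the byte-fetching code is to copy the file to the driver's local disk first, e.g. `dbutils.fs.cp(f"{temp_path}staff.xlsx", "file:/tmp/staff.xlsx")` and then `load_workbook("/tmp/staff.xlsx")` (paths illustrative, assuming the cluster's ADLS credentials are configured).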
    <item>
      <title>Re: Extracting data from excel in datalake storage using openpyxl</title>
      <link>https://community.databricks.com/t5/data-engineering/extracting-data-from-excel-in-datalake-storage-using-openpyxl/m-p/3000#M199</link>
      <description>&lt;P&gt;Hi @Vanessa Van Gelder,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Great to meet you, and thanks for your question!&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Let's see if your peers in the community have an answer to your question. Thanks.&lt;/P&gt;</description>
      <pubDate>Sat, 17 Jun 2023 09:30:22 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/extracting-data-from-excel-in-datalake-storage-using-openpyxl/m-p/3000#M199</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2023-06-17T09:30:22Z</dc:date>
    </item>
  </channel>
</rss>

