<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Using spark.read.excel – dataAddress with only start cell is not working (DBR 17.x) in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/using-spark-read-excel-dataaddress-with-only-start-cell-is-not/m-p/148766#M52965</link>
    <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/22314"&gt;@saurabh18cs&lt;/a&gt;&amp;nbsp;Thanks for the response. My data header row is located at A7. Unfortunately, reading the full sheet does not help.&lt;/P&gt;</description>
    <pubDate>Thu, 19 Feb 2026 04:08:42 GMT</pubDate>
    <dc:creator>DhivyaKeerthana</dc:creator>
    <dc:date>2026-02-19T04:08:42Z</dc:date>
    <item>
      <title>Using spark.read.excel – dataAddress with only start cell is not working (DBR 17.x)</title>
      <link>https://community.databricks.com/t5/data-engineering/using-spark-read-excel-dataaddress-with-only-start-cell-is-not/m-p/148568#M52922</link>
      <description>&lt;DIV&gt;Hi, has anyone successfully used the Databricks Runtime 17.x &lt;STRONG&gt;native Excel reader&lt;/STRONG&gt; with a &lt;STRONG&gt;dataAddress containing only a start cell&lt;/STRONG&gt; (no end cell)? The documentation does not mention this form either (&lt;A href="https://learn.microsoft.com/en-us/azure/databricks/query/formats/excel" target="_blank" rel="noopener"&gt;https://learn.microsoft.com/en-us/azure/databricks/query/formats/excel&lt;/A&gt;).&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;FONT color="#FF0000"&gt;&lt;STRONG&gt;The code below does not work:&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/DIV&gt;&lt;DIV&gt;df = spark.read.format("excel").option("headerRows", 1).option("dataAddress", "Sheet1!A7").load(filepath)&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;In most scenarios, we don’t know the end cell address because both the number of rows and columns change for every file. How is the native Excel reader expected to help in these cases if dataAddress with only a start address (e.g.&amp;nbsp;Sheet1!A7) does not automatically expand to the bottom-right non-empty cell?&lt;/DIV&gt;</description>
      <pubDate>Tue, 17 Feb 2026 07:00:05 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/using-spark-read-excel-dataaddress-with-only-start-cell-is-not/m-p/148568#M52922</guid>
      <dc:creator>DhivyaKeerthana</dc:creator>
      <dc:date>2026-02-17T07:00:05Z</dc:date>
    </item>
    <item>
      <title>Re: Using spark.read.excel – dataAddress with only start cell is not working (DBR 17.x)</title>
      <link>https://community.databricks.com/t5/data-engineering/using-spark-read-excel-dataaddress-with-only-start-cell-is-not/m-p/148594#M52928</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/147888"&gt;@DhivyaKeerthana&lt;/a&gt;,&lt;/P&gt;&lt;DIV&gt;No, Databricks Runtime 17.x &lt;EM&gt;does not&lt;/EM&gt; support dataAddress with only a start cell (e.g., "Sheet1!A7"). How about using a &lt;STRONG&gt;full sheet reference&lt;/STRONG&gt; ("Sheet1") instead?&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;Br&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Tue, 17 Feb 2026 11:19:07 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/using-spark-read-excel-dataaddress-with-only-start-cell-is-not/m-p/148594#M52928</guid>
      <dc:creator>saurabh18cs</dc:creator>
      <dc:date>2026-02-17T11:19:07Z</dc:date>
    </item>
    <item>
      <title>Re: Using spark.read.excel – dataAddress with only start cell is not working (DBR 17.x)</title>
      <link>https://community.databricks.com/t5/data-engineering/using-spark-read-excel-dataaddress-with-only-start-cell-is-not/m-p/148766#M52965</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/22314"&gt;@saurabh18cs&lt;/a&gt;&amp;nbsp;Thanks for the response. My data header row is located at A7. Unfortunately, reading the full sheet does not help.&lt;/P&gt;</description>
      <pubDate>Thu, 19 Feb 2026 04:08:42 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/using-spark-read-excel-dataaddress-with-only-start-cell-is-not/m-p/148766#M52965</guid>
      <dc:creator>DhivyaKeerthana</dc:creator>
      <dc:date>2026-02-19T04:08:42Z</dc:date>
    </item>
    <item>
      <title>Re: Using spark.read.excel – dataAddress with only start cell is not working (DBR 17.x)</title>
      <link>https://community.databricks.com/t5/data-engineering/using-spark-read-excel-dataaddress-with-only-start-cell-is-not/m-p/150131#M53260</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/147888"&gt;@DhivyaKeerthana&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;Welcome to the community! You are correct that the native Excel reader (DBR 17.x) does not currently support a start-cell-only dataAddress like "Sheet1!A7". The documented dataAddress formats are:&lt;/P&gt;
&lt;P&gt;- A full range: "Sheet1!C5:H10"&lt;BR /&gt;- A sheet name only: "Sheet1"&lt;BR /&gt;- Omitted entirely (reads all data from the first sheet)&lt;/P&gt;
&lt;P&gt;Since your header row starts at A7 and the number of rows and columns varies per file, here are a couple of approaches that should work well for you.&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;OPTION 1: USE A LARGE BOUNDING RANGE&lt;/P&gt;
&lt;P&gt;You can specify a range with a deliberately large end cell. The reader will only read up to the bottom-right non-empty cell within that range, so empty cells beyond your data will not cause problems:&lt;/P&gt;
&lt;P&gt;df = (spark.read.format("excel")&lt;BR /&gt;.option("headerRows", 1)&lt;BR /&gt;.option("dataAddress", "Sheet1!A7:ZZ1000000")&lt;BR /&gt;.load(filepath))&lt;/P&gt;
&lt;P&gt;This tells the reader to start at A7 and look for data up to column ZZ, row 1,000,000. In practice it will stop at the last non-empty cell, so the variable row/column count is handled automatically. This is the simplest approach and should work directly with your use case.&lt;/P&gt;
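&lt;P&gt;If the start cell differs between files, the range string can be assembled rather than hard-coded. A minimal sketch, assuming a hypothetical helper named bounded_address (it is not part of Spark or the Excel reader) and the same ZZ / 1,000,000 bounds used above:&lt;/P&gt;
&lt;PRE&gt;
```python
import re

# Assumed defaults, copied from the example range above; adjust to your files.
DEFAULT_END_COL = "ZZ"
DEFAULT_END_ROW = 1_000_000

# One to three column letters followed by a row number with no leading zero.
_CELL = re.compile(r"[A-Z]{1,3}[1-9][0-9]*")


def bounded_address(sheet, start_cell, end_col=DEFAULT_END_COL, end_row=DEFAULT_END_ROW):
    """Build a 'Sheet!START:END' range string from a start cell (hypothetical helper)."""
    start = start_cell.upper()
    if not _CELL.fullmatch(start):
        raise ValueError(f"not a cell reference: {start_cell!r}")
    return f"{sheet}!{start}:{end_col}{end_row}"


print(bounded_address("Sheet1", "A7"))  # Sheet1!A7:ZZ1000000
```
&lt;/PRE&gt;
&lt;P&gt;The result can be passed directly as .option("dataAddress", bounded_address("Sheet1", "A7")). The default bounds are only illustrative; widen them if a file could extend past column ZZ.&lt;/P&gt;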
&lt;P&gt;&lt;BR /&gt;OPTION 2: READ THE FULL SHEET AND SKIP ROWS PROGRAMMATICALLY&lt;/P&gt;
&lt;P&gt;If you prefer more control, you can read the entire sheet without headerRows, then trim the leading rows in PySpark:&lt;/P&gt;
&lt;P&gt;# Read the entire sheet with no headers (all values as strings)&lt;BR /&gt;raw_df = (spark.read.format("excel")&lt;BR /&gt;.option("headerRows", 0)&lt;BR /&gt;.option("dataAddress", "Sheet1")&lt;BR /&gt;.load(filepath))&lt;/P&gt;
&lt;P&gt;# Number the rows so that the 6 rows above the header (Excel row 7) can be dropped&lt;BR /&gt;# row_number() is 1-based, so Excel row 7 gets _row_idx 7; this assumes the reader&lt;BR /&gt;# returns every sheet row in order, including the rows above the header&lt;BR /&gt;from pyspark.sql import functions as F&lt;BR /&gt;from pyspark.sql.window import Window&lt;/P&gt;
&lt;P&gt;# A window without partitioning pulls all rows into one partition, which is usually fine for Excel-sized data&lt;BR /&gt;w = Window.orderBy(F.monotonically_increasing_id())&lt;BR /&gt;indexed_df = raw_df.withColumn("_row_idx", F.row_number().over(w))&lt;/P&gt;
&lt;P&gt;# Row 7 holds the header; every row after it is data&lt;BR /&gt;header = indexed_df.filter(F.col("_row_idx") == 7).drop("_row_idx").first()&lt;BR /&gt;data_df = indexed_df.filter(F.col("_row_idx") &amp;gt; 7).drop("_row_idx")&lt;/P&gt;
&lt;P&gt;# Rename the columns by position using the header row's values&lt;BR /&gt;new_columns = [str(value) for value in header]&lt;BR /&gt;data_df = data_df.toDF(*new_columns)&lt;/P&gt;
&lt;P&gt;This is more code but gives you full control over row skipping logic.&lt;/P&gt;
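&lt;P&gt;One thing to watch when promoting a data row to a header: blank or repeated header cells would otherwise become column names like "None" or exact duplicates. A minimal sketch of one way to tidy them first, assuming a hypothetical helper named clean_header (plain Python, not a Spark API):&lt;/P&gt;
&lt;PRE&gt;
```python
def clean_header(values):
    """Turn one row of raw cell values into unique, non-empty column names."""
    names = []
    seen = {}
    for i, value in enumerate(values):
        name = "" if value is None else str(value).strip()
        if not name:
            name = f"_c{i}"  # fall back to a positional name for blank cells
        count = seen.get(name, 0)
        seen[name] = count + 1
        # First occurrence keeps its name; repeats get a numeric suffix.
        names.append(name if count == 0 else f"{name}_{count}")
    return names


print(clean_header(["id", None, "amount", "amount", "  "]))
# ['id', '_c1', 'amount', 'amount_1', '_c4']
```
&lt;/PRE&gt;
&lt;P&gt;With this in place, new_columns = clean_header(list(header)) could replace the plain str() conversion. The suffix scheme is an arbitrary choice and does not guard against a sheet that already has a column literally named "amount_1".&lt;/P&gt;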
&lt;P&gt;&lt;BR /&gt;RECOMMENDED APPROACH&lt;/P&gt;
&lt;P&gt;Option 1 (the large bounding range) is the cleanest solution and aligns with how the native reader works. The documentation confirms that the parser reads from the specified start cell to the "bottom-right non-empty cell," so providing a generous end boundary is a safe and effective pattern.&lt;/P&gt;
&lt;P&gt;For reference, here is the documentation page for the native Excel reader:&lt;BR /&gt;&lt;A href="https://learn.microsoft.com/en-us/azure/databricks/query/formats/excel" target="_blank"&gt;https://learn.microsoft.com/en-us/azure/databricks/query/formats/excel&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;Hope this helps you move forward. Let us know how it goes!&lt;/P&gt;
&lt;P&gt;* This reply used an agent system I built to research and draft this response based on the wide set of documentation I have available and previous memory. I personally review the draft for any obvious issues and for monitoring system reliability and update it when I detect any drift, but there is still a small chance that something is inaccurate, especially if you are experimenting with brand new features.&lt;/P&gt;</description>
      <pubDate>Sun, 08 Mar 2026 04:42:00 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/using-spark-read-excel-dataaddress-with-only-start-cell-is-not/m-p/150131#M53260</guid>
      <dc:creator>SteveOstrowski</dc:creator>
      <dc:date>2026-03-08T04:42:00Z</dc:date>
    </item>
    <item>
      <title>Re: Using spark.read.excel – dataAddress with only start cell is not working (DBR 17.x)</title>
      <link>https://community.databricks.com/t5/data-engineering/using-spark-read-excel-dataaddress-with-only-start-cell-is-not/m-p/151771#M53707</link>
      <description>&lt;P&gt;Thanks&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/133188"&gt;@SteveOstrowski&lt;/a&gt;&amp;nbsp;for the response. Yes, I am using&amp;nbsp;&lt;STRONG&gt;OPTION 1: USE A LARGE BOUNDING RANGE&lt;/STRONG&gt; as a workaround.&lt;/P&gt;</description>
      <pubDate>Tue, 24 Mar 2026 06:16:36 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/using-spark-read-excel-dataaddress-with-only-start-cell-is-not/m-p/151771#M53707</guid>
      <dc:creator>DhivyaKeerthana</dc:creator>
      <dc:date>2026-03-24T06:16:36Z</dc:date>
    </item>
  </channel>
</rss>

