<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Illegal character in partition path when attempting REORG ... (PURGE) in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/illegal-character-in-partition-path-when-attempting-reorg-purge/m-p/13201#M7915</link>
    <description>&lt;P&gt;I have a large Delta table partitioned by an identifier column, and I have now discovered that some of the identifiers contain blank spaces, e.g. one partition can be defined by "Identifier=first identifier". Most partitions do not have these blank spaces in their identifiers, and it hasn't been a problem until now, when I want to use &lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;REORG TABLE table_name APPLY (PURGE)&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;to rewrite the files and get rid of some recently deleted columns. &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;When running REORG, I get &lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;Error in SQL statement: SparkException: Job aborted due to stage failure: ... java.net.URISyntaxException: Illegal character in path at index ...&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;pointing to that blank space in the path "&lt;I&gt;dbfs:/mnt/container/table_name/Identifier=&lt;/I&gt;&lt;B&gt;&lt;I&gt;first identifier&lt;/I&gt;&lt;/B&gt;&lt;I&gt;/part-01347-8a9a157b-6d0d-75dd-b1b7-2aed12e057db.c000.snappy.parquet&lt;/I&gt;".&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Note that this has not been an issue when running OPTIMIZE on the same partition.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Does anyone know how I can solve this? The only thing I can think of to move forward is to exclude the problematic partitions from the REORG, but that's a workaround, not a solution. Any tips on an actual solution would be much appreciated &lt;span class="lia-unicode-emoji" title=":folded_hands:"&gt;🙏&lt;/span&gt; &lt;/P&gt;</description>
    <pubDate>Mon, 18 Jul 2022 12:04:18 GMT</pubDate>
    <dc:creator>bearys</dc:creator>
    <dc:date>2022-07-18T12:04:18Z</dc:date>
    <item>
      <title>Illegal character in partition path when attempting REORG ... (PURGE)</title>
      <link>https://community.databricks.com/t5/data-engineering/illegal-character-in-partition-path-when-attempting-reorg-purge/m-p/13201#M7915</link>
      <description>&lt;P&gt;I have a large Delta table partitioned by an identifier column, and I have now discovered that some of the identifiers contain blank spaces, e.g. one partition can be defined by "Identifier=first identifier". Most partitions do not have these blank spaces in their identifiers, and it hasn't been a problem until now, when I want to use &lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;REORG TABLE table_name APPLY (PURGE)&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;to rewrite the files and get rid of some recently deleted columns. &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;When running REORG, I get &lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;Error in SQL statement: SparkException: Job aborted due to stage failure: ... java.net.URISyntaxException: Illegal character in path at index ...&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;pointing to that blank space in the path "&lt;I&gt;dbfs:/mnt/container/table_name/Identifier=&lt;/I&gt;&lt;B&gt;&lt;I&gt;first identifier&lt;/I&gt;&lt;/B&gt;&lt;I&gt;/part-01347-8a9a157b-6d0d-75dd-b1b7-2aed12e057db.c000.snappy.parquet&lt;/I&gt;".&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Note that this has not been an issue when running OPTIMIZE on the same partition.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Does anyone know how I can solve this? The only thing I can think of to move forward is to exclude the problematic partitions from the REORG, but that's a workaround, not a solution. Any tips on an actual solution would be much appreciated &lt;span class="lia-unicode-emoji" title=":folded_hands:"&gt;🙏&lt;/span&gt; &lt;/P&gt;</description>
      <pubDate>Mon, 18 Jul 2022 12:04:18 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/illegal-character-in-partition-path-when-attempting-reorg-purge/m-p/13201#M7915</guid>
      <dc:creator>bearys</dc:creator>
      <dc:date>2022-07-18T12:04:18Z</dc:date>
    </item>
    <item>
      <title>Re: Illegal character in partition path when attempting REORG ... (PURGE)</title>
      <link>https://community.databricks.com/t5/data-engineering/illegal-character-in-partition-path-when-attempting-reorg-purge/m-p/13202#M7916</link>
      <description>&lt;P&gt;FYI, I hit a similar issue with partitions that have "%" in the identifier. I used the filter clause of the REORG to exclude partitions with " " or "%" so that I could move forward with my work, but I will continue looking for a solution. &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I've never seen any guidance against using strings with blank spaces or percent signs as partition column values. Might this issue be a bug?&lt;/P&gt;</description>
      <pubDate>Tue, 19 Jul 2022 15:07:27 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/illegal-character-in-partition-path-when-attempting-reorg-purge/m-p/13202#M7916</guid>
      <dc:creator>bearys</dc:creator>
      <dc:date>2022-07-19T15:07:27Z</dc:date>
    </item>
  </channel>
</rss>

