I have a large Delta table partitioned by an identifier column, and I have now discovered that some of the identifiers contain blank spaces, e.g. one partition is defined by "Identifier=first identifier". Most partitions do not have blank spaces in their identifiers, and it hasn't been a problem until now, when I want to use
REORG TABLE table_name APPLY (PURGE)
to rewrite the files and get rid of some recently deleted columns.
When running REORG, I get
Error in SQL statement: SparkException: Job aborted due to stage failure: ... java.net.URISyntaxException: Illegal character in path at index ...
pointing to that blank space in the path "dbfs:/mnt/container/table_name/Identifier=first identifier/part-01347-8a9a157b-6d0d-75dd-b1b7-2aed12e057db.c000.snappy.parquet".
Note that this has not been an issue when running OPTIMIZE on the same partition.
Does anyone know how I can solve this? The only thing I can think of to move forward is to exclude the problematic partitions from the REORG, but that's a workaround, not a solution. Any tips on an actual solution would be much appreciated 🙏
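
For reference, the workaround mentioned above (skipping the affected partitions) could look roughly like the sketch below. It only builds the SQL strings. The helper name reorg_statements and its parameters are hypothetical, and it assumes that REORG TABLE accepts a WHERE predicate on the partition column and that the identifiers contain no single quotes. It does not fix the underlying URISyntaxException.

```python
def reorg_statements(table_name, partition_column, partition_values):
    """Build one REORG ... APPLY (PURGE) statement per 'safe' partition value.

    Hypothetical helper: values containing a space are skipped because those
    partitions trigger the URISyntaxException; values containing a single
    quote are skipped because this sketch does not handle SQL escaping.
    """
    statements = []
    for value in partition_values:
        if " " in value or "'" in value:
            continue
        statements.append(
            f"REORG TABLE {table_name} "
            f"WHERE {partition_column} = '{value}' APPLY (PURGE)"
        )
    return statements


# Example with the partition value from the question plus a made-up safe one.
for stmt in reorg_statements(
    "table_name", "Identifier", ["first identifier", "second_identifier"]
):
    print(stmt)
```

In a notebook, the partition values would presumably come from something like SELECT DISTINCT Identifier FROM table_name, and each generated statement would be passed to spark.sql. The partitions with spaces stay untouched, so the deleted columns would still be physically present in those files.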