
Illegal character in partition path when attempting REORG ... (PURGE)

bearys
New Contributor II

I have a large Delta table partitioned by an identifier column, and I have now discovered that some of the identifiers contain blank spaces, e.g. one partition can be defined by "Identifier=first identifier". Most partitions do not have blank spaces in their identifiers, and it hasn't been a problem until now, when I want to use

REORG TABLE table_name APPLY (PURGE)

to rewrite the files and get rid of some recently deleted columns.

When running REORG, I get

Error in SQL statement: SparkException: Job aborted due to stage failure: ... java.net.URISyntaxException: Illegal character in path at index ...

pointing to that blank space in the path "dbfs:/mnt/container/table_name/Identifier=first identifier/part-01347-8a9a157b-6d0d-75dd-b1b7-2aed12e057db.c000.snappy.parquet".

Note that this has not been an issue when running OPTIMIZE on the same partition.

Does anyone know how I can solve this? The only thing I can think of to move forward is to exclude the problematic partitions from the REORG, but that's a workaround, not a solution. Any tips on an actual solution would be much appreciated 🙏

2 REPLIES

bearys
New Contributor II

FYI, there is a similar issue with partitions that have "%" in the identifier. I used the filter clause of REORG to exclude partitions containing " " or "%" so I could move forward with my work, but I will continue looking for a solution.

I've never seen any guidance against using strings with blank spaces or percent signs as partition values. Might this issue be a bug?
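
For anyone wanting to reproduce the workaround described above, here is a minimal sketch of how the filtered REORG statement could be built. The exact predicate used in the post is not shown, so the `instr(...) = 0` conditions are an assumption, as is the helper name `build_reorg_purge`. It also assumes `Identifier` is the partition column and that the `WHERE` clause of REORG accepts this kind of partition predicate on your runtime.

```python
def build_reorg_purge(table, column, bad_chars=(" ", "%")):
    """Build a REORG ... APPLY (PURGE) statement that skips partitions
    whose value contains any of the given characters.

    Hypothetical helper for illustration; not part of any Databricks API.
    """
    for ch in bad_chars:
        # Keep the sketch simple: refuse characters that would need
        # escaping inside a SQL string literal.
        if ch in ("'", "\\"):
            raise ValueError("quote and backslash are not supported here")
    # instr(str, substr) returns 0 when substr does not occur in str,
    # so each condition keeps only partitions without that character.
    predicate = " AND ".join(
        f"instr({column}, '{ch}') = 0" for ch in bad_chars
    )
    return f"REORG TABLE {table} WHERE {predicate} APPLY (PURGE)"


stmt = build_reorg_purge("table_name", "Identifier")
print(stmt)
# On a cluster, the statement would then be run with: spark.sql(stmt)
```

Note that this only skips the affected partitions; their files are not rewritten, so the deleted columns remain physically present there.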

Kaniz_Fatma
Community Manager

Hi @bearys, the error message indicates an illegal character in the path at a specific index.

The error is pointing to a blank space in the path "dbfs:/mnt/container/table_name/Identifier=first identifier/part-01347-8a9a157b-6d0d-75dd-b1b7-2aed12e057db.c000.snappy.parquet".

This error can occur when the path contains special characters. To resolve it, you can try replacing the blank space in the path with an underscore or removing the special characters from the path. Alternatively, you can try URL-encoding the path so that special characters are replaced with their corresponding escape sequences (for example, a space becomes %20).
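
To make the URL-encoding suggestion concrete, the sketch below shows what percent-encoding does to a partition directory name, using Python's standard `urllib.parse`. It only illustrates the encoding itself: a raw space is not a legal URI character, and a raw "%" is read as the start of an escape sequence. Whether an encoded path can actually be supplied to REORG is not confirmed anywhere in this thread. The value "50% share" is a made-up example.

```python
from urllib.parse import quote, unquote

# Partition directory name taken from the error message in the question.
partition_dir = "Identifier=first identifier"

# Percent-encode everything except "=", so the column=value shape stays readable.
encoded = quote(partition_dir, safe="=")
print(encoded)  # Identifier=first%20identifier

# A literal percent sign must itself be encoded as %25.
# "50% share" is a hypothetical partition value, used only for illustration.
encoded_percent = quote("Identifier=50% share", safe="=")
print(encoded_percent)  # Identifier=50%25%20share

# Decoding restores the original name, so the encoding is lossless.
assert unquote(encoded) == partition_dir
```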