09-13-2022 11:20 AM
I recently created a table on a cluster in Azure running Databricks Runtime 11.1. The table is partitioned by a "date" column. I enabled column mapping, like this:
ALTER TABLE {schema}.{table_name} SET TBLPROPERTIES('delta.columnMapping.mode' = 'name', 'delta.minReaderVersion' = '2', 'delta.minWriterVersion' = '5')
Before enabling column mapping, the directory containing the Delta table has the expected partition directories: "date=2022-08-18", "date=2022-08-19", etc.
After enabling column mapping, every time I do a MERGE into that table, new directories are created with short random names like "5k", "Rw", "Yd", etc. When I VACUUM the table, most of the directories end up empty, but the empty directories are not removed. We merge into this table frequently, so the directory containing the Delta table accumulates a large number of empty directories.
I have 2 questions:
Is it expected that these directories will be created with names other than the expected "date=2022-08-18"?
Is there a way to make VACUUM remove the empty directories?
I could write code to walk through the Delta table directory and remove the empty directories, but I would rather not touch those directories. That is for Databricks to manage, and I don't want to get in its way.
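For what it's worth, the cleanup walk described above can be sketched in plain Python. To be clear, this is a hypothetical illustration, not an official or recommended Databricks workaround: the function name, the bottom-up walk, and the decision to skip `_delta_log` are all my own assumptions, and modifying a Delta table's directory tree by hand carries the risks the post mentions.

```python
# Hypothetical sketch: remove empty subdirectories under a Delta table
# root, bottom-up, never touching the _delta_log directory. This is an
# illustration only, not an official Databricks utility.
import os

def remove_empty_dirs(table_root: str) -> list[str]:
    """Remove empty subdirectories under table_root, deepest first.

    Returns the list of directories removed. The transaction log
    directory (_delta_log) and the table root itself are never removed.
    """
    removed = []
    # topdown=False visits children before parents, so a directory that
    # contained only empty directories becomes removable in the same pass.
    for dirpath, dirnames, filenames in os.walk(table_root, topdown=False):
        if dirpath == table_root:
            continue
        if "_delta_log" in dirpath.split(os.sep):
            continue
        if not os.listdir(dirpath):  # still empty after children handled
            os.rmdir(dirpath)
            removed.append(dirpath)
    return removed
```

Anyone using something like this would want to run it only while no writes are in flight, since a concurrent MERGE could be creating files in a directory that looks empty.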
Thanks in advance for any information you can provide.
09-15-2022 09:55 PM
Hi, for removing files or directories using VACUUM, you can refer to https://docs.databricks.com/delta/delta-utility.html#remove-files-no-longer-referenced-by-a-delta-ta...
As far as I know, the date-based partition directory names are the default naming syntax, and they can be renamed.
09-27-2022 05:11 AM
Hi @Gary Irick
Does @Debayan Mukherjee's response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly?
We'd love to hear from you.
Thanks!
12-16-2022 09:26 AM
The same is happening to me. Since enabling column mapping, new records are stored in folders with random names instead of in their partition folders.
01-03-2023 04:00 AM
Same issue is happening with me too since enabling column mapping. Files are stored in folders with random 2 character names (0P, 3h, BB) rather than the date value of the load_date partition column (load_date=2023-01-01, load_date=2023-01-02).
I have tried Databricks Runtime 12.0 but get the same result when performing an append or merge operation. Has anyone been able to resolve this yet?
04-04-2023 04:51 AM
Is there at least an explanation why this is happening and whether it affects performance?
07-12-2023 12:37 PM
I've seen the same behavior and am waiting for an explanation.
07-12-2023 01:52 PM
@Gary_Irick @Pete_Cotton
This is expected. Enabling column mapping enables random file prefixes, which removes the ability to explore data using Hive-style partitioning.
This is also documented here - https://docs.databricks.com/delta/delta-column-mapping.html#:~:text=Enabling%20column%20mapping%20al....
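To confirm whether a given table has this behavior enabled, you could check its `delta.columnMapping.mode` table property (for example via `SHOW TBLPROPERTIES` in Spark SQL). As a minimal standalone sketch, assuming direct filesystem access to the table's `_delta_log` directory, the property can also be read from the newest `metaData` action in the transaction log; the function name and local-path assumption here are mine, though the JSON layout of the log is part of the Delta protocol:

```python
# Sketch: report a Delta table's column mapping mode by scanning its
# transaction log for the most recent metaData action. Assumes the
# _delta_log directory is readable on a local or mounted filesystem.
import json
import os

def column_mapping_mode(table_root: str) -> str:
    """Return delta.columnMapping.mode from the newest metaData action,
    or 'none' if the property was never set."""
    log_dir = os.path.join(table_root, "_delta_log")
    mode = "none"
    # Commit files are zero-padded, so lexicographic order is commit order.
    for name in sorted(os.listdir(log_dir)):
        if not name.endswith(".json"):
            continue
        with open(os.path.join(log_dir, name)) as f:
            for line in f:
                action = json.loads(line)
                if "metaData" in action:
                    conf = action["metaData"].get("configuration", {})
                    mode = conf.get("delta.columnMapping.mode", "none")
    return mode
```

A mode of `'name'` (or `'id'`) indicates column mapping is active, which is when the short random directory prefixes appear in place of Hive-style `date=...` paths.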
11-20-2023 08:48 AM
The same is happening to me, and it's very frustrating, as it irreversibly breaks our process.
07-30-2024 01:22 AM - edited 07-30-2024 03:13 AM
Hi @Retired_mod ,
I have a few queries about directory names with column mapping. I have a Delta table on ADLS, and when I try to read it I get the error below. How can we read Delta tables with column mapping enabled using PySpark?
Can you please help?
A partition path fragment should be the form like `part1=foo/part2=bar`. The partition path: {{delta table name}}
Edit:
I was able to read the tables as is. Maybe it was an issue with the Delta version.
Regards,
Nikhil