Databricks Community

zyang · ‎10-05-2022

Hi,

I am trying to write and create a delta table by enable "delta.columnMapping.mode","name", and the partition is date. But I found that when I enable this option, the partition folder name is not date any more while it is some random two letters.Any idea?

All the best,

ZQ

Debayan · ‎10-06-2022

Hi, @z yang , Could you please look into the requirements and also go through the below example to user delta column mapping? (ref: https://docs.databricks.com/delta/delta-column-mapping.html#requirements)

%sql
ALTER TABLE <table_name> SET TBLPROPERTIES (
  'delta.minReaderVersion' = '2',
  'delta.minWriterVersion' = '5',
  'delta.columnMapping.mode' = 'name'
)

Also, you can try to rename the column and check, ref: https://docs.databricks.com/delta/delta-column-mapping.html#rename-a-column.

zyang · ‎10-06-2022

Hi,

I set according to the reference. The first screenshot is reader options. Second screenshot is writer options. The partitioned folder name is random letter not the partitioned date.

Hubert-Dudek · ‎10-14-2022

As a column mapping option, common names will be seen in the metastore / table or after reading the file. Inside the file or partition folder will be mapped values generated by delta. So behavior is as expected.

zyang · ‎10-14-2022

Hi Hubert,

So it is impossible to get the partitioned folder by date if enabling columnMapping.mode? What do you mean by common names?

Hubert-Dudek · ‎10-20-2022

It is partitioned by ... mapping value which is mapped to date.

In my opinion, it would be better that mapping could be specified by column (now it is necessary to enable it for all).

ravinchi · ‎11-24-2022

@Hubert Dudek you mean is this behaviour expected that delta files gets created in a random directory (here it is 4K) in a specified path after enabling 'delta.columnMapping.mode' = 'name' ? any reason why it gets created and is there a way we can avoid it ? Please note- I'm not using any partition as such while writing.

ravinchi · ‎11-24-2022

@Kaniz Fatma @Debayan Mukherjee Please check above comment

Anonymous · ‎11-13-2022

Hi @z yang

Does @Hubert Dudek and @Kaniz Fatma 's responses answer your question?

If yes, would you be happy to mark it as best so that other members can find the solution more quickly? Else, we can help you with more details.

zyang · ‎11-24-2022

No,I don't understand

AleksAngelova · ‎04-04-2023

I am also facing the same situation. Is there a way to prevent the random subdirectories from appearing when enabling 'delta.columnMapping.mode' = 'name' ? Or is there at least an explanation why they appear? Does this affect performance?

Sandeep · ‎06-01-2023

@z yang This is expected with column mapping mode enabled in order to support it. and also Delta doesn't really need the physical folder structure for partitioning it relies on the transaction logs to read the partitions.

CkoockieMonster · ‎02-07-2024

Hello, I'm a bit late to the party, but I'll put that for posterity:

There's a way to rename your weird two letter named folders and still have your table working, but it violates the good practices guidelines suggested by Data Bricks, and I don't think you should really use it, but hey, it works for me.

- After modifying your table, you have to check the _data_log json files for lines that add directories to the table. Look for the lines that start with {"add".

- In it you should check for the "path" parameter which should contain one of your weird two letter named folder followed by the actual file that was added (ex: {"add":{"path":"Mz/part-00000-493eaa18-4b8c-78ad-907e-g213fc315643.c000.snappy.parquet")

- Rename the weird folder after the partition you wanted (ex: "path":"date=20240207/part-00000-493eaa18-4b8c-78ad-907e-g213fc315643.c000.snappy.parquet")

-Then go to your weird folder and rename it with the same name so your delta log can find it.

If you have multiple weird folders for the same partition, create your new folder first, then move the files from the weird folders to the new one (and you can delete the empty weird folders afterward).

You can automate that with a python notebook so you don't have to do all of this by hand everytime you change your table, and add the notebook at the end of your Work Flows or other notebooks to be sure to never skip any waird directories.

Now your table should be good and you can do queries. I mean it was already good, but now you can easily find your datas with your file explorer.

Once again, I'm pretty sure that's not a safe method to do that (cuz don't mess with the logs), and you probably should just be content with weirdly named folders, but if your data treatment need the right folder names, then this works just fine.