10-05-2022 05:35 AM
Hi,
I am trying to create and write a Delta table with "delta.columnMapping.mode" set to "name", partitioned by date. But I found that when I enable this option, the partition folder names are no longer dates; instead they are some random two letters. Any idea?
All the best,
ZQ
10-06-2022 01:03 AM
Hi @z yang, could you please look into the requirements and also go through the example below on how to use Delta column mapping? (ref: https://docs.databricks.com/delta/delta-column-mapping.html#requirements)
%sql
ALTER TABLE <table_name> SET TBLPROPERTIES (
'delta.minReaderVersion' = '2',
'delta.minWriterVersion' = '5',
'delta.columnMapping.mode' = 'name'
)
Also, you can try renaming the column and check the result, ref: https://docs.databricks.com/delta/delta-column-mapping.html#rename-a-column.
10-06-2022 03:55 AM
10-14-2022 04:18 AM
With column mapping enabled, the common (logical) names are what you see in the metastore, in the table, and after reading the files. Inside the files and partition folders, Delta uses the mapped values it generates. So the behavior is as expected.
10-14-2022 05:27 AM
Hi Hubert,
So is it impossible to get partition folders named by date when columnMapping.mode is enabled? What do you mean by common names?
10-20-2022 05:09 AM
It is partitioned by ... the mapping value, which is mapped to date.
In my opinion, it would be better if the mapping could be specified per column (right now it has to be enabled for all columns).
11-24-2022 01:08 AM
@Hubert Dudek do you mean it is expected behaviour that Delta files get created in a random directory (here it is 4K) under the specified path after enabling 'delta.columnMapping.mode' = 'name'? Is there a reason why it gets created, and is there a way to avoid it? Please note: I'm not using any partitioning when writing.
11-24-2022 02:10 AM
@Kaniz Fatma @Debayan Mukherjee Please check the comment above.
11-13-2022 07:59 PM
Hi @z yang
Do @Hubert Dudek's and @Kaniz Fatma's responses answer your question?
If yes, would you be happy to mark it as best so that other members can find the solution more quickly? Else, we can help you with more details.
11-24-2022 10:55 PM
No, I don't understand.
04-04-2023 04:42 AM
I am also facing the same situation. Is there a way to prevent the random subdirectories from appearing when enabling 'delta.columnMapping.mode' = 'name' ? Or is there at least an explanation why they appear? Does this affect performance?
06-01-2023 02:46 AM
@z yang This is expected when column mapping mode is enabled; it is needed in order to support the feature. Also, Delta doesn't really need the physical folder structure for partitioning; it relies on the transaction log to read the partitions.
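To illustrate the point about the transaction log, here is a minimal Python sketch that groups data files by the partition values recorded in the log's "add" actions, ignoring folder names entirely. The function name `files_by_partition` and the sample log lines are made up for illustration; in a real table with column mapping enabled, the partition key in the log may be a generated physical column name rather than `date`.

```python
import json
from collections import defaultdict


def files_by_partition(log_lines):
    """Group data file paths by the partition values stored in 'add' actions.

    Hypothetical helper: it only parses JSON lines, it does not read a real table.
    """
    groups = defaultdict(list)
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        action = json.loads(line)
        add = action.get("add")
        if add is None:
            continue  # skip commitInfo, metaData, remove, etc.
        # Partition values come from the log entry, not from the folder name.
        key = tuple(sorted(add.get("partitionValues", {}).items()))
        groups[key].append(add["path"])
    return dict(groups)


# Made-up sample lines shaped like entries in a _delta_log JSON file.
sample_log = [
    '{"commitInfo":{"operation":"WRITE"}}',
    '{"add":{"path":"Mz/part-00000.snappy.parquet","partitionValues":{"date":"2024-02-07"}}}',
    '{"add":{"path":"4K/part-00001.snappy.parquet","partitionValues":{"date":"2024-02-07"}}}',
    '{"add":{"path":"aB/part-00002.snappy.parquet","partitionValues":{"date":"2024-02-08"}}}',
]

for partition, paths in files_by_partition(sample_log).items():
    print(partition, paths)
```

Two files in differently named folders end up in the same group because the log says they belong to the same partition.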
02-07-2024 05:24 AM
Hello, I'm a bit late to the party, but I'll post this for posterity:
There's a way to rename your weird two-letter folders and still have your table working, but it violates the good-practice guidelines suggested by Databricks, and I don't think you should really use it, but hey, it works for me.
- After modifying your table, check the _delta_log JSON files for lines that add files to the table. Look for the lines that start with {"add".
- In each of them, check the "path" parameter, which should contain one of your weird two-letter folders followed by the actual file that was added (ex: {"add":{"path":"Mz/part-00000-493eaa18-4b8c-78ad-907e-g213fc315643.c000.snappy.parquet")
- In the path, rename the weird folder after the partition you wanted (ex: "path":"date=20240207/part-00000-493eaa18-4b8c-78ad-907e-g213fc315643.c000.snappy.parquet")
- Then go to your weird folder in storage and give it the same new name so your Delta log can find it.
If you have multiple weird folders for the same partition, create your new folder first, then move the files from the weird folders into the new one (you can delete the empty weird folders afterward).
You can automate this with a Python notebook so you don't have to do all of it by hand every time you change your table, and add the notebook at the end of your Workflows or other notebooks to be sure you never skip any weird directories.
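For what it's worth, the path-rewriting part of those steps could be sketched like this in plain Python. This is only a sketch under assumptions: `rewrite_add_paths` and `folder_map` are hypothetical names, the sample lines are made up, only "add" actions are handled, and nothing here touches storage. You would still have to write the rewritten lines back and move the files yourself, with all the risks mentioned in this post.

```python
import json


def rewrite_add_paths(log_lines, folder_map):
    """Return (new_lines, moves) for a list of _delta_log JSON lines.

    folder_map maps a weird folder name to the folder name you want,
    e.g. {"Mz": "date=20240207"}. moves lists (old_path, new_path) pairs
    so the same renames can be applied to the files in storage.
    """
    new_lines = []
    moves = []
    for line in log_lines:
        stripped = line.strip()
        if not stripped:
            continue
        action = json.loads(stripped)
        add = action.get("add")
        if add is not None and "/" in add["path"]:
            folder, file_name = add["path"].split("/", 1)
            if folder in folder_map:
                new_path = folder_map[folder] + "/" + file_name
                moves.append((add["path"], new_path))
                add["path"] = new_path
        # Other actions are passed through unchanged.
        new_lines.append(json.dumps(action, separators=(",", ":")))
    return new_lines, moves


# Made-up sample: two weird folders that belong to the same partition.
sample_lines = [
    '{"commitInfo":{"operation":"WRITE"}}',
    '{"add":{"path":"Mz/part-00000.snappy.parquet"}}',
    '{"add":{"path":"4K/part-00001.snappy.parquet"}}',
]
folder_map = {"Mz": "date=20240207", "4K": "date=20240207"}

new_lines, moves = rewrite_add_paths(sample_lines, folder_map)
for old_path, new_path in moves:
    print(old_path, "->", new_path)
```

Returning the list of moves separately keeps the log edit and the file moves in sync, which matters when several weird folders map to the same target folder.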
Now your table should be good and you can run queries. I mean, it was already good, but now you can easily find your data with your file explorer.
Once again, I'm pretty sure this is not a safe way to do it (because you shouldn't mess with the logs), and you should probably just be content with weirdly named folders, but if your data processing needs the right folder names, then this works just fine.