cancel
Showing results forย 
Search instead forย 
Did you mean:ย 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results forย 
Search instead forย 
Did you mean:ย 

Option "delta.columnMapping.mode","name" introduces unexpected result

zyang
Contributor

Hi,

I am trying to write and create a delta table by enable "delta.columnMapping.mode","name", and the partition is date. But I found that when I enable this option, the partition folder name is not date any more while it is some random two letters.imageAny idea?

All the best,

ZQ

12 REPLIES 12

Debayan
Databricks Employee
Databricks Employee

Hi, @z yangโ€‹ , Could you please look into the requirements and also go through the below example to user delta column mapping? (ref: https://docs.databricks.com/delta/delta-column-mapping.html#requirements)

%sql
ALTER TABLE <table_name> SET TBLPROPERTIES (
  'delta.minReaderVersion' = '2',
  'delta.minWriterVersion' = '5',
  'delta.columnMapping.mode' = 'name'
)

Also, you can try to rename the column and check, ref: https://docs.databricks.com/delta/delta-column-mapping.html#rename-a-column.

Hi,

I set according to the reference. The first screenshot is reader options. Second screenshot is writer options. The partitioned folder name is random letter not the partitioned date.

imageimage

Hubert-Dudek
Esteemed Contributor III

As a column mapping option, common names will be seen in the metastore / table or after reading the file. Inside the file or partition folder will be mapped values generated by delta. So behavior is as expected.

Hi Hubert,

So it is impossible to get the partitioned folder by date if enabling columnMapping.mode? What do you mean by common names?

Hubert-Dudek
Esteemed Contributor III

It is partitioned by ... mapping value which is mapped to date.

In my opinion, it would be better that mapping could be specified by column (now it is necessary to enable it for all).

ravinchi
New Contributor III

@Hubert Dudekโ€‹  you mean is this behaviour expected that delta files gets created in a random directory (here it is 4K) in a specified path after enabling 'delta.columnMapping.mode' = 'name' ? any reason why it gets created and is there a way we can avoid it ? Please note- I'm not using any partition as such while writing.

image

ravinchi
New Contributor III

@Kaniz Fatmaโ€‹ @Debayan Mukherjeeโ€‹ Please check above comment

Anonymous
Not applicable

Hi @z yangโ€‹ 

Does @Hubert Dudekโ€‹  and @Kaniz Fatmaโ€‹ 's responses answer your question?

If yes, would you be happy to mark it as best so that other members can find the solution more quickly? Else, we can help you with more details.

No,I don't understand

aleks1601
New Contributor II

I am also facing the same situation. Is there a way to prevent the random subdirectories from appearing when enabling 'delta.columnMapping.mode' = 'name' ? Or is there at least an explanation why they appear? Does this affect performance?

Sandeep
Contributor III

@z yangโ€‹ This is expected with column mapping mode enabled in order to support it. and also Delta doesn't really need the physical folder structure for partitioning it relies on the transaction logs to read the partitions.

CkoockieMonster
New Contributor II

Hello, I'm a bit late to the party, but I'll put that for posterity:

There's a way to rename your weird two letter named folders and still have your table working, but it violates the good practices guidelines suggested by Data Bricks, and I don't think you should really use it, but hey, it works for me.

- After modifying your table, you have to check the _data_log json files for lines that add directories to the table. Look for the lines that start with {"add".

- In it you should check for the "path" parameter which should contain one of your weird two letter named folder followed by the actual file that was added (ex: {"add":{"path":"Mz/part-00000-493eaa18-4b8c-78ad-907e-g213fc315643.c000.snappy.parquet")

- Rename the  weird folder after the partition you wanted (ex: "path":"date=20240207/part-00000-493eaa18-4b8c-78ad-907e-g213fc315643.c000.snappy.parquet")

-Then go to your weird folder and rename it with the same name so your delta log can find it.

If you have multiple weird folders for the same partition, create your new folder first, then move the files from the weird folders to the new one (and you can delete the empty weird folders afterward).

You can automate that with a python notebook so you don't have to do all of this by hand everytime you change your table, and add the notebook at the end of your Work Flows or other notebooks to be sure to never skip any waird directories.

Now your table should be good and you can do queries. I mean it was already good, but now you can easily find your datas with your file explorer.

Once again, I'm pretty sure that's not a safe method to do that (cuz don't mess with the logs), and you probably should just be content with weirdly named folders, but if your data treatment need the right folder names, then this works just fine.

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you wonโ€™t want to miss the chance to attend and share knowledge.

If there isnโ€™t a group near you, start one and help create a community that brings people together.

Request a New Group