Spark SQL INITCAP not capitalizing letters after periods in abbreviations
11-25-2025 01:19 AM
When I run SELECT INITCAP("text (e.g., text, text, etc.)"), abbreviations with periods, such as "e.g.", are not fully capitalized.
Current behavior:
Input: "text (e.g., text, text, etc.)"
Output: "Text (e.g., Text, Text, Etc.)"
Expected behavior:
Output: "Text (E.G., Text, Text, Etc.)"
Version: 16.4.x-scala2.12
Labels: Spark
11-25-2025 02:28 AM
11-25-2025 02:48 AM
Thanks for the suggestion.
It's a good option, but I have a few concerns: there is no space inside "e.g." in my original example, so it would require a more sophisticated regex or a custom UDF.
I think the root cause is a limitation of the initcap function, which treats only spaces as delimiters. Also, I tried the same query on an EKS cluster and it works as expected there, so this looks like a limitation of the Databricks Spark version.
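To illustrate the suspected root cause, here is a small plain-Python sketch (not Spark code) of an initcap that splits only on spaces. The name `initcap_space_only` is made up for illustration, and it only approximates what the engine may be doing, but it reproduces the output reported above:

```python
def initcap_space_only(text: str) -> str:
    """Upper-case the first character of each space-delimited word and
    lower-case the rest. Hypothetical model of a space-only INITCAP."""
    words = text.split(" ")
    return " ".join(w[:1].upper() + w[1:].lower() for w in words)


# The word "(e.g.," starts with "(", so no letter in it is capitalized.
print(initcap_space_only("text (e.g., text, text, etc.)"))
# prints: Text (e.g., Text, Text, Etc.)
```

Because "(e.g.," is a single space-delimited word whose first character is "(", neither "e" nor "g" is ever treated as the start of a word.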
11-25-2025 02:55 AM
My solution is indeed a workaround. INITCAP behaves as you describe. You could add another regular expression at the beginning to remove the non-original "spaces", but I agree that makes it a little complex. However, I'm not aware of any other solution so far.
11-26-2025 10:06 AM
Yes, similar to what @Coffee77 said, you can alternatively create a SQL function that wraps the custom regexp logic and use it directly:
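CREATE OR REPLACE FUNCTION PROPER_WITH_ABBREVIATIONS(input STRING)
RETURNS STRING
-- regexp_replace cannot upper-case a captured group: upper('$1') is evaluated
-- first and is just the literal '$1'. Instead, insert a marker (char(1)) plus
-- a space before any letter that follows '.' or '(' so INITCAP sees a new word,
-- then strip the marker and space again. Assumes char(1) never occurs in the data.
-- Side effect: "file.txt" becomes "File.Txt".
RETURN replace(
  INITCAP(regexp_replace(input, '([.(])(?=[A-Za-z])', concat('$1', char(1), ' '))),
  concat(char(1), ' '),
  ''
);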
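For comparison, if a Python UDF is an option, the same capitalization rule can be sketched in plain Python, where `re.sub` accepts a function as the replacement and so can upper-case each match directly. This is only a sketch under assumptions: the name `proper_with_abbreviations` and the choice of word-start characters (whitespace, "." and "(") are illustrative, not anything Spark defines. Registering it as a UDF (for example with `pyspark.sql.functions.udf`) is left out here.

```python
import re

# A lower-case ASCII letter at the start of the string, or right after
# whitespace, a period, or an opening parenthesis (illustrative choice).
_WORD_START = re.compile(r"(?:^|(?<=[\s.(]))[a-z]")


def proper_with_abbreviations(text: str) -> str:
    """Lower-case everything, then upper-case each letter that starts a word
    or follows a period or an opening parenthesis."""
    return _WORD_START.sub(lambda m: m.group(0).upper(), text.lower())


print(proper_with_abbreviations("text (e.g., text, text, etc.)"))
# prints: Text (E.G., Text, Text, Etc.)
```

Digits after a period are left alone, so a value like "3.5" is unchanged, but dotted names such as "file.txt" would come out as "File.Txt", so the character set may need tuning for real data.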