Data Engineering

Spark SQL INITCAP not capitalizing letters after periods in abbreviations

dkhodyriev1208
Visitor

When I run SELECT INITCAP("text (e.g., text, text, etc.)"), abbreviations with periods, such as "e.g.", are not fully capitalized.

Current behavior:

Input:  "text (e.g., text, text, etc.)"
Output: "Text (e.g., Text, Text, Etc.)"

Expected behavior:

Output: "Text (E.G., Text, Text, Etc.)"

Version: 16.4.x-scala2.12
3 REPLIES

Coffee77
Contributor III

Try something like this:

[Image: Coffee77_0-1764066460243.png]

 


Lifelong Learner Cloud & Data Solution Architect | https://www.youtube.com/@CafeConData

Thanks for the suggestion.

It's a good option, but I have one concern: there is no space inside "e.g." in my original example, so it would require a more sophisticated regex or a custom UDF.

I think the root cause is a limitation of the initcap function, which treats only spaces as delimiters. Also, I've tried the same query on an EKS cluster, and it works as expected there, so this looks like a limitation of the Databricks Spark version.
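
To make the suspected root cause concrete, here is a minimal pure-Python sketch of a space-only title-casing rule. It is only a model of the behavior described in this thread, not Spark's actual implementation, and `initcap_space_only` is a hypothetical helper name. It assumes the function lowercases the input and then uppercases the first character of every space-delimited word.

```python
def initcap_space_only(text: str) -> str:
    """Model of a title-case rule where only a space starts a new word."""
    # Lowercase everything first, then split on single spaces only.
    words = text.lower().split(" ")
    # Uppercase the first character of each word; "(" and similar
    # characters are unchanged by upper(), so "(e.g.," stays as it is.
    return " ".join(w[:1].upper() + w[1:] for w in words)


print(initcap_space_only("text (e.g., text, text, etc.)"))
# -> Text (e.g., Text, Text, Etc.)
```

Under this model, "(e.g.," is a single word whose first character is "(", which would explain why neither "e" nor "g" is capitalized.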

 

Coffee77
Contributor III

My solution is indeed a workaround. INITCAP behaves as you describe. You can include another regular expression at the beginning to remove the spaces that were not in the original text, but I agree that makes it a little complex. However, I'm not aware of any other solution so far.
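
For reference, the custom UDF option mentioned earlier in the thread could look roughly like the sketch below. It is one possible approach, not the workaround from the screenshot, and `initcap_any_delimiter` is a hypothetical name. It assumes ASCII text and treats every character that is not a letter or digit as a word delimiter.

```python
import re

# A lowercase letter that is NOT preceded by a letter or digit starts a word.
# The lookbehind also succeeds at the very start of the string.
_WORD_START = re.compile(r"(?<![A-Za-z0-9])[a-z]")


def initcap_any_delimiter(text):
    """Title-case text, treating any non-alphanumeric character as a delimiter."""
    if text is None:
        # Pass NULL values through unchanged, as a UDF would receive None.
        return None
    # Lowercase first, then uppercase each word-start letter.
    return _WORD_START.sub(lambda m: m.group().upper(), text.lower())


print(initcap_any_delimiter("text (e.g., text, text, etc.)"))
# -> Text (E.G., Text, Text, Etc.)
```

In PySpark, a function like this could presumably be registered with `spark.udf.register("initcap_any_delimiter", initcap_any_delimiter)` and then called from SQL. Note that this rule also capitalizes letters after apostrophes ("don't" becomes "Don'T"), so the delimiter set may need adjusting for real data.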


