Comments
-
lorentz 1529922h
Absolutely not. No pre-trimming, no case-insensitive storage, no length limits, no grapheme normalization. Text is hard and we suck at it. By default strings are an optimized sequence of 32-bit codepoints that you can split and compare, where inequality guarantees only that two strings can't have been created the same way (NOT that they represent different text), and less-than is an arbitrary total ordering for algorithms (NOT alphabetical order).
For all other purposes you should use a Unicode grapheme library with a specific locale. Many programming languages also choose to provide broken operations that only work on English and partially on some Latin-script text, because they're made by Americans with deadlines.
-
lorentz 1529921h
Too late to edit, but even I was too permissive up there: splitting strings by codepoints is actually incorrect. For transmission and storage, count bytes; for display, count graphemes. I meant to talk about a "contains" check, but since equality is meaningless, so is "contains". So instead I'll point out that you can split graphemes and normalize strings without specifying a locale, and at this level both "starts with" and normalized equality are meaningful operations.
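A rough Python sketch of those distinctions, using only the standard library. The helper names and sample strings are made up for illustration, UTF-8 is assumed as the storage encoding, and NFC is just one possible normalization form. Python's standard library has no grapheme segmentation, so the "starts with" helper here works on normalized codepoints and is only an approximation of a grapheme-level check.

```python
import unicodedata

# Two ways to write "é": one precomposed codepoint,
# or "e" followed by a combining acute accent.
composed = "\u00e9"
decomposed = "e\u0301"


def byte_len(s: str) -> int:
    """Size for transmission and storage, assuming UTF-8."""
    return len(s.encode("utf-8"))


def codepoint_len(s: str) -> int:
    """Python's len() counts codepoints, not graphemes."""
    return len(s)


def normalized_equal(a: str, b: str) -> bool:
    """Equality after NFC normalization; no locale required."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def normalized_startswith(s: str, prefix: str) -> bool:
    """Approximate "starts with" on NFC-normalized codepoints.

    A proper version would compare grapheme clusters, which needs a
    third-party library; this one can still match in the middle of a
    grapheme that has no precomposed form.
    """
    norm_s = unicodedata.normalize("NFC", s)
    return norm_s.startswith(unicodedata.normalize("NFC", prefix))


# Raw comparison only says the strings were built differently.
raw_equal = composed == decomposed

# The built-in ordering is codepoint order, not alphabetical order:
# uppercase "Z" sorts before lowercase "a".
codepoint_sorted = sorted(["apple", "Zebra", "\u00e9clair"])
```

Both sample strings display as one character, yet they differ in byte length, in codepoint length, and under `==`; only the normalized comparison treats them as the same text.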
-
@jestdotty you don't have to announce yourself, you know? you're not that significant.
petition to trim all strings by default
rant