zlacker

1. ronces+(OP)[view] [source] 2025-08-22 12:31:52
I've always thought the point of the string type was for indexing. One index of a string is always one character, but characters are sometimes composed of multiple bytes.
replies(2): >>birn55+P4 >>crazyg+t7
2. birn55+P4[view] [source] 2025-08-22 12:55:53
>>ronces+(OP)
You can't do that in a performant way, and going that route can lead to problems, because characters (= graphemes in Unicode terminology) often don't behave the way developers assume.
3. crazyg+t7[view] [source] 2025-08-22 13:08:59
>>ronces+(OP)
Yup. But to be clear, in Unicode a string indexes code points, not characters. E.g. a single emoji can be made up of multiple code points, as can certain characters in certain languages. The Unicode term for a user-perceived character like this is a "grapheme cluster" (often just "grapheme"), and grapheme splitting is complicated enough that it generally belongs in a dedicated Unicode library, not a general-purpose string object.
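To make the bytes vs. code points vs. graphemes distinction concrete, here is a small sketch in Python, whose `str` happens to index by code point (other languages index differently, so treat this as one illustration rather than a general rule). The helper name `describe` and the two sample strings are made up for the example. It only uses the standard library, which has no grapheme segmentation, so the sketch counts bytes and code points and leaves real grapheme splitting to a dedicated library.

```python
import unicodedata


def describe(s: str) -> tuple[int, int]:
    """Return (code point count, UTF-8 byte count) for s."""
    return len(s), len(s.encode("utf-8"))


# 'e' followed by COMBINING ACUTE ACCENT: one visible character.
accented = "e\u0301"
# man + ZERO WIDTH JOINER + woman + ZERO WIDTH JOINER + girl: one visible emoji.
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"

print(describe(accented))  # (2, 3)
print(describe(family))    # (5, 18)

# Indexing returns a code point, not the character a reader sees.
print(unicodedata.name(accented[1]))  # COMBINING ACUTE ACCENT

# Normalization composes some sequences into one code point, but not all.
print(len(unicodedata.normalize("NFC", accented)))  # 1
print(len(unicodedata.normalize("NFC", family)))    # 5
```

Both strings display as a single character, yet neither the byte count nor the code point count is 1, and normalization only rescues the first case.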