Conversation
Hallå Kitteh (clacke@social.heldscal.la)'s status on Saturday, 11-Mar-2017 14:24:29 UTC
TIL about the Unicode normalization forms NFKC and NFKD, which are like NFC and NFD except that they also explode some typographical code points, like the ffi ligature, into their plain equivalents. That makes sense. But they also flatten modifications: superscript 5 becomes a normal 5. That destroys information. I guess it's good if you use it for comparison only, to discover sneakiness.
http://unicode.org/reports/tr15/#Compatibility_Composite_Figure
Bonus link on the topic of what a "character" is: https://datamost.com/clacke/note/a3j57b_9TI2UaUBmmQeFug-
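To make the NFC/NFD versus NFKC/NFKD difference concrete, here is a minimal sketch using Python's standard `unicodedata` module. The helper name `forms` and the three sample strings are made up for illustration; they are not from the linked report.

```python
import unicodedata

def forms(text):
    """Return all four normalization forms of text, keyed by form name."""
    return {form: unicodedata.normalize(form, text)
            for form in ("NFC", "NFD", "NFKC", "NFKD")}

ligature = "\ufb03"      # LATIN SMALL LIGATURE FFI
superscript = "x\u2075"  # "x" followed by SUPERSCRIPT FIVE
accented = "\u00e9"      # LATIN SMALL LETTER E WITH ACUTE

for sample in (ligature, superscript, accented):
    for form, result in forms(sample).items():
        # Print code points rather than glyphs so the difference is visible.
        print(form, [f"U+{ord(ch):04X}" for ch in result])
```

The canonical forms (NFC, NFD) leave the ligature and the superscript alone and only compose or decompose the accented letter. The compatibility forms (NFKC, NFKD) additionally turn the ligature into the three letters "ffi" and the superscript into a plain "5", which is why they suit comparison keys better than stored text.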
Hallå Kitteh (clacke@social.heldscal.la)'s status on Saturday, 11-Mar-2017 14:35:54 UTC
"Many developers believe that a case-insensitive comparison is achieved by mapping both strings being compared to either upper- or lowercase and then comparing the resulting bytes."
Noo no no, it's never that easy. Introduction to case folding:
https://www.w3.org/International/wiki/Case_folding
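A small sketch of why lowercasing both sides falls short, again in Python. The function names `naive_equal` and `caseless_equal` are hypothetical, and the NFD-casefold-NFD recipe is one common way to build a comparison key, not necessarily what the linked page prescribes.

```python
import unicodedata

def naive_equal(a, b):
    """The approach from the quote: lowercase both sides and compare."""
    return a.lower() == b.lower()

def caseless_equal(a, b):
    """Compare using full case folding plus canonical normalization."""
    def key(s):
        # Normalize, fold case, then normalize again, because folding
        # can produce sequences that are no longer normalized.
        folded = unicodedata.normalize("NFD", s).casefold()
        return unicodedata.normalize("NFD", folded)
    return key(a) == key(b)

# German sharp s: lower() keeps it, casefold() expands it to "ss".
print(naive_equal("Straße", "STRASSE"))
print(caseless_equal("Straße", "STRASSE"))

# Precomposed capital E-acute versus "e" plus a combining acute accent.
print(naive_equal("\u00c9", "e\u0301"))
print(caseless_equal("\u00c9", "e\u0301"))
```

In both pairs the naive comparison reports a mismatch while the folded comparison reports a match. Even this is only good for comparison keys; it says nothing about locale-specific rules such as the Turkish dotted and dotless i.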
Hallå Kitteh (clacke@social.heldscal.la)'s status on Saturday, 11-Mar-2017 14:51:09 UTC
A story from the trenches:
https://labs.spotify.com/2013/06/18/creative-usernames/
#unicode #casefolding #normalization