@cdxiao @natecull Late supplementary: Here's everything you ever wanted to know about grapheme clusters, normalizations, case folding, deconstructing normalization (each level of linking below adds to the puzzle) ...
Hallå Kitteh (clacke@social.heldscal.la)'s status on Saturday, 11-Mar-2017 14:24:29 UTC
TIL about Unicode normalizations NFKC and NFKD, which are like NFC and NFD except they also explode some typographical code points like the ffi ligature. That makes sense. But they also apply modifications like turning superscript 5 into a normal 5. That destroys information. I guess if you want to discover sneakiness and use it for comparison only, it's good.
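For anyone who wants to poke at this, here is a minimal sketch using Python's standard `unicodedata` module. The helper names `compat_key` and `looks_same` are made up for illustration; the idea is just to keep the original string as-is and use the NFKC form only as a comparison key.

```python
import unicodedata

FFI_LIGATURE = "\uFB03"   # LATIN SMALL LIGATURE FFI
SUPERSCRIPT_5 = "\u2075"  # SUPERSCRIPT FIVE

# Canonical normalization (NFC) leaves both code points alone.
print(unicodedata.normalize("NFC", FFI_LIGATURE) == FFI_LIGATURE)    # True
print(unicodedata.normalize("NFC", SUPERSCRIPT_5) == SUPERSCRIPT_5)  # True

# Compatibility normalization (NFKC) explodes the ligature ...
print(unicodedata.normalize("NFKC", FFI_LIGATURE))   # ffi
# ... and also flattens the superscript, which loses information.
print(unicodedata.normalize("NFKC", SUPERSCRIPT_5))  # 5


def compat_key(text: str) -> str:
    """Hypothetical helper: a comparison-only key, never stored or displayed."""
    return unicodedata.normalize("NFKC", text)


def looks_same(a: str, b: str) -> bool:
    """Hypothetical helper: True if two strings collapse to the same NFKC form."""
    return compat_key(a) == compat_key(b)


print(looks_same("o\uFB03ce", "office"))  # True
print(looks_same("x\u2075", "x5"))        # True, although x to the 5th is not x5
```

The last line is the information loss in action: fine for spotting lookalikes, not something you would want to write back over the original text.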