Nicolas Martyanoff on Nostr: Just finished support for Unicode general category lookups in #CommonLisp. While ...
Just finished support for Unicode general category lookups in #CommonLisp. While CL-UNICODE generates a binary tree, I used a vector of 64-bit integers instead, each one encoding a character block (start and end code points) and its category as an integer (used later as an index into a lookup table).
Then I can do a binary search to find the category of a code point. Much faster than a tree and uses a lot less memory.
Bonus, I have a second table for immediate lookups of ASCII characters, making the common case the fastest.
That was fun!
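
A rough sketch of the idea, written in Python rather than Common Lisp and meant only as an illustration. The bit layout, all the names, and the choice to build the table from Python's `unicodedata` module are assumptions, not details taken from the post.

```python
import unicodedata
from array import array

MAX_CODE_POINT = 0x10FFFF

# Assumed layout of each 64-bit entry (the post does not specify one):
#   bits 29..49  first code point of the block
#   bits  8..28  last code point of the block
#   bits  0..7   index into CATEGORY_NAMES
END_SHIFT = 8
START_SHIFT = 29
CODE_POINT_MASK = (1 << 21) - 1
CATEGORY_MASK = (1 << 8) - 1


def build_blocks():
    """Run-length encode the category of every code point into packed blocks."""
    names = []      # category index -> category name
    index_of = {}   # category name -> category index
    blocks = array("Q")  # unsigned long long, 8 bytes per entry
    start = 0
    current = unicodedata.category(chr(0))
    for cp in range(1, MAX_CODE_POINT + 2):
        # One step past the last code point, None forces the final block out.
        cat = unicodedata.category(chr(cp)) if cp <= MAX_CODE_POINT else None
        if cat != current:
            if current not in index_of:
                index_of[current] = len(names)
                names.append(current)
            blocks.append(
                (start << START_SHIFT) | ((cp - 1) << END_SHIFT) | index_of[current]
            )
            start, current = cp, cat
    return blocks, names


BLOCKS, CATEGORY_NAMES = build_blocks()


def search_blocks(cp):
    """Binary search for the block that contains cp."""
    if not 0 <= cp <= MAX_CODE_POINT:
        raise ValueError(f"not a code point: {cp!r}")
    lo, hi = 0, len(BLOCKS) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        entry = BLOCKS[mid]
        if cp < (entry >> START_SHIFT):
            hi = mid - 1
        elif cp > ((entry >> END_SHIFT) & CODE_POINT_MASK):
            lo = mid + 1
        else:
            return CATEGORY_NAMES[entry & CATEGORY_MASK]
    raise AssertionError("the blocks should cover every code point")


# Second table: ASCII categories precomputed once for direct indexing.
ASCII_CATEGORIES = [search_blocks(cp) for cp in range(128)]


def general_category(cp):
    """Return the general category name, with a fast path for ASCII."""
    if 0 <= cp < 128:
        return ASCII_CATEGORIES[cp]
    return search_blocks(cp)
```

With these definitions, `general_category(ord("A"))` is answered straight from the ASCII table, while something like `general_category(0x4E2D)` goes through the binary search. Because the start code point sits in the highest bits, the vector is also sorted as plain integers.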