ZmnSCPxj [ARCHIVE] on Nostr:
📅 Original date posted:2017-12-11
📝 Original message:
Good morning Jonathan,
>3. Descriptions say they can encode ASCII only. Sorry, but this is nonsense. Full unicode support via UTF8 should be supported.
I generally agree, but caution is warranted here. In particular, we should be precise about which variant of UTF-8 we mean.
Presumably, a naive implementation that treats the 0 byte specially (as would happen if the implementation were naively written in C or C++, where strings are by default terminated by a 0 byte) should work correctly without having to care whether the encoding is UTF-8 or plain 7-bit ASCII. This leads to the so-called Modified UTF-8 used by Java in its native interface: an embedded null character is encoded as the overlong two-byte sequence 0xC0 0x80, which is normally invalid in UTF-8, but which naive C and C++ string handling treats (mostly) correctly. Should we use Modified UTF-8, or simply disallow null characters? (Using ASCII does not avoid this issue either, since ASCII has no way to encode a null character other than the byte 0, which is also the standard C string terminator.)
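To make the distinction concrete, here is a minimal Python sketch (the helper name encode_modified_utf8 is my own invention, and it deliberately ignores the other quirk of Modified UTF-8, namely CESU-8-style surrogate pairs for characters above U+FFFF) showing how an embedded null looks under standard UTF-8 versus Java-style Modified UTF-8:

    def encode_modified_utf8(s):
        # Java-style Modified UTF-8 for the null case only:
        # U+0000 becomes the overlong two-byte sequence C0 80 instead of
        # a single 00 byte, so naive C code never sees a string terminator.
        out = bytearray()
        for ch in s:
            if ch == '\x00':
                out += b'\xc0\x80'
            else:
                out += ch.encode('utf-8')
        return bytes(out)

    desc = "pay\x00ment"
    print(desc.encode('utf-8'))        # b'pay\x00ment'     -- embedded 00 byte
    print(encode_modified_utf8(desc))  # b'pay\xc0\x80ment' -- no 00 byte anywhere

The point is simply that the Modified UTF-8 form survives naive C string handling, at the cost of emitting byte sequences a strict UTF-8 validator would reject.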
In addition, pulling in UTF-8 brings in the issue of Unicode normalization: multiple different byte sequences in UTF-8 may render as the same sequence of human-readable glyphs. Specifying ASCII avoids this issue. Should we specify a particular Unicode normalization form, and should GUIs at least try to impose that normalization (even if backends/daemons simply ignore the description and hence any normalization issues)?
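A short illustration of the normalization problem, using only Python's standard unicodedata module (the choice of NFC here is just for the example, not a proposal):

    import unicodedata

    precomposed = "caf\u00e9"   # 'é' as a single code point U+00E9
    decomposed  = "cafe\u0301"  # 'e' followed by combining acute accent U+0301

    print(precomposed == decomposed)    # False: different code points, same glyphs
    print(precomposed.encode('utf-8'))  # b'caf\xc3\xa9'
    print(decomposed.encode('utf-8'))   # b'cafe\xcc\x81'

    # A GUI could normalize to NFC before placing the description in an invoice:
    print(unicodedata.normalize('NFC', decomposed) == precomposed)  # True

Both strings display identically to a human, but they are different byte sequences, so anything that hashes or compares the description byte-for-byte will treat them as different unless some normalization is agreed upon.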
Regards,
ZmnSCPxj