Friday, February 21, 2014

[azjtcizp] Latin-1

ISO 8859-1 contains roughly 16*12=192 printable characters (code points 32-127 and 160-255).  However, the following characters don't render well: del (127), space (32), nbsp (160), and soft hyphen (173).  This leaves 188 characters that have a good chance of being supported by some font on a user's system.  Base 188 is therefore another possibility for dense encoding of data.
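As a rough illustration of the idea, here is a minimal Python sketch that builds the 188-character alphabet and converts bytes to and from base 188. The names ALPHABET, encode, and decode are made up for this example, and treating the whole input as one big integer is just one simple way to do the conversion, not necessarily the best one.

```python
# Code points 32-127 and 160-255: the 12 rows of 16 described above.
CANDIDATES = list(range(32, 128)) + list(range(160, 256))
# del, space, nbsp, soft hyphen: the four characters that don't render well.
EXCLUDED = {127, 32, 160, 173}

ALPHABET = "".join(chr(c) for c in CANDIDATES if c not in EXCLUDED)
BASE = len(ALPHABET)  # 188
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def encode(data: bytes) -> str:
    """Encode bytes as a string over the 188-character alphabet."""
    # Assumption of this sketch: prefix a 0x01 byte so that leading
    # zero bytes in the input survive the integer conversion.
    n = int.from_bytes(b"\x01" + data, "big")
    digits = []
    while n:
        n, r = divmod(n, BASE)
        digits.append(ALPHABET[r])
    return "".join(reversed(digits))


def decode(text: str) -> bytes:
    """Inverse of encode()."""
    n = 0
    for ch in text:
        n = n * BASE + INDEX[ch]
    raw = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return raw[1:]  # drop the 0x01 marker byte
```

Each character carries log2(188), or roughly 7.55 bits, compared with 6 bits per character for base 64.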

Also consider omitting grave accent (96), diaeresis (168), macron (175), acute accent (180), and cedilla (184), because they look odd when not attached to a letter (leaving 183 characters).

Also consider omitting hyphen (45), to permit the liberal use of soft hyphens, to allow a text justifier to break lines anywhere (leaving 182 characters). Or, use the <wbr> HTML element, which marks a break opportunity without displaying a hyphen.

Also consider omitting the following 25 characters, which are already tall (they happen to be capital letters) and whose accents may interfere with line spacing: [192, 193, 194, 195, 196, 197, 200, 201, 202, 203, 204, 205, 206, 207, 209, 210, 211, 212, 213, 214, 217, 218, 219, 220, 221]. If using ISO 8859-15 instead, also consider omitting [166, 180, 190], which in that character set are the accented capitals Š, Ž, and Ÿ.
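The running totals for these successive omissions can be checked with a few lines of Python. This is only a bookkeeping sketch; the set names and the remaining helper are invented for the example. Note that 180 appears in both the bare-accent list and the ISO 8859-15 list, so the last step removes only two further code points.

```python
# All 192 candidate code points: 32-127 and 160-255.
LATIN1 = set(range(32, 128)) | set(range(160, 256))

POORLY_RENDERED = {127, 32, 160, 173}       # del, space, nbsp, soft hyphen
BARE_ACCENTS = {96, 168, 175, 180, 184}     # accents not attached to a letter
HYPHEN = {45}
TALL_CAPITALS = {
    192, 193, 194, 195, 196, 197, 200, 201, 202, 203, 204, 205, 206,
    207, 209, 210, 211, 212, 213, 214, 217, 218, 219, 220, 221,
}
ISO_8859_15_CAPITALS = {166, 180, 190}      # 180 is already in BARE_ACCENTS


def remaining(*omitted_groups):
    """Count the code points left after removing every given group."""
    kept = set(LATIN1)
    for group in omitted_groups:
        kept -= group
    return len(kept)


print(remaining(POORLY_RENDERED))                                       # 188
print(remaining(POORLY_RENDERED, BARE_ACCENTS))                         # 183
print(remaining(POORLY_RENDERED, BARE_ACCENTS, HYPHEN))                 # 182
print(remaining(POORLY_RENDERED, BARE_ACCENTS, HYPHEN, TALL_CAPITALS))  # 157
print(remaining(POORLY_RENDERED, BARE_ACCENTS, HYPHEN, TALL_CAPITALS,
                ISO_8859_15_CAPITALS))                                  # 155
```

So the most conservative alphabet sketched here would have 155 characters, still well above the 64 of base 64.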

For in-band escapes, consider using the C0 and C1 control characters as they were originally intended, e.g. DCS (144) or SOS (152), terminated by ST (156).
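One possible way to frame such escapes, sketched in Python: an escape starts with DCS or SOS and runs until the next ST. The functions wrap_escape and split_stream are hypothetical helpers for this illustration, and the sketch assumes the payload itself never contains ST, which holds automatically if the payload is drawn from the printable alphabet above (none of 128-159 is in it).

```python
import re

# C1 control characters, by code point.
DCS = chr(144)  # Device Control String
SOS = chr(152)  # Start of String
ST = chr(156)   # String Terminator

# An introducer (DCS or SOS), then anything up to the next ST.
_ESCAPE = re.compile("[" + DCS + SOS + "]([^" + ST + "]*)" + ST)


def wrap_escape(payload: str, introducer: str = DCS) -> str:
    """Frame payload as an in-band escape sequence."""
    if introducer not in (DCS, SOS):
        raise ValueError("introducer must be DCS or SOS")
    if ST in payload:
        raise ValueError("payload must not contain ST")
    return introducer + payload + ST


def split_stream(text: str):
    """Return (data, escapes): the text with escape sequences removed,
    and the list of escape payloads in order of appearance."""
    escapes = []

    def grab(match):
        escapes.append(match.group(1))
        return ""

    data = _ESCAPE.sub(grab, text)
    return data, escapes
```

For example, split_stream("abc" + wrap_escape("meta") + "def") separates the data "abcdef" from the single escape payload "meta".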
