Friday, August 31, 2018

[zvyswyrq] Fun with accents

Cès̋ hll m̀a̋k lw̏ r̋p̏e̊c̋tın̑g̀ ån̄ s̃tbhṅt öf liḡi, r̃ p̏r̈hıbıtiğ th fèé ĕx́ŕc̄iśë théf; õ ăbr̊idg̈ıthe̋ fr̊èdȏ f ȅćh, f thẽ p̊r̊; thig̈ht f thĕ p̈ēŏp̋l p̈e̋āc̄èảblỹ to̊ äbl, d tő p̈tıtiỏń thė Gŕn̉t f ä ŕde̋s̋ṡ f g̈r̋is̃.

Unicode combining diacritics can put an accent on any character.  We limit accents to characters without ascenders.  i and j are treated specially as explained below.  We also experimented with diacritical marks below (cedilla, ogonek), but they caused formatting problems with some fonts when there was a mark both above and below: the mark below was (probably correctly) off center, but it caused the mark above to incorrectly go off center also.  Therefore, we underline via HTML instead, and only below letters without descenders.

We experimented with putting accents on dotless i and j, but formatting got messed up with some fonts: the accents were wider than the character, so they overlapped with accents on neighboring characters.  (This is only a problem in proportional fonts, not monospace.)  Therefore the only "accent" possible on i and j is the removal of the dot.  Possible problem: if a font has an "f i" ligature, it might be difficult to distinguish the ligature from f followed by a dotless i: dotted= fi fj, dotless= fı fȷ

The general idea is that ascenders and descenders in some letters force space to be reserved for them for all letters, so fill empty space with an accent or underline to increase information density.  (Of course, it will become harder to read, as increased density often does.)  Encoding information as a pattern of accents on some "carrier" text is mixed-radix base conversion.
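To make the mixed-radix idea concrete, here is a minimal Python sketch (not the Haskell program linked further down).  Each carrier letter offers some number of distinguishable variants, which serves as that position's radix, and a number is spread across the letters one digit per position.  The function names and the example radices are made up for illustration.

```python
def to_mixed_radix(value, radices):
    """Split a non-negative integer into one digit per carrier position.

    radices[k] is the number of variants available at position k;
    the returned digits are least-significant first.
    """
    digits = []
    for r in radices:
        value, d = divmod(value, r)
        digits.append(d)
    if value != 0:
        raise ValueError("value does not fit in the given carrier")
    return digits


def from_mixed_radix(digits, radices):
    """Inverse of to_mixed_radix: rebuild the integer from its digits."""
    value = 0
    for d, r in zip(reversed(digits), reversed(radices)):
        value = value * r + d
    return value


# Hypothetical carrier of three letters offering 26, 2 and 13 variants.
radices = [26, 2, 13]
digits = to_mixed_radix(100, radices)
print(digits, from_mixed_radix(digits, radices))
```

The capacity of a carrier is the product of its radices, so the three-letter example can hold 26 * 2 * 13 = 676 distinct values.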

There was a choice between the curved breve/inverted breve pair or the angled circumflex/caron pair: probably don't use both because they look too similar.  We chose curvy because there were lots of angles already with the single and double acute and grave.  Tilde and macron look somewhat similar, but we used both.  We chose [ 0x300, 0x301, 0x306, 0x311, 0x303, 0x304, 0x30b, 0x30f, 0x308, 0x307, 0x30a, 0x309 ].
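For reference, the twelve code points can be looked up and applied with Python's standard unicodedata module.  This is only an illustrative sketch; the helper name add_accent is ours.

```python
import unicodedata

# The twelve combining marks chosen above, in the order listed.
MARKS = [0x300, 0x301, 0x306, 0x311, 0x303, 0x304,
         0x30B, 0x30F, 0x308, 0x307, 0x30A, 0x309]


def add_accent(base, mark):
    """Return base followed by the combining mark with the given code point."""
    return base + chr(mark)


# Print each mark's official Unicode name and how it looks on the letter 'e'.
for m in MARKS:
    print(f"U+{m:04X} {unicodedata.name(chr(m))}: {add_accent('e', m)}")
```

A combining mark always follows its base character in the string.  Some combinations (e followed by U+0301, for example) have a precomposed equivalent that NFC normalization will substitute, but most of the combinations used here do not.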

Although not relevant to the above example, some fonts have descenders for capital Q and J, so those letters should not be underlined.  QJQJOIOI.  Same using CSS instead of the HTML U tag (some browsers do CSS differently than <u>): QJQJOIOI.

Haskell source code to add random accents, and to enumerate the range of possible characters subject to the constraints we've chosen.  Editorially: Unicode is supposed to capture all the world's languages that actually exist.  It isn't supposed to be used to invent new characters not used in any real language, a capability we are abusing here.

aabbccddeeffghhiıiıjȷkkllmmnnoopqrrssttuuvvwwxxyzzAABBCCDDEEFFGGHHIIJKKLLMMNNOOPPQRRSSTTUUVVWWXXYYZZ

Total number of possibilities: 460.
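The total can be reproduced with a short Python sketch.  The classification of letters below is our reading of the rules described above (b d f h k l t as ascenders, g p q y as descenders, capitals never accented, Q and J never underlined); it is an assumption, not something taken from the linked Haskell source.

```python
import string

ASCENDERS = set("bdfhklt")   # no accent above (assumed classification)
DESCENDERS = set("gpqy")     # no underline below; j is handled separately
NUM_MARKS = 12               # the twelve combining marks listed earlier


def variants(ch):
    """Number of distinguishable forms of one ASCII letter."""
    if ch == "i":
        return 2 * 2          # dotted or dotless, underlined or not
    if ch == "j":
        return 2              # dotted or dotless; descender, so no underline
    if ch in string.ascii_uppercase:
        return 1 if ch in "QJ" else 2   # capitals: underline only
    accents = 1 if ch in ASCENDERS else NUM_MARKS + 1   # +1 for "no accent"
    underline = 1 if ch in DESCENDERS else 2
    return accents * underline


total = sum(variants(c) for c in string.ascii_letters)
print(total)  # 460
```

Under these assumptions the lowercase letters contribute 410 possibilities and the capitals 50.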

More capital letters possible.
