Sunday, April 17, 2016

[alazzrkk] Encoding data in Korean

One can convert a number to base 11172, then express it one digit per character using the Hangul Syllables Unicode block of precomposed syllables (starting at code point 44032 decimal, U+AC00).

The large block itself is actually a form of mixed radix notation, with the radices being, from most significant to least significant, 19 (initial consonant), 21 (vowel), and 28 (final consonant, including none); 19*21*28 = 11172.  This is useful for visually looking up the code point of a character.  There remains the problem that some jamo are visually very similar, so the encoding is vulnerable to noise.
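
The mixed radix structure can be sketched in a few lines of Perl. This is only an illustration, not part of the script below; the function names syllable_to_digits and digits_to_syllable are made up for this example, and the index order assumed is the standard one (initial, medial, final).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;

binmode STDOUT, ":encoding(UTF-8)";

# Split a precomposed syllable into its three mixed radix digits.
# (Hypothetical helper name, chosen for this example.)
sub syllable_to_digits {
    my ($ch) = @_;
    my $n = ord($ch) - 0xAC00;              # 0xAC00 == 44032
    die "not a precomposed Hangul syllable\n" if $n < 0 || $n >= 19*21*28;
    my $final   = $n % 28;                  # least significant digit, radix 28
    my $medial  = int($n / 28) % 21;        # middle digit, radix 21
    my $initial = int($n / (21*28));        # most significant digit, radix 19
    return ($initial, $medial, $final);
}

# Inverse: rebuild the syllable from its three digits.
sub digits_to_syllable {
    my ($initial, $medial, $final) = @_;
    return chr(0xAC00 + ($initial*21 + $medial)*28 + $final);
}

my @d = syllable_to_digits("한");           # U+D55C
print join(",", @d), "\n";                  # prints 18,0,4
print digits_to_syllable(@d), "\n";         # prints 한
```

Reading a code point off a character by eye is the same computation: find the three jamo indices, then evaluate (initial*21 + medial)*28 + final + 44032.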

#!perl -nw
# GPL 3
use utf8;
use bigint;
BEGIN{
binmode STDOUT,":encoding(UTF-8)";
}
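# Assume input in the format of md5sum:
# a hex digest, optionally followed by a file name.
die unless ($h,$fn)=/^([0-9A-Fa-f]+)( .*)?/;
$i=hex($h);   # under bigint, hex() is overridden to handle big integers
$m=19*21*28;  # 11172 precomposed syllables
undef@A;
# Peel off base-11172 digits, least significant first;
# unshift leaves @A in most-significant-first order.
while($i){
$q=$i % $m;
unshift @A,$q;
$i-=$q;
$i/=$m;
}
@A=(0) unless @A;  # an all-zero digest would otherwise print nothing
for(@A){
print chr(44032+$_);  # 44032 is U+AC00, the first Hangul syllable
#print $_,"\n";
}
print $fn if defined($fn);
print "\n"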
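
Decoding goes the other way: treat each syllable as a base-11172 digit and accumulate. The following is a minimal sketch, not part of the script above; hangul_to_hex is a hypothetical name chosen for this example. Because the encoder goes through a number, leading zero hex digits are lost, so the result may need to be left-padded with zeros to the digest's known width.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Math::BigInt;

# Hypothetical helper: convert a string of precomposed Hangul syllables
# (most significant digit first) back to a lowercase hex string.
sub hangul_to_hex {
    my ($s) = @_;
    my $i = Math::BigInt->new(0);
    for my $ch (split //, $s) {
        my $d = ord($ch) - 44032;           # digit value, 0 .. 11171
        die "not a precomposed Hangul syllable\n" if $d < 0 || $d >= 11172;
        $i = $i * 11172 + $d;               # accumulate in base 11172
    }
    my $hex = $i->as_hex;                   # as_hex returns a "0x" prefixed string
    $hex =~ s/^0x//;
    return $hex;
}

# Two digits (1, 0) in base 11172 give 11172, which is 2ba4 in hex.
print hangul_to_hex("\x{AC01}\x{AC00}"), "\n";   # prints 2ba4
```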
