Saturday, February 10, 2007

Deletions in error-correcting codes

Deletions are a messy problem for error-correcting codes. One standard way of dealing with them is with synchronization strings. We can do better. The key is to use a short block size ECC as the first layer, short enough that we expect at most one deletion within it. Instead of scanning a sliding window for the synchronization string, we search for for a block which decodes mostly OK, possibly doing a side search for one deletion, as an erasure, or insertion within it. Each data block needs a its block number, perhaps modulo a number larger than the longest expected deletion. Another interleaved ECC layer corrects errors and erasures after deletions and insertions have been taken care of.

I wonder how DNA deals?

No comments :