Saturday, July 03, 2004

Grammar checking by markov chaining

Start with a reasonably good Markov model of English, say, 1st-order. Also fold in the bigram counts of the text to be grammar checked; this eliminates zero probabilities for bigrams in the text that never appeared in the training set. Finally, highlight the bigrams in the text that have low probability, perhaps ranked by probability. Mumble mumble some sort of normalization, conditioning, etc., mumble mumble. The most frequent grammar errors I make are those caused by cut-and-paste, or by losing my train of thought while typing. These result in "easy" grammar errors: sentences that make no sense at all. Consequently, these errors can be caught by a relatively simple model of the language.
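A minimal sketch of the idea, in Python. It assumes word-level tokenization, uses the trick above (folding the checked text's own bigram counts into the model) in place of real smoothing, and flags the lowest-probability fraction of bigrams; the file names, the 5% cutoff, and the crude conditional-probability estimate are all illustrative assumptions, not a definitive implementation.

    import re
    from collections import Counter

    def bigrams(text):
        """Lowercase word tokens, paired into consecutive bigrams."""
        words = re.findall(r"[a-z']+", text.lower())
        return list(zip(words, words[1:]))

    def build_model(training_text, checked_text):
        """Bigram and unigram counts over the training corpus, with the
        checked text's own counts folded in so none of its bigrams are zero."""
        counts = Counter(bigrams(training_text))
        counts.update(bigrams(checked_text))       # append the text's own bigrams
        unigrams = Counter(w for (w, _) in counts.elements())
        return counts, unigrams

    def suspect_bigrams(training_text, checked_text, fraction=0.05):
        """Rank the checked text's bigrams by conditional probability
        P(w2 | w1) and return the lowest-scoring fraction of them."""
        counts, unigrams = build_model(training_text, checked_text)
        scored = []
        for (w1, w2) in bigrams(checked_text):
            p = counts[(w1, w2)] / unigrams[w1]    # crude conditional probability
            scored.append((p, w1, w2))
        scored.sort()                              # lowest probability first
        cutoff = max(1, int(len(scored) * fraction))
        return scored[:cutoff]

    if __name__ == "__main__":
        training = open("training_corpus.txt").read()  # hypothetical corpus file
        draft = open("draft.txt").read()               # hypothetical text to check
        for p, w1, w2 in suspect_bigrams(training, draft):
            print(f"{p:.5f}  {w1} {w2}")

A nonsense splice like "the results show that the the experiment over there banana" would surface near the top of the list, since its bigrams are rare in any decent training corpus and appear only because the checked text itself contributed them.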
