Wednesday, May 20, 2009

[ajfhfrqz] Nonparametric zeitgeist

Take the OLD corpus and rank its words by frequency, omitting words that occur fewer times than some threshold. Do the same with the NEW corpus. Which words have changed rank the most? (Words not present in the OLD corpus are assigned rank N+1, where N is the last rank in the OLD corpus.)
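A minimal Python sketch of the rank comparison. Several details are assumptions the post leaves open: tokens arrive already lowercased and split, frequency ties are broken alphabetically, a word below the threshold in OLD counts as absent (rank N+1), and words that drop out of NEW are ignored. The function names are hypothetical.

```python
from collections import Counter


def rank_words(tokens, min_count):
    """Rank words by descending frequency, dropping words seen fewer than min_count times.

    Assumption: ties are broken alphabetically (not specified in the post).
    """
    counts = Counter(tokens)
    kept = [(w, c) for w, c in counts.items() if c >= min_count]
    kept.sort(key=lambda wc: (-wc[1], wc[0]))
    return {w: i + 1 for i, (w, _) in enumerate(kept)}


def rank_changes(old_tokens, new_tokens, min_count=2):
    """Return (word, old_rank - new_rank) for every word ranked in NEW.

    Positive values mean the word climbed in NEW. Words missing from the OLD
    ranking get rank N + 1. Results are sorted by absolute change, largest first.
    """
    old_rank = rank_words(old_tokens, min_count)
    new_rank = rank_words(new_tokens, min_count)
    unseen = len(old_rank) + 1  # rank N+1 for words absent from OLD
    changes = {w: old_rank.get(w, unseen) - r for w, r in new_rank.items()}
    return sorted(changes.items(), key=lambda wc: (-abs(wc[1]), wc[0]))


old = "a a a b b c c".split()
new = "b b b a a d d c".split()
print(rank_changes(old, new))
```

Here "a" drops from rank 1 to 2, "b" climbs from 2 to 1, and "d" enters at rank 3 from the assigned OLD rank 4, while "c" falls below the threshold in NEW and is not reported.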

Update: this doesn't seem to work so well. We need Fisher's or Barnard's exact test.
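One way to read the update: for each word, build a 2x2 contingency table (occurrences of the word vs. all other tokens, in OLD vs. NEW) and apply Fisher's exact test. The sketch below computes the two-sided p-value directly from hypergeometric probabilities using only the standard library. The table layout and the function names are assumptions for illustration; the post does not say how the test was applied.

```python
from collections import Counter
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    whose probability does not exceed that of the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # P(top-left cell == x) given the fixed row and column margins
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # small relative tolerance so tables tied with the observed one are included
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(p, 1.0)


def word_change_pvalues(old_tokens, new_tokens):
    """Hypothetical helper: (word, p-value) for every word, smallest p first.

    Table per word (assumed layout):
        [[count in OLD, other tokens in OLD],
         [count in NEW, other tokens in NEW]]
    """
    old_counts, new_counts = Counter(old_tokens), Counter(new_tokens)
    old_total, new_total = sum(old_counts.values()), sum(new_counts.values())
    results = []
    for w in set(old_counts) | set(new_counts):
        a, c = old_counts[w], new_counts[w]  # Counter returns 0 for missing words
        results.append((w, fisher_exact_two_sided(a, old_total - a, c, new_total - c)))
    results.sort(key=lambda wp: (wp[1], wp[0]))
    return results


print(fisher_exact_two_sided(3, 1, 1, 3))  # 34/70, about 0.4857
```

Words with the smallest p-values are the strongest candidates for a change in usage. The direct summation gets slow on large corpora; SciPy provides the same test as `scipy.stats.fisher_exact`, and Barnard's test as `scipy.stats.barnard_exact`.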
