Thursday, August 20, 2009

[pmbmcnvi] Merging Wikipedia

The problem with forking Wikipedia, to get away from its questionable censorship practices, is that one needs to keep merging new changes from the main Wikipedia into the forked version. I used to think this was not possible, but here is a simple idea.

Additional facts can only be added between "large chunks" of Wikipedia text. "Large chunks" are large enough that they can be reidentified between Wikipedia revisions, and the additional facts reinserted between them.

A large chunk is probably a paragraph, though if your natural language processing is good enough to identify topics, paragraphs can be split further. Sentences are probably too fine a granularity to survive identification between edits in general.

In the event of a complete article rewrite in Wikipedia, all the additional facts get appended to the bottom and human assistance is requested to put the facts in the relevant places.
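Here is a minimal sketch of the idea so far, under loud assumptions: chunks are blank-line-separated paragraphs, "reidentification" is plain fuzzy string similarity from Python's difflib with an arbitrary 0.6 cutoff, and the names Fact, split_chunks, best_match and merge are made up for illustration. A real system would need something smarter.

```python
from __future__ import annotations

from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Fact:
    text: str    # the additional fact, kept only in the fork
    anchor: str  # upstream paragraph the fact was inserted after


def split_chunks(article: str) -> list[str]:
    """Treat blank-line-separated paragraphs as the 'large chunks'."""
    return [p.strip() for p in article.split("\n\n") if p.strip()]


def best_match(anchor: str, chunks: list[str], threshold: float = 0.6) -> int | None:
    """Index of the chunk most similar to anchor, or None if nothing is close enough.

    The threshold is an arbitrary assumption, not a tuned value.
    """
    best_index, best_score = None, threshold
    for i, chunk in enumerate(chunks):
        score = SequenceMatcher(None, anchor, chunk).ratio()
        if score >= best_score:
            best_index, best_score = i, score
    return best_index


def merge(new_article: str, facts: list[Fact]) -> tuple[str, list[Fact]]:
    """Reinsert facts after their reidentified chunks in the new upstream text.

    Facts whose anchor cannot be found are appended to the bottom and also
    returned, so a human can be asked to place them.
    """
    chunks = split_chunks(new_article)
    after: dict[int, list[str]] = {}
    orphans: list[Fact] = []
    for fact in facts:
        i = best_match(fact.anchor, chunks)
        if i is None:
            orphans.append(fact)
        else:
            after.setdefault(i, []).append(fact.text)

    merged: list[str] = []
    for i, chunk in enumerate(chunks):
        merged.append(chunk)
        merged.extend(after.get(i, []))
    merged.extend(f.text for f in orphans)
    return "\n\n".join(merged), orphans
```

Calling merge(new_upstream_text, facts) returns the merged article plus the list of orphaned facts. After a complete upstream rewrite every fact ends up in that list, which is the "append to the bottom and ask a human" case.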

Additional facts can reference, annotate, or even patch text in Wikipedia by having a mechanism to refer to text. Of course, simply editing the original Wikipedia is encouraged where possible.

Lists of facts not organized into prose go against Wikipedia's style guidelines, but actually work better for verifiability: each fact is presented in two columns. The first column is the fact; the second column is the (optional) source.
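The two-column layout could be as simple as the following sketch. The function name render_facts and the "|" separator are my own assumptions for illustration; a fact with no source simply leaves the second column empty.

```python
from __future__ import annotations


def render_facts(rows: list[tuple[str, str | None]]) -> str:
    """Render (fact, optional source) pairs as two aligned plain-text columns."""
    width = max((len(fact) for fact, _ in rows), default=0)
    lines = []
    for fact, source in rows:
        # Pad the fact column so the separator lines up; source may be absent.
        lines.append(f"{fact.ljust(width)} | {source or ''}".rstrip())
    return "\n".join(lines)
```

A reader checking a fact then only has to look across the row, rather than hunt for a footnote buried in prose.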

Mechanisms to keep the crap out of your forked Wikipedia are left up to you, the forker.

With discretized facts, I'm imagining an ecosystem of cryptographic signatures and webs of trust, combined with the two-column verifiability described above.
