Thursday, November 28, 2013

[ymleuvrt] Fuzzy location identifiers within HTML

Consider a node within an HTML or XML document identified by something like 1html.2body.2div.3ol.2li.2img.  Each number indicates which child of the parent node to follow.  The tag name after it is redundant information indicating the type of that child.
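As a rough sketch of how such an identifier could be read and followed exactly, the Python below splits it into (index, tag) steps and walks a tree built with the standard library's xml.etree.ElementTree. The function names parse_locator and resolve, the sample document, and the convention that indices are 1-based and count element children only are all assumptions made for illustration, not something the identifier format itself specifies.

```python
import re
import xml.etree.ElementTree as ET


def parse_locator(locator):
    """Split '1html.2body' into [(1, 'html'), (2, 'body')]."""
    steps = []
    for part in locator.split("."):
        m = re.fullmatch(r"(\d+)([A-Za-z]\w*)", part)
        if m is None:
            raise ValueError(f"bad step: {part!r}")
        steps.append((int(m.group(1)), m.group(2)))
    return steps


def resolve(root, locator):
    """Follow the child indices exactly; return None if the path no longer fits."""
    steps = parse_locator(locator)
    index, tag = steps[0]
    # Assumption: the first step names the root element, which is child 1 of the document.
    if index != 1 or root.tag != tag:
        return None
    node = root
    for index, tag in steps[1:]:
        children = list(node)  # element children only, in document order
        if not 1 <= index <= len(children):
            return None
        child = children[index - 1]
        # The tag name is redundant, so use it as a consistency check.
        if child.tag != tag:
            return None
        node = child
    return node


# Hypothetical sample document, well-formed so the XML parser accepts it.
doc = ET.fromstring(
    "<html><head/><body><p/><div><ol><li/>"
    "<li><img src='a.png'/></li></ol></div></body></html>"
)
target = resolve(doc, "1html.2body.2div.1ol.2li.1img")
```

Because the tag names are checked at every step, an exact lookup like this fails as soon as an edit shifts a sibling or inserts a wrapper element, which is what motivates a fuzzier match.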

Assume the document then gets edited.  Find the node in the edited document "most likely" to correspond to the identified node in the original.  The measure of likelihood is probably something like tree edit distance.  The idea is that, even after significant edits, there may be only one "IMG inside an OL".
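A minimal sketch of the fuzzy lookup, with one loud caveat: it does not compute tree edit distance. As a much simpler stand-in, it ignores the child indices, compares the sequence of tag names in the identifier against the root-to-node tag path of every element in the edited document using difflib.SequenceMatcher, and keeps the best-scoring element of the right type. The names tag_paths and fuzzy_find and the sample edited document are hypothetical.

```python
import difflib
import xml.etree.ElementTree as ET


def tag_paths(root):
    """Yield (node, [tags from the root down to node]) for every element."""
    stack = [(root, [root.tag])]
    while stack:
        node, path = stack.pop()
        yield node, path
        for child in node:
            stack.append((child, path + [child.tag]))


def fuzzy_find(root, locator):
    """Return the element whose tag path is most similar to the locator's tags."""
    # Drop the leading child index from each step, keeping only the tag name.
    wanted = [step.lstrip("0123456789") for step in locator.split(".")]

    def score(item):
        _, path = item
        # Only nodes of the same type as the target are candidates.
        if path[-1] != wanted[-1]:
            return -1.0
        return difflib.SequenceMatcher(None, wanted, path).ratio()

    best_node, best_path = max(tag_paths(root), key=score)
    return best_node if best_path[-1] == wanted[-1] else None


# Hypothetical edited document: an extra <p>, a <main> wrapper, a second
# <img> outside the list, and an extra <li> have all been added.
edited = ET.fromstring(
    "<html><head/><body><p/><p/><main><div><img src='logo.png'/>"
    "<ol><li/><li/><li><img src='a.png'/></li></ol></div></main></body></html>"
)
match = fuzzy_find(edited, "1html.2body.2div.1ol.2li.1img")
```

In this sample the exact child indices no longer lead anywhere useful, but the image inside the list still scores higher than the logo image because its tag path shares more steps with the original identifier. A real tree edit distance would also take siblings and subtree contents into account, which this path-only comparison ignores.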
