Saturday, March 31, 2018

[myoypbbj] From hierarchical to a simple list of records

Given hierarchical data, e.g., HTML: <lorem>ipsum <dolor>sit</dolor> amet <a href="consectetur">adipiscing</a> elit </lorem>, expand it into a collection of records, for ease of processing with a record-oriented tool.

lorem _content ipsum <dolor>sit</dolor> amet <a href="consectetur">adipiscing</a> elit
lorem _content dolor _content sit
lorem _content a href consectetur
lorem _content a _content adipiscing
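A minimal Python sketch of this expansion. It assumes that child elements are nested under the parent's _content pseudo tag and that the input is well-formed XML (xml.etree.ElementTree will not parse arbitrary real-world HTML). The names flatten and SAMPLE are made up for illustration.

```python
import xml.etree.ElementTree as ET

SAMPLE = '<lorem>ipsum <dolor>sit</dolor> amet <a href="consectetur">adipiscing</a> elit </lorem>'

def flatten(elem, path=()):
    """Yield one tuple per record: the path of tags followed by the value."""
    path = path + (elem.tag,)
    # Attributes become records directly under the tag.
    for name, value in elem.attrib.items():
        yield path + (name, value)
    # The whole inner markup becomes the _content record.
    # ET.tostring includes each child's trailing text (its tail).
    inner = (elem.text or "") + "".join(
        ET.tostring(child, encoding="unicode") for child in elem
    )
    yield path + ("_content", inner.strip())
    # Child elements are broken down again beneath _content.
    for child in elem:
        yield from flatten(child, path + ("_content",))

for record in flatten(ET.fromstring(SAMPLE)):
    print("\t".join(record))
```

Joining with a bare tab is only safe here because the sample content contains no tabs or newlines.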

If the records are printed one per line, then newlines in the content will need to be escaped.  Field separators, e.g., tab, that appear in content will also need to be escaped.
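One possible escaping scheme, sketched in Python: backslash escapes for tab, newline, carriage return, and the backslash itself (the escape character has to be escaped too, or decoding is ambiguous). The function names and the choice of backslash are assumptions, not something the format dictates.

```python
_UNESCAPE = {"\\": "\\", "t": "\t", "n": "\n", "r": "\r"}

def escape_field(text):
    # Escape the backslash first so later replacements stay unambiguous.
    return (text.replace("\\", "\\\\")
                .replace("\t", "\\t")
                .replace("\n", "\\n")
                .replace("\r", "\\r"))

def unescape_field(text):
    out = []
    i = 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            # Decode the two-character escape; unknown escapes keep the second character.
            out.append(_UNESCAPE.get(text[i + 1], text[i + 1]))
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)

def format_record(fields):
    """Join escaped fields with tabs; the result fits on one line."""
    return "\t".join(escape_field(f) for f in fields)

def parse_record(line):
    """Inverse of format_record for a line without its trailing newline."""
    return [unescape_field(f) for f in line.split("\t")]
```

After escaping, every tab in a line is a field separator and every newline ends a record, so tools like cut, sort, and awk can split the lines without knowing about the escapes.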

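Other hierarchical data formats, e.g., JSON, probably don't need the _content pseudo tag, because they have no mixed content: a value is either a container or a leaf, never text interleaved with child elements.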
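For comparison, a sketch of the same flattening for JSON in Python, with no pseudo tag: each record is just the path of keys (array indices as strings) followed by the leaf value. The function name flatten_json and the sample document are made up for illustration.

```python
import json

def flatten_json(value, path=()):
    """Yield one tuple per leaf: the path of keys or indices, then the leaf value."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from flatten_json(child, path + (key,))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from flatten_json(child, path + (str(index),))
    else:
        # Leaf: string, number, boolean, or null.
        yield path + (value,)

doc = json.loads('{"lorem": {"dolor": "sit", "a": {"href": "consectetur"}, "tags": ["x", "y"]}}')
for record in flatten_json(doc):
    print("\t".join(str(field) for field in record))
```

One limitation of this sketch: empty objects and empty arrays produce no records at all, so they disappear from the output.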

Content could instead be an array of text pieces and child elements, which would avoid repeating the content once whole and again broken down by tag.
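A rough Python sketch of the array idea, under one possible reading: the content of an element is a sequence of text pieces and child elements, and the position in that sequence replaces the _content pseudo tag in the path. The name flatten_array and the exact record layout are assumptions.

```python
import xml.etree.ElementTree as ET

def flatten_array(elem, path=()):
    """Yield records where each piece of content appears exactly once."""
    path = path + (elem.tag,)
    for name, value in elem.attrib.items():
        yield path + (name, value)
    # Content in document order: leading text, then each child and its trailing text.
    pieces = [elem.text]
    for child in elem:
        pieces.extend([child, child.tail])
    index = 0
    for piece in pieces:
        if piece is None:
            continue
        if isinstance(piece, str):
            if not piece.strip():
                continue  # skip whitespace-only text
            yield path + (str(index), piece.strip())
        else:
            # Child element: recurse with its array position in the path.
            yield from flatten_array(piece, path + (str(index),))
        index += 1

root = ET.fromstring(
    '<lorem>ipsum <dolor>sit</dolor> amet <a href="consectetur">adipiscing</a> elit </lorem>'
)
for record in flatten_array(root):
    print("\t".join(record))
```

The cost is that the text of lorem is no longer available as a single record; it has to be reassembled from the pieces in index order.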
