Given hierarchical data, e.g., html: <lorem>ipsum <dolor>sit</dolor> amet <a href="consectetur">adipiscing</a> elit </lorem>. Expand it to a collection of records, for ease of processing with a record-oriented tool.
lorem | _content | ipsum <dolor>sit</dolor> amet <a href="consectetur">adipiscing</a> elit | ||
lorem | _content | dolor | _content | sit |
lorem | _content | a | href | consectetur |
lorem | _content | a | _content | adipiscing |
If the records are printed 1 per line, then newlines in the content will need to be escaped. Record separators, e.g., tab, in content will need to be escaped.
Other hierarchical data formats, e.g., JSON, probably don't need to do the _content pseudo tag.
Content could be an array, avoiding the repetition of content and tags broken down.
No comments :
Post a Comment