Thursday, January 21, 2016

[adhnhgri] Multiple forms of plain text

There are several conventions about paragraph separators and line breaks for plain text:

  1. Paragraphs are separated by 2 newlines, and a single newline denotes a line break within a paragraph (for example, in a list masquerading as a paragraph).
  2. Paragraphs are separated by 2 newlines, and paragraphs are justified with newlines to some right margin (possibly ragged right).  The location of the right margin might have to be deduced, or be specified in metadata associated with the text.  Intentional line breaks within a paragraph must be deduced from lines that end before the right margin even though more text would have fit.  I dislike this (common) convention, because it is difficult for a tool to detect when there has been an error in right margin justification -- a short line could always have been intentional.  This is the approach taken by internet RFCs, and also (vaguely) by coding conventions requiring source code not to exceed a certain width (though source code contains a lot of additional structure -- analogous to metadata).
  3. Paragraphs are separated by 2 newlines, and single newlines within a paragraph are simply whitespace with no special meaning.  Line breaks within a paragraph require additional markup, perhaps signifying list items.  This is the approach taken by TeX, LaTeX, and Markdown.
  4. Paragraphs are separated by 1 newline, and line breaks within a paragraph require additional markup or are not possible.  This is often seen when a word processing document is exported as plain text.
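The deduction required by convention 2 can be sketched as a heuristic.  This is only an illustration, not an established algorithm: the function name deduce_hard_breaks and the rule it applies (a break is probably intentional if the first word of the next line would still have fit within the margin) are assumptions made for this example.

```python
def deduce_hard_breaks(lines, margin):
    """Guess, for each line except the last, whether the break after it
    was intentional.

    Assumed heuristic: a greedy filler would not have wrapped if the first
    word of the next line still fit within the margin, so such a break is
    guessed to be intentional.
    """
    guesses = []
    for current, following in zip(lines, lines[1:]):
        words = following.split()
        first_word = words[0] if words else ""
        # Room needed: the current line, one space, and the next word.
        fits = len(current.rstrip()) + 1 + len(first_word) <= margin
        guesses.append(fits)
    return guesses


paragraph = [
    "apples",
    "oranges",
    "a long line of text that fills",
    "the margin nicely",
]
print(deduce_hard_breaks(paragraph, margin=30))
```

The heuristic cannot tell a deliberate short line from a justification error, which is exactly the weakness of this convention: both look the same to a tool.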

Create standardized metadata that describes which paragraph formatting convention a plain text document uses.

The motivation is to be able to reflow a plain text document to different margin widths.
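A minimal sketch of such a reflow, assuming the convention is already known (say, from the metadata) and is passed in as the list number used above.  The function name reflow and its convention parameter are hypothetical; only conventions 1, 3, and 4 are handled, because convention 2 first needs its intentional line breaks deduced.  The output is wrapped to the new width with paragraphs separated by 2 newlines.

```python
import re
import textwrap


def reflow(text, width, convention):
    """Reflow plain text to a new width under a known convention (1, 3, or 4)."""
    if convention == 4:
        # Each non-blank line is a whole paragraph.
        paragraphs = [line for line in text.split("\n") if line.strip()]
    elif convention in (1, 3):
        # Paragraphs are separated by 2 (or more) newlines.
        paragraphs = [p for p in re.split(r"\n{2,}", text.strip("\n")) if p.strip()]
    else:
        raise ValueError("unsupported convention: %r" % (convention,))

    reflowed = []
    for para in paragraphs:
        if convention == 1:
            # Single newlines are intentional line breaks: keep them and
            # wrap each logical line on its own.
            logical_lines = para.split("\n")
        else:
            # Single newlines are plain whitespace: join into one line.
            logical_lines = [" ".join(para.split())]
        reflowed.append("\n".join(textwrap.fill(line, width) for line in logical_lines))
    return "\n\n".join(reflowed)


sample = "one two three four\nfive six\n\nseven eight"
print(reflow(sample, 30, 3))
print("---")
print(reflow(sample, 30, 1))
```

The same input reflows differently under conventions 1 and 3, which is why the convention has to be recorded somewhere rather than guessed.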
