sfeed

Simple RSS and Atom feed parser
git clone https://git.sinitax.com/codemadness/sfeed

commit eb6fe6f11a14afc82cd0039d88759d6c1c524d2f
parent 969ec64ef3195e00ae597e49a39e804bb6ce6464
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date:   Sun, 10 Apr 2016 19:51:18 +0200

improve documentation, add sfeed(5) for the file format

separate sfeed(5) page for just the feed file format.

Diffstat:
M Makefile | 1 +
M sfeed.1  | 12 ++++++------
A sfeed.5  | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 57 insertions(+), 6 deletions(-)

diff --git a/Makefile b/Makefile
@@ -35,6 +35,7 @@ LIB = ${LIBUTIL} ${LIBXML}
 MAN1 = ${BIN:=.1}\
 	${SCRIPTS:=.1}
 MAN5 = \
+	sfeed.5\
 	sfeedrc.5
 DOC = \
 	CHANGELOG\
diff --git a/sfeed.1 b/sfeed.1
@@ -17,12 +17,12 @@ recommended to always have absolute urls in your feeds.
 .Sh TAB-SEPARATED FORMAT FIELDS
 The items are saved in a TSV-like format.
 .Pp
-The fields: title, id, author are not allowed to have newlines and TABs. All
-whitespace is replaced by a single space character. Control characters are
-removed.
+The fields: title, id, author are not allowed to have newlines and TABs, all
+whitespace characters are replaced by a single space character. Control
+characters are removed.
 .Pp
 The content field can contain newlines and is escaped. TABs, newlines and '\\'
-are escaped with '\\', so: '\\n', '\\t', and '\\\\'. Other whitespace
+are escaped with '\\', so it becomes: '\\t', '\\n' and '\\\\'. Other whitespace
 characters except space are removed. Control characters are removed.
 .Pp
 The order and format of the fields are:
@@ -30,7 +30,7 @@ The order and format of the fields are:
 .It item timestamp
 UNIX timestamp in UTC+0, empty on parse failure.
 .It item title
-Title text, HTML in titles is treated as plain-text (on purpose).
+Title text, HTML in titles is treated as plain-text.
 .It item link
 Absolute url, unsafe characters are encoded.
 .It item content
@@ -55,4 +55,4 @@ Item author.
 .Sh CAVEATS
 if a timezone is not supported (non-RFC-822) the UNIX timestamp is interpreted
 as UTC+0.
-HTML in titles is treated as plain-text (on purpose).
+HTML in titles is treated as plain-text.
diff --git a/sfeed.5 b/sfeed.5
@@ -0,0 +1,50 @@
+.Dd April 10, 2016
+.Dt SFEED 5
+.Os
+.Sh NAME
+.Nm sfeed
+.Nd feed format
+.Sh SYNOPSIS
+.Nm
+.Sh DESCRIPTION
+.Xr sfeed 1
+reads RSS or Atom feed data (XML) from stdin. It writes the feed data in a
+TAB-separated format to stdout.
+.Sh TAB-SEPARATED FORMAT FIELDS
+The fields: title, id, author are not allowed to have newlines and TABs, all
+whitespace characters are replaced by a single space character. Control
+characters are removed.
+.Pp
+The content field can contain newlines and is escaped. TABs, newlines and '\\'
+are escaped with '\\', so it becomes: '\\t', '\\n' and '\\\\'. Other whitespace
+characters except space are removed. Control characters are removed.
+.Pp
+The order and format of the fields are:
+.Bl -tag -width 17n
+.It item timestamp
+UNIX timestamp in UTC+0, empty on parse failure.
+.It item title
+Title text, HTML in titles is treated as plain-text.
+.It item link
+Absolute url, unsafe characters are encoded.
+.It item content
+Newlines and TABs are escaped. Control characters are removed. See the
+.Sx TAB-SEPARATED FORMAT FIELDS
+text.
+.It item content\-type
+"html" or "plain".
+.It item id
+RSS item GUID or Atom id.
+.It item author
+Item author.
+.It feed type
+"rss" or "atom".
+.El
+.Sh SEE ALSO
+.Xr sfeed 1
+.Sh AUTHORS
+.An Hiltjo Posthuma Aq Mt hiltjo@codemadness.org
+.Sh CAVEATS
+if a timezone is not supported (non-RFC-822) the UNIX timestamp is interpreted
+as UTC+0.
+HTML in titles is treated as plain-text.
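
Reader's note (not part of the commit): the field order and escaping rules
documented in the new sfeed(5) page can be illustrated with a small consumer
script. The following is a minimal sketch in Python, assuming one item per
line with exactly the eight fields listed in the page, in that order. The
names FIELDS, unescape and parse_line are hypothetical helpers made up for
this illustration and are not part of sfeed.

```python
# Field names follow the order given in sfeed(5) at this commit.
# The Python identifiers themselves are illustrative only.
FIELDS = ("timestamp", "title", "link", "content",
          "content_type", "id", "author", "feed_type")


def unescape(text):
    """Reverse the content-field escaping: '\\n', '\\t' and '\\\\'."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            nxt = text[i + 1]
            if nxt == "n":
                out.append("\n")
            elif nxt == "t":
                out.append("\t")
            elif nxt == "\\":
                out.append("\\")
            else:
                # Unknown escape: keep both characters unchanged.
                out.append(ch + nxt)
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def parse_line(line):
    """Split one TAB-separated line into a dict keyed by FIELDS."""
    parts = line.rstrip("\n").split("\t")
    # Pad short lines so every field name is present.
    parts += [""] * (len(FIELDS) - len(parts))
    item = dict(zip(FIELDS, parts))
    # Only the content field is documented as escaped.
    item["content"] = unescape(item["content"])
    # The timestamp is documented as empty on parse failure.
    item["timestamp"] = int(item["timestamp"]) if item["timestamp"] else None
    return item
```

A possible use is to pipe the output of sfeed into a script that calls
parse_line on each line of stdin. Scanning left to right, rather than chaining
string replacements, keeps an escaped backslash followed by the letter n from
being misread as a newline.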