sfeed

Simple RSS and Atom feed parser
git clone https://git.sinitax.com/codemadness/sfeed

commit 6b9a891452a00c176022a995334a33696d85303a
parent e96f24af8bb6e97156a891de90026c340596ba5e
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date:   Sat, 21 May 2016 14:09:54 +0200

improve wording in documentation

link to sfeed(5) in README to avoid having to duplicate documentation
text.

Diffstat:
M README         | 35 +++++------------------------------
M sfeed.1        | 30 ++++++++++++++----------------
M sfeed.5        | 32 +++++++++++++++++---------------
M sfeed_frames.1 |  2 +-
4 files changed, 37 insertions(+), 62 deletions(-)
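The diff documents the TSV layout that sfeed writes:

* Each line holds seven TAB-separated fields in a fixed order.
* In the content field, TABs, newlines and '\' are escaped as '\t', '\n' and '\\'.

A program that reads this output should split on TAB first and only then decode the content field. The C sketch below shows one way to do that, with these assumptions:

* The function names split_fields and unescape_content and the Field* enum are hypothetical. They are not taken from the sfeed sources.
* Unknown backslash sequences are copied through literally. The documentation does not say how to handle them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Field indices in the order listed by sfeed(5) in this revision.
 * The names are hypothetical, chosen for this sketch only. */
enum {
	FieldUnixTimestamp = 0,
	FieldTitle,
	FieldLink,
	FieldContent,
	FieldContentType,
	FieldId,
	FieldAuthor,
	FieldLast
};

/* Split one line of sfeed output in place on TAB characters.
 * The trailing newline (if any) is stripped first: the documented format
 * escapes or replaces newlines inside fields, so the first '\n' ends the
 * line. Returns the number of fields found; fields missing at the end of
 * the line point to an empty string. */
static size_t
split_fields(char *line, char *fields[FieldLast])
{
	static char empty[] = "";
	size_t i, n = 0;
	char *p = line;

	assert(line != NULL);
	line[strcspn(line, "\n")] = '\0';

	while (n < FieldLast) {
		fields[n++] = p;
		if ((p = strchr(p, '\t')) == NULL)
			break;
		*p++ = '\0'; /* terminate this field, continue after the TAB */
	}
	for (i = n; i < FieldLast; i++)
		fields[i] = empty;
	return n;
}

/* Decode the content field in place: "\t" -> TAB, "\n" -> newline,
 * "\\" -> '\'. Any other backslash sequence is copied unchanged
 * (an assumption; the documentation lists only these three). */
static void
unescape_content(char *s)
{
	char *w = s;

	for (; *s != '\0'; s++) {
		if (*s == '\\' && s[1] != '\0') {
			switch (s[1]) {
			case 't':  *w++ = '\t'; s++; continue;
			case 'n':  *w++ = '\n'; s++; continue;
			case '\\': *w++ = '\\'; s++; continue;
			}
		}
		*w++ = *s;
	}
	*w = '\0';
}
```

The order of the two steps matters. A decoded content field can contain real TAB characters, so decoding the whole line first would shift the field boundaries.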

diff --git a/README b/README
@@ -87,7 +87,7 @@ Platforms tested
 
 - Linux (glibc+gcc, musl-gcc, clang).
 - NetBSD
-- OpenBSD
+- OpenBSD: (gcc, pcc).
 - Windows (cygwin gcc, mingw).
 
 
@@ -132,36 +132,11 @@ feedname - TAB-separated format containing all items per feed. The
 feedname.new - Temporary file used by sfeed_update(1) to merge items.
 
 
-TAB-separated format fields
----------------------------
+File format
+-----------
 
-The items are saved in a TSV-like format.
-
-The fields: title, id, author are not allowed to have newlines and TABs, all
-whitespace characters are replaced by a space character. Control characters are
-removed.
-
-The content field can contain newlines and TABS and are escaped. TABs, newlines
-and '\' are escaped with '\', so it becomes: '\t', '\n' and '\\'. Other
-whitespace characters except space are removed. Control characters are removed.
-
-The order and format of the fields are:
-
-item UNIX timestamp - UNIX timestamp (UTC+0), empty on parse failure.
-item title          - Title text, HTML in titles is treated as
-                      plain-text.
-item link           - Absolute url, unsafe characters are encoded.
-item content        - Newlines and TABs are escaped. Control characters
-                      are removed. See the "TAB-separated format fields"
-                      text.
-item contenttype    - "html" or "plain".
-item id             - RSS item GUID or Atom id.
-item author         - Item author.
-
-CAVEATS:
-- if a timezone is not supported (non-RFC-822) the UNIX timestamp is
-  interpreted as UTC+0.
-- HTML in titles is not supported on purpose.
+man 5 sfeed
+man 1 sfeed
 
 
 Usage and examples
diff --git a/sfeed.1 b/sfeed.1
@@ -25,32 +25,30 @@ The content field can contain newlines and is escaped. TABs, newlines
 and '\\' are escaped with '\\', so it becomes: '\\t', '\\n' and '\\\\'. Other
 whitespace characters except space are removed. Control characters are removed.
 .Pp
-The order and format of the fields are:
+The order and content of the fields are:
 .Bl -tag -width 17n
-.It item timestamp
+.It timestamp
 UNIX timestamp in UTC+0, empty on parse failure.
-.It item title
-Title text, HTML in titles is treated as plain-text.
-.It item link
+.It title
+Title text, HTML code in titles is ignored and is treated as plain-text.
+.It link
 Absolute url, unsafe characters are encoded.
-.It item content
-Newlines and TABs are escaped. Control characters are removed. See the
-.Sx TAB-SEPARATED FORMAT FIELDS
-text.
-.It item content\-type
+.It content
+Content, can have plain-text or HTML code depending on the content\-type field.
+.It content\-type
 "html" or "plain".
-.It item id
+.It id
 RSS item GUID or Atom id.
-.It item author
+.It author
 Item author.
 .El
 .Sh SEE ALSO
 .Xr sfeed_plain 1 ,
-.Xr sfeed_update 1 ,
-.Xr sh 1
+.Xr sfeed 5
 .Sh AUTHORS
 .An Hiltjo Posthuma Aq Mt hiltjo@codemadness.org
 .Sh CAVEATS
-if a timezone is not supported (non-RFC-822) the UNIX timestamp is interpreted
-as UTC+0.
+If a timezone is not in the RFC-822 or RFC-3332 format it is not supported and
+the UNIX timestamp is interpreted as UTC+0.
+.Pp
 HTML in titles is treated as plain-text.
diff --git a/sfeed.5 b/sfeed.5
@@ -11,6 +11,8 @@ reads RSS or Atom feed data (XML) from stdin.
 It writes the feed data in a
 TAB-separated format to stdout.
 .Sh TAB-SEPARATED FORMAT FIELDS
+The items are saved in a TSV-like format.
+.Pp
 The fields: title, id, author are not allowed to have newlines and TABs, all
 whitespace characters are replaced by a single space character. Control
 characters are removed.
@@ -19,30 +21,30 @@ The content field can contain newlines and is escaped. TABs, newlines
 and '\\' are escaped with '\\', so it becomes: '\\t', '\\n' and '\\\\'. Other
 whitespace characters except space are removed. Control characters are removed.
 .Pp
-The order and format of the fields are:
+The order and content of the fields are:
 .Bl -tag -width 17n
-.It item timestamp
+.It timestamp
 UNIX timestamp in UTC+0, empty on parse failure.
-.It item title
-Title text, HTML in titles is treated as plain-text.
-.It item link
+.It title
+Title text, HTML code in titles is ignored and is treated as plain-text.
+.It link
 Absolute url, unsafe characters are encoded.
-.It item content
-Newlines and TABs are escaped. Control characters are removed. See the
-.Sx TAB-SEPARATED FORMAT FIELDS
-text.
-.It item content\-type
+.It content
+Content, can have plain-text or HTML code depending on the content\-type field.
+.It content\-type
 "html" or "plain".
-.It item id
+.It id
 RSS item GUID or Atom id.
-.It item author
+.It author
 Item author.
 .El
 .Sh SEE ALSO
-.Xr sfeed 1
+.Xr sfeed 1 ,
+.Xr sfeed_plain 1
 .Sh AUTHORS
 .An Hiltjo Posthuma Aq Mt hiltjo@codemadness.org
 .Sh CAVEATS
-if a timezone is not supported (non-RFC-822) the UNIX timestamp is interpreted
-as UTC+0.
+If a timezone is not in the RFC-822 or RFC-3332 format it is not supported and
+the UNIX timestamp is interpreted as UTC+0.
+.Pp
 HTML in titles is treated as plain-text.
diff --git a/sfeed_frames.1 b/sfeed_frames.1
@@ -42,7 +42,7 @@ The maximum length of the path is PATH_MAX or filesystem-specific (truncated).
 .Sh AUTHORS
 .An Hiltjo Posthuma Aq Mt hiltjo@codemadness.org
 .Sh SECURITY CONSIDERATIONS
-Each item file contain the item content formatted as HTML, if the feed data
+Each item content file contains the content formatted as HTML, if the feed data
 contains HTML like Javascripts, tracking cookies, custom styles and such these
 will also be displayed. Due to the crazy nature of "the web" these things are
 complex to filter. Some security and privacy can be gained by using an