sfeed

Simple RSS and Atom feed parser
git clone https://git.sinitax.com/codemadness/sfeed

commit c28a8ba769f2c0468436e7a5d42264644711ff51
parent 1a1bd0e5a3a1a9dbdf20d7afec7c3246c2468e34
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date:   Wed,  5 Aug 2015 18:41:27 +0200

improve man-pages and documentation

Diffstat:
M README              |  8 +++++---
M sfeed.1             | 41 ++++++++++++++++++++++++++---------------
M sfeed_frames.1      | 47 ++++++++++++++++++++++++-----------------------
M sfeed_html.1        | 23 +++++++++++++++++------
M sfeed_mbox.1        | 29 +++++++++++++++++++++--------
M sfeed_opml_export.1 | 16 ++++++++++------
M sfeed_opml_import.1 |  2 +-
M sfeed_plain.1       | 24 +++++++++++++-----------
M sfeed_update.1      | 10 +++++-----
M sfeed_web.1         | 18 +++++++++++-------
M sfeed_xmlenc.1      |  2 +-
11 files changed, 134 insertions(+), 86 deletions(-)

diff --git a/README b/README
@@ -87,7 +87,7 @@ are escaped with '\', so: '\n', '\t', and '\\'. Other whitespace characters
 except space are removed. Control characters are removed.
 
 The timestamp field is converted to a UNIX timestamp. The timestamp is also
-stored as formatted as a separate field.
+added as a formatted text text field.
 
 The order and format of the fields are:
 
@@ -101,8 +101,10 @@ item id - string
 item author - string
 feed type - string, "rss" or "atom".
 
-CAVEAT: if a timezone is not supported (non-RFC-822) the UNIX timestamp is
- interpreted as UTC+0.
+CAVEATS:
+- if a timezone is not supported (non-RFC-822) the UNIX timestamp is
+ interpreted as UTC+0.
+- HTML in titles is not supported on purpose.
 
 
 Build and install
diff --git a/sfeed.1 b/sfeed.1
@@ -10,36 +10,46 @@
 .Sh DESCRIPTION
 .Nm
 reads RSS or Atom feed data (XML) from stdin. It writes the feed data in a
-tab-separated format to stdout. A
+TAB-separated format to stdout. A
 .Ar baseurl
-can be specified if the links in the feed are relative urls and the baseurl of
-the content differs from the feed. It is generally recommended to always have
-absolute urls in your feeds, but the web sucks.
+can be specified if the links in the feed are relative urls. It is
+recommended to always have absolute urls in your feeds.
 .Sh TAB-SEPARATED FORMAT FIELDS
-The items are saved in a TSV-like format except newlines, tabs and
-backslash are escaped with \\ (\\n, \\t and \\\\). Carriage returns (\\r) are
+The items are saved in a TSV-like format.
+.Pp
+The fields: title, id, author are not allowed to have newlines and TABs. All
+whitespace is replaced by a single space character. Control characters are
 removed.
 .Pp
+The content field can contain newlines and is escaped. TABs, newlines and '\\'
+are escaped with '\\', so: '\\n', '\\t', and '\\\\'. Other whitespace
+characters except space are removed. Control characters are removed.
+.Pp
+The timestamp field is converted to a UNIX timestamp. The timestamp is also
+added as a formatted text field.
+.Pp
 The order and format of the fields are:
 .Bl -tag -width 17n
 .It Ar item timestamp
-string, UNIX timestamp in UTC+0
+UNIX timestamp in UTC+0.
 .It Ar item timestamp
-string, date and time in the format: YYYY-mm-dd HH:MM:SS (UTC[+-][HHMM])|tz
+Date and time in the format: YYYY-mm-dd HH:MM:SS (UTC[+-][HHMM])|tz.
 .It Ar item title
-string
+Title text, HTML in titles is treated as plain-text (on purpose).
 .It Ar item link
-string, made to absolute url, unsafe characters are encoded
+Absolute url, unsafe characters are encoded.
 .It Ar item content
-string
+Newlines and TABs are escaped. Control characters are removed. See the
+.Sx TAB-SEPARATED FORMAT FIELDS
+text.
 .It Ar item content\-type
-string, "html" or "plain"
+"html" or "plain".
 .It Ar item id
-string
+RSS item GUID or Atom id.
 .It Ar item author
-string
+Item author.
 .It Ar feed type
-string, "rss" or "atom"
+"rss" or "atom".
 .El
 .Sh SEE ALSO
 .Xr sfeed_plain 1 ,
@@ -50,3 +60,4 @@ string, "rss" or "atom"
 .Sh CAVEATS
 if a timezone is not supported (non-RFC-822) the UNIX timestamp is
 interpreted as UTC+0.
+HTML in titles is treated as plain-text (on purpose).
diff --git a/sfeed_frames.1 b/sfeed_frames.1
@@ -1,40 +1,41 @@
-.Dd December 25, 2014
+.Dd August 5, 2015
 .Dt SFEED_FRAMES 1
 .Os
 .Sh NAME
 .Nm sfeed_frames
-.Nd formats a feeds file to HTML with frames
+.Nd format feed data to HTML with frames
 .Sh SYNOPSIS
 .Nm
-.Op Ar feed...
+.Op Ar file...
 .Sh DESCRIPTION
 .Nm
-formats a feeds file (TSV) from
+formats feed data (TSV) from
 .Xr sfeed 1
-to HTML. It reads TSV data from stdin and writes HTML to the current
-directory. For the exact TSV format see
-.Xr sfeed 1 .
-.Sh OPTIONS
-.Bl -tag -width 14n
-.It Ar directory path
-Path to write files to, default is ".". On success the specified directory will
-contain the files:
-.El
+to HTML. It reads TSV data from stdin or
+.Ar file
+and writes HTML files to the current directory.
+If no
+.Ar file
+parameters are specified and so the data is read from stdin the feed name
+is named "unnamed".
+.Sh FILES WRITTEN
 .Bl -tag -width 13n
-.It Ar index.html:
-this is the main HTML file referencing to the frames (items.html and
-menu.html).
-.It Ar items.html:
-this contains all the items as HTML links to the local content.
-.It Ar menu.html:
-menu frame which contains navigation "anchor" links to the feed names
-(in items.html).
+.It Ar index.html
+The main HTML file referencing to the frames items.html and
+menu.html.
+.It Ar items.html
+Contains all the items as HTML links to the local content.
+.It Ar menu.html
+Menu frame which contains navigation "anchor" links to the feed names
+in items.html.
 .El
 .Sh FILE STRUCTURE
-Directory for each feed category in the format: path/feedname/itemname.html.
+Items for each feed category is in the format: feedname/itemname.html.
 The feedname and item names are normalized, whitespace characters are replaced
 with a \-, multiple whitespaces are replaced by a single \- and trailing
-whitespace will be removed.
+whitespace will be removed. The itemname is based on the title of the items.
+The feedname and title is truncated to a maximum of 128 characters. The
+maximum length of the path is PATH_MAX or filesystem-specific.
 .Sh SEE ALSO
 .Xr sfeed 1 ,
 .Xr sfeed_plain 1
diff --git a/sfeed_html.1 b/sfeed_html.1
@@ -1,17 +1,28 @@
-.Dd December 25, 2014
+.Dd August 5, 2015
 .Dt SFEED_HTML 1
 .Os
 .Sh NAME
 .Nm sfeed_html
-.Nd formats a feeds file to HTML
+.Nd format feed data to HTML
 .Sh SYNOPSIS
 .Nm
+.Op Ar file...
 .Sh DESCRIPTION
 .Nm
-formats a feeds file (TSV) from
-.Xr sfeed_update 1
-to HTML. It reads TSV data from stdin and writes HTML to stdout. For the exact TSV format see
-.Xr sfeed_update 1 .
+formats feed data (TSV) from
+.Xr sfeed 1
+from stdin or
+.Ar file
+to stdout in HTML.
+If one or more
+.Ar file
+are specified, the basename of the
+.Ar file
+is used as the feed name in the output.
+If no
+.Ar file
+parameters are specified and so the data is read from stdin the feed name
+is empty.
 .Sh SEE ALSO
 .Xr sfeed 1 ,
 .Xr sfeed_plain 1
diff --git a/sfeed_mbox.1 b/sfeed_mbox.1
@@ -1,29 +1,42 @@
-.Dd May 17, 2015
+.Dd August 5, 2015
 .Dt SFEED_MBOX 1
 .Os
 .Sh NAME
 .Nm sfeed_mbox
-.Nd formats a feeds file to mbox
+.Nd format feed data to mboxrd
 .Sh SYNOPSIS
 .Nm
+.Op Ar file...
 .Sh DESCRIPTION
 .Nm
-formats a feeds file (TSV) from
+formats feed data (TSV) from
 .Xr sfeed 1
-to mbox. It reads TSV data from stdin and writes mail in the mbox format
-to stdout. These can be further processed by tools like
+from stdin or
+.Ar file
+to stdout in the mboxrd format.
+If one or more
+.Ar file
+are specified, the basename of the
+.Ar file
+is used as the feed name in the output.
+If no
+.Ar file
+parameters are specified and so the data is read from stdin the feed name
+is empty.
+Lines starting with "From " will be mangled in the mboxrd-style. The mbox
+data can be further processed by tools like
 .Xr procmail 1
 or
 .Xr fdm 1
-for example.
+for example. See the README file for some useful examples.
 .Sh FORMAT
 Depending on the original content\-type the mail will be formatted as
 plain-text (text/plain) or HTML (text/html).
 .Sh CUSTOM HEADERS
-To make filtering simpler some custom headers are set:
+To make further filtering simpler some custom headers are set:
 .Bl -tag -width Ds
 .It X-Feedname
-The feedname (set in sfeedrc).
+The feedname (as set in sfeedrc).
 .El
 .Sh SEE ALSO
 .Xr fdm 1 ,
diff --git a/sfeed_opml_export.1 b/sfeed_opml_export.1
@@ -3,17 +3,21 @@
 .Os
 .Sh NAME
 .Nm sfeed_opml_export
-.Nd generate an OPML file based on a sfeedrc config file
+.Nd export feeds in a sfeedrc file to OPML data
 .Sh SYNOPSIS
 .Nm
-.Op Ar config file
+.Op Ar sfeedrc
 .Sh DESCRIPTION
 .Nm
-parses the specified config file and output OPML XML data to stdout.
+reads the specified
+.Ar sfeedrc
+config file and output OPML XML data to stdout.
 .Sh OPTIONS
-.Bl -tag -width 17n
-.It Op config file
-default: "$HOME/.sfeed/sfeedrc", see the
+.Bl -tag -width Ds
+.It sfeedrc
+Default:
+.Pa $HOME/.sfeed/sfeedrc
+see the
 .Xr sfeed_update 1
 .Sx FILES READ
 section for more information.
diff --git a/sfeed_opml_import.1 b/sfeed_opml_import.1
@@ -3,7 +3,7 @@
 .Os
 .Sh NAME
 .Nm sfeed_opml_import
-.Nd generate a sfeedrc config file based on an OPML file
+.Nd generate a sfeedrc config file from an OPML file
 .Sh SYNOPSIS
 .Nm
 .Sh DESCRIPTION
diff --git a/sfeed_plain.1 b/sfeed_plain.1
@@ -1,24 +1,26 @@
-.Dd December 25, 2014
+.Dd August 5, 2015
 .Dt SFEED_PLAIN 1
 .Os
 .Sh NAME
 .Nm sfeed_plain
-.Nd formats a feeds file to a plain-text list
+.Nd format feed data to a plain-text list
 .Sh SYNOPSIS
 .Nm
 .Op Ar file...
 .Sh DESCRIPTION
 .Nm
-formats one or more
-.Ar files
-to a plain-text list. Each plain-text item will contain the feed name which
-will be the filename. If no argument is given it will read a feed from stdin,
-but there will not be a feed name.
-The data read from
-.Ar files
-or stdin is in a TAB-separated-like format from
+formats feed data (TSV) from
 .Xr sfeed 1
-For a more detailed list of this format and its fields see
+from stdin or
+.Ar file
+to stdout as a plain-text list. If one or more
+.Ar file
+are specified, the basename of the
+.Ar file
+is used as the feed name in the output. If no
+.Ar file
+parameters are specified and so the data is read from stdin the feed name
+is empty.
 .Xr sfeed 1
 .Sh SEE ALSO
 .Xr sfeed 1 ,
diff --git a/sfeed_update.1 b/sfeed_update.1
@@ -1,4 +1,4 @@
-.Dd December 25, 2014
+.Dd August 5, 2015
 .Dt SFEED_UPDATE 1
 .Os
 .Sh NAME
@@ -6,7 +6,7 @@
 .Nd update feeds and merge with old feeds
 .Sh SYNOPSIS
 .Nm
-.Op Ar configfile
+.Op Ar sfeedrc
 .Sh DESCRIPTION
 .Nm
 updates feeds files and merges the new data with the previous files. These
@@ -15,8 +15,8 @@ are the files in the directory
 by default.
 .Sh OPTIONS
 .Bl -tag -width 17n
-.It Ar configfile
-config file, if not specified uses the path
+.It Ar sfeedrc
+Config file, if not specified uses the path
 .Pa $HOME/.sfeed/sfeedrc
 by default. See the
 .Sx FILES READ
@@ -40,7 +40,7 @@ speedup updating.
 .Sh FILES WRITTEN
 .Bl -tag -width 17n
 .It Ar feedname
-Tab-separated format containing all items per feed.
+TAB-separated format containing all items per feed.
 The sfeed_update script merges new items with this file.
 .It Ar feedname.new
 Temporary file used by sfeed_update to merge items.
diff --git a/sfeed_web.1 b/sfeed_web.1
@@ -14,21 +14,25 @@ urls to stdout.
 .Sh OPTIONS
 .Bl -tag -width 8n
 .It Ar baseurl
-optional base url to use for found feed urls that are relative.
+Optional base url to use for found feed urls that are relative.
 .El
 .Sh OUTPUT FORMAT
-content\-type<space>url<newline>
-.Bl -tag -width 13n
+url<TAB>content\-type<newline>
+.Bl -tag -width Ds
+.It Ar url
+Found absolute url. If the url is relative and the
+.Ar baseurl
+option is
+specified then the url is made absolute. If the url is relative and no
+.Ar baseurl
+option is specified it is empty.
 .It Ar content\-type
 Usually application/atom+xml or application/rss+xml.
-.It Ar url
-Found url to the feed. If the url is relative and the baseurl option is
-specified then the url is changed accordingly.
 .El
 .Sh EXAMPLES
 Get urls from xkcd website:
 .Bd -literal
-wget http://www.xkcd.com -q -O - | sfeed_web "http://www.xkcd.com/"
+curl -s -L http://www.xkcd.com | sfeed_web "http://www.xkcd.com/"
 .Ed
 .Sh SEE ALSO
 .Xr sfeed_update 1 ,
diff --git a/sfeed_xmlenc.1 b/sfeed_xmlenc.1
@@ -12,7 +12,7 @@ reads XML data from stdin and writes the found text\-encoding to stdout.
 .Sh EXAMPLES
 Get text\-encoding from xkcd Atom feed:
 .Bd -literal
-wget http://www.xkcd.com/atom.xml -q -O - | sfeed_xmlenc
+curl -s -L http://www.xkcd.com/atom.xml | sfeed_xmlenc
 .Ed
 .Sh SEE ALSO
 .Xr sfeed_update 1 ,
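
Note on the TAB-separated format documented in the sfeed.1 hunk above: a
consumer of sfeed output might split each line on TABs into the nine fields
in the documented order and undo the '\n', '\t' and '\\' escapes of the
content field. The following is only a rough sketch in Python, not code from
sfeed itself. The names Item, unescape and parse_line are hypothetical, and
padding short lines with empty fields is an assumption of this sketch, not
documented behaviour.

```python
from typing import NamedTuple


class Item(NamedTuple):
    """One feed item; field order follows the sfeed.1 field list."""
    timestamp: str        # UNIX timestamp in UTC+0
    timestamp_text: str   # formatted date and time
    title: str
    link: str
    content: str
    content_type: str     # "html" or "plain"
    item_id: str          # RSS item GUID or Atom id
    author: str
    feed_type: str        # "rss" or "atom"


def unescape(field: str) -> str:
    """Reverse the '\\n', '\\t' and '\\\\' escapes used in the content field."""
    mapping = {"n": "\n", "t": "\t", "\\": "\\"}
    out = []
    i = 0
    while i < len(field):
        c = field[i]
        if c == "\\" and i + 1 < len(field):
            nxt = field[i + 1]
            # Unknown escapes are kept verbatim (an assumption of this sketch).
            out.append(mapping.get(nxt, "\\" + nxt))
            i += 2
        else:
            out.append(c)
            i += 1
    return "".join(out)


def parse_line(line: str) -> Item:
    """Split one TSV line into the nine documented fields."""
    parts = line.rstrip("\n").split("\t")
    # Pad or truncate to exactly nine fields (assumption: be lenient).
    parts = (parts + [""] * 9)[:9]
    item = Item(*parts)
    return item._replace(content=unescape(item.content))
```

Such a parser could be fed lines read from the output of sfeed(1) or from a
feed file written by sfeed_update(1), for example by calling parse_line on
each line of standard input.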