sfeed

Simple RSS and Atom feed parser
git clone https://git.sinitax.com/codemadness/sfeed

commit e46d200e0cb2ffb79a7d542f65809e1bb14c445c
parent 69459b1ef6af55ea1c6e83947e939baacb3e93c8
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date:   Fri, 25 Jan 2019 13:50:43 +0100

documentation improvements

Man pages:
- sfeed_update: fix: fetchfeed parameter documentation.
- sfeed_update: fix/update: urls in sfeedrc.example.
- sfeed_update: document maxjobs variable.
- sfeedrc: document filter and order functions here.
- more semantic keywords: function arguments and some Nm.

README:
- Document more clearly sfeedrc is a shellscript at the first usage "steps".
- Add newsboat OPML export and import to sfeed_update example.
- Document the Makefile is POSIX (not some GNU/Makefile).
- Add reference to my tool hurl: a HTTP/HTTPS/Gopher file grab client.
- Describe the reason/usefulness of the filter example.
- Describe how to override curl(1), an optional dependency.

With feedback from lich, thanks!
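
As a rough illustration of the per-feed filter override mentioned above, here
is a minimal sketch for an sfeedrc file. It is not taken from this commit: the
feed name "example feed" and the word "sponsored" are hypothetical, and it
assumes the item title is the second TAB-separated field of the sfeed(5)
format.

```shell
# filter fields (hypothetical override, not part of this commit).
# filter(name)
# Reads TAB-separated items from stdin and writes them to stdout.
filter() {
	case "$1" in
	"example feed")
		# assumption: field 2 is the item title; drop items whose
		# title contains "sponsored" (case-insensitive).
		awk -F '\t' 'tolower($2) !~ /sponsored/';;
	*)
		# all other feeds pass through unchanged.
		cat;;
	esac
}
```

Because sfeedrc is evaluated as a shellscript, a definition like this would
replace the default filter function for every feed, which is why the sketch
passes other feeds through untouched.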

Diffstat:
M README         | 44 ++++++++++++++++++++++++++++++++++++++------
M sfeed_update.1 | 30 +++++++++++++++++++++---------
M sfeedrc.5      | 70 +++++++++++++++++++++++++++++++++++++++++++---------------------------
3 files changed, 102 insertions(+), 42 deletions(-)

diff --git a/README b/README
@@ -19,7 +19,9 @@ Initial setup:
 	mkdir -p "$HOME/.sfeed/feeds"
 	cp sfeedrc.example "$HOME/.sfeed/sfeedrc"
 
-Edit the configuration file and change any RSS/Atom feeds:
+Edit the sfeedrc(5) configuration file and change any RSS/Atom feeds. This file
+is included and evaluated as a shellscript for sfeed_update, so it's functions
+and behaviour can be overridden:
 
 	$EDITOR "$HOME/.sfeed/sfeedrc"
 
@@ -27,6 +29,11 @@ or you can import existing OPML subscriptions using sfeed_opml_import(1):
 
 	sfeed_opml_import < file.opml > "$HOME/sfeed/sfeedrc"
 
+an example to export from an other RSS/Atom reader called newsboat and import
+for sfeed_update:
+
+	newsboat -e | sfeed_opml_import > "$HOME/.sfeed/sfeedrc"
+
 Update feeds, this script merges the new items:
 
 	sfeed_update
@@ -71,12 +78,12 @@ Dependencies
 Optional dependencies
 ---------------------
 
-- make(1) (for Makefile).
+- POSIX make(1) (for Makefile).
 - POSIX sh(1),
   used by sfeed_update(1) and sfeed_opml_export(1).
 - curl(1) binary: http://curl.haxx.se/ ,
   used by sfeed_update(1), can be replaced with any tool like wget(1),
-  OpenBSD ftp(1).
+  OpenBSD ftp(1) or hurl(1): https://git.codemadness.org/hurl/
 - iconv(1) command-line utilities,
   used by sfeed_update(1). If the text in your RSS/Atom feeds are already UTF-8
   encoded then you don't need this. For an alternative minimal iconv
@@ -136,9 +143,10 @@ Atleast the following functions can be overridden per feed:
 - filter: to filter on fields.
 - order: to change the sort order.
 
-The function feeds() is called to fetch the feeds. The function feed() can
-safely be executed concurrently as a background job in your sfeedrc(5) config
-file to make updating faster.
+The function feeds() is called to process the feeds. The default feed()
+function is executed concurrently as a background job in your sfeedrc(5) config
+file to make updating faster. The variable maxjobs can be changed to limit or
+increase the amount of concurrent jobs (8 by default).
 
 
 Files written at runtime by sfeed_update(1)
@@ -218,6 +226,10 @@ argument is optional):
 
 - - -
 
+The filter function can be overridden in your sfeedrc file. This allows
+filtering items per feed. It can be used to shorten urls, filter away
+advertisements, strip tracking parameters and more.
+
 # filter fields.
 # filter(name)
 filter() {
@@ -266,6 +278,22 @@ filter() {
 
 - - -
 
+The fetchfeed function can be overridden in your sfeedrc file. This allows to
+replace the default curl(1) for sfeed_update with any other client to fetch the
+RSS/Atom data:
+
+# fetch a feed via HTTP/HTTPS etc.
+# fetchfeed(name, url, feedfile)
+fetchfeed() {
+	if hurl -m 1048576 -t 15 "$2" 2>/dev/null; then
+		printf "  OK %s %s\n" "$(date +'%H:%M:%S')" "$1" >&2
+	else
+		printf "FAIL %s %s\n" "$(date +'%H:%M:%S')" "$1" >&2
+	fi
+}
+
+- - -
+
 Over time your feeds file might become quite big. You can archive items from a
 specific date by doing for example:
 
@@ -325,6 +353,10 @@ Now compile and run:
 	$ mv feeds feeds.bak
 	$ mv feeds.new feeds
 
+This could also be run weekly in a crontab to archive the feeds. Like throwing
+away old newspapers. It keeps the feeds list tidy and the formatted output
+small.
+
 - - -
 
 Convert mbox to separate maildirs per feed and filter duplicate messages using
diff --git a/sfeed_update.1 b/sfeed_update.1
@@ -1,4 +1,4 @@
-.Dd September 28, 2018
+.Dd January 25, 2019
 .Dt SFEED_UPDATE 1
 .Os
 .Sh NAME
@@ -32,33 +32,45 @@ This file is evaluated as a shellscript in
 .Pp
 Atleast the following functions can be overridden per feed:
 .Bl -tag -width 17n
-.It fetchfeed
+.It Fn fetchfeed
 to use
 .Xr wget 1 ,
 OpenBSD
 .Xr ftp 1
 or an other download program.
-.It merge
+.It Fn merge
 to change the merge logic.
-.It filter
+.It Fn filter
 to filter on fields.
-.It order
+.It Fn order
 to change the sort order.
 .El
 .Pp
-The function feeds() is called to fetch the feeds.
-The function feed() can safely be executed concurrently as a background job in
+The function
+.Fn feeds
+is called to process the feeds.
+The default
+.Fn feed
+is executed concurrently as a background job in
 your
 .Xr sfeedrc 5
 config file to make updating faster.
+The variable
+.Va maxjobs
+can be changed to limit or increase the amount of concurrent jobs (8 by
+default).
 .El
 .Sh FILES WRITTEN
 .Bl -tag -width 17n
 .It feedname
 TAB-separated format containing all items per feed.
-The sfeed_update script merges new items with this file.
+The
+.Nm
+script merges new items with this file.
 .It feedname.new
-Temporary file used by sfeed_update to merge items.
+Temporary file used by
+.Nm
+to merge items.
 .El
 .Sh EXAMPLES
 To update your feeds and format them in various formats:
diff --git a/sfeedrc.5 b/sfeedrc.5
@@ -1,4 +1,4 @@
-.Dd January 30, 2016
+.Dd January 25, 2019
 .Dt SFEEDRC 5
 .Os
 .Sh NAME
@@ -17,27 +17,27 @@ by default this is
 .
 .Sh FUNCTIONS
 The following functions must be defined in a
-.Xr sfeedrc 5
+.Nm
 file:
 .Bl -tag -width Ds
-.It feeds
+.It Fn feeds
 This function is like a "main" function called from
 .Xr sfeed_update 1 .
-.It feed
+.It Fn feed "name" "feedurl" "basesiteurl" "encoding"
 Function to process the feed, its arguments are in the order:
 .Bl -tag -width Ds
-.It name
+.It Fa name
 Name of the feed, this is also used as the filename for the TAB-separated
 feed file.
-.It feedurl
+.It Fa feedurl
 Uri to fetch the RSS/Atom data from, usually a HTTP or HTTPS uri.
-.It Op basesiteurl
+.It Op Fa basesiteurl
 Baseurl of the feed links.
 This argument allows to fix relative item links.
 .Pp
 According to the RSS and Atom specification feeds should always have absolute
 urls, however this is not always the case in practise.
-.It Op encoding
+.It Op Fa encoding
 Feeds are decoded from this name to UTF-8, the name should be a usable
 character-set for the
 .Xr iconv 1
@@ -50,38 +50,52 @@ Because
 is a shellscript each function can be overridden to change its behaviour,
 notable functions are:
 .Bl -tag -width Ds
-.It fetchfeed
+.It Fn fetchfeed "name" "uri" "feedfile"
 Fetch feed from url and writes data to stdout, its arguments are:
 .Bl -tag -width Ds
-.It uri
-Uri to fetch.
-.It name
+.It Fa name
 Specified name in configuration file (useful for logging).
-.It feedfile
+.It Fa uri
+Uri to fetch.
+.It Fa feedfile
 Used feedfile (useful for comparing modification times).
 .El
-.It merge
+.It Fn merge "name" "oldfile" "newfile"
 Merge data of oldfile with newfile and writes it to stdout, its arguments are:
 .Bl -tag -width Ds
-.It oldfile
+.It Fa name
+Feed name.
+.It Fa oldfile
 Old file.
-.It newfile
+.It Fa newfile
 New file.
 .El
-.It convertencoding
+.It Fn filter "name"
+Filter
+.Xr sfeed 5
+data from stdin, write to stdout, its arguments are:
+.Bl -tag -width Ds
+.It Fa name
+Feed name.
+.El
+.It Fn order "name"
+Sort
+.Xr sfeed 5
+data from stdin, write to stdout, its arguments are:
+.Bl -tag -width Ds
+.It Fa name
+Feed name.
+.El
+.It Fn convertencoding "from" "to"
 Convert from text-encoding to another and writes it to stdout, its arguments
 are:
 .Bl -tag -width Ds
-.It from
+.It Fa from
 From text-encoding.
-.It to
+.It Fa to
 To text-encoding.
 .El
 .El
-.Pp
-See the convertencoding() function in the script
-.Xr sfeed_update 1
-for more details.
 .Sh EXAMPLES
 An example configuration file is included named sfeedrc.example and also
 shown below:
@@ -91,14 +105,16 @@ shown below:
 # list of feeds to fetch:
 feeds() {
 	# feed <name> <feedurl> [basesiteurl] [encoding]
-	feed "codemadness" "http://www.codemadness.nl/rss.xml"
+	feed "codemadness" "https://www.codemadness.nl/atom.xml"
 	feed "explosm" "http://feeds.feedburner.com/Explosm"
-	feed "linux kernel" "http://kernel.org/kdist/rss.xml" "http://kernel.org"
+	feed "golang github releases" "https://github.com/golang/go/releases.atom"
+	feed "linux kernel" "https://www.kernel.org/feeds/kdist.xml" "https://www.kernel.org"
+	feed "reddit openbsd" "https://old.reddit.com/r/openbsd/.rss"
 	feed "slashdot" "http://rss.slashdot.org/Slashdot/slashdot" "http://slashdot.org"
 	feed "tweakers" "http://feeds.feedburner.com/tweakers/mixed" "http://tweakers.net" "iso-8859-1"
 	# get youtube Atom feed: curl -s -L 'https://www.youtube.com/user/gocoding/videos' | sfeed_web | cut -f 1
-	feed "yt golang" "https://www.youtube.com/feeds/videos.xml?channel_id=UCO3LEtymiLrgvpb59cNsb8A"
-	feed "xkcd" "http://xkcd.com/atom.xml" "http://xkcd.com"
+	feed "youtube golang" "https://www.youtube.com/feeds/videos.xml?channel_id=UCO3LEtymiLrgvpb59cNsb8A"
+	feed "xkcd" "https://xkcd.com/atom.xml" "https://xkcd.com"
 }
 .Ed
 .Sh SEE ALSO
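
The maxjobs variable documented by this commit could be combined with the
example configuration roughly as follows. This is only a sketch: the value 4 is
an arbitrary choice for illustration, and the two feed lines are copied from
the sfeedrc example in the diff above.

```shell
# hypothetical sfeedrc fragment: limit sfeed_update to 4 concurrent jobs
# instead of the documented default of 8 (4 is an arbitrary example value).
maxjobs=4

# list of feeds to fetch:
feeds() {
	# feed <name> <feedurl> [basesiteurl] [encoding]
	feed "codemadness" "https://www.codemadness.nl/atom.xml"
	feed "xkcd" "https://xkcd.com/atom.xml" "https://xkcd.com"
}
```

A lower value may be gentler on slow connections or small machines, while a
higher value may shorten the update when many feeds are configured.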