sfeed
-----

RSS and Atom parser (and some format programs).

It converts RSS or Atom feeds from XML to a TAB-separated file. There are
formatting programs included to convert this TAB-separated format to various
other formats. There are also some programs and scripts included to import and
export OPML and to fetch, filter, merge and order feed items.


Build and install
-----------------

$ make
# make install


To build sfeed without sfeed_curses set SFEED_CURSES to an empty string:

$ make SFEED_CURSES=""
# make SFEED_CURSES="" install


To change the theme for sfeed_curses you can set SFEED_THEME. See the themes/
directory for the theme names.

$ make SFEED_THEME="templeos"
# make SFEED_THEME="templeos" install


Usage
-----

Initial setup:

    mkdir -p "$HOME/.sfeed/feeds"
    cp sfeedrc.example "$HOME/.sfeed/sfeedrc"

Edit the sfeedrc(5) configuration file and change any RSS/Atom feeds. This file
is included and evaluated as a shellscript for sfeed_update, so its functions
and behaviour can be overridden:

    $EDITOR "$HOME/.sfeed/sfeedrc"

or you can import existing OPML subscriptions using sfeed_opml_import(1):

    sfeed_opml_import < file.opml > "$HOME/.sfeed/sfeedrc"

An example to export from another RSS/Atom reader called newsboat and import
it for sfeed_update:

    newsboat -e | sfeed_opml_import > "$HOME/.sfeed/sfeedrc"

An example to export from another RSS/Atom reader called rss2email (3.x+) and
import it for sfeed_update:

    r2e opmlexport | sfeed_opml_import > "$HOME/.sfeed/sfeedrc"

Update the feeds; this script merges the new items. See sfeed_update(1) for
more information about what it can do:

    sfeed_update

Format feeds:

Plain-text list:

    sfeed_plain $HOME/.sfeed/feeds/* > "$HOME/.sfeed/feeds.txt"

HTML view (no frames), copy style.css for a default style:

    cp style.css "$HOME/.sfeed/style.css"
    sfeed_html $HOME/.sfeed/feeds/* > "$HOME/.sfeed/feeds.html"

HTML view with the menu as frames, copy style.css for a default style:

    mkdir -p "$HOME/.sfeed/frames"
    cp style.css "$HOME/.sfeed/frames/style.css"
    cd "$HOME/.sfeed/frames" && sfeed_frames $HOME/.sfeed/feeds/*

To automatically update your feeds periodically and format them in a way you
like, you can make a wrapper script and add it as a cronjob.
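
A minimal sketch of such a wrapper, reusing the commands shown above (the
script name and output locations are assumptions, pick your own):

    #!/bin/sh
    # update_and_format.sh: update all feeds, then (re)format the output.
    sfeed_update
    sfeed_plain "$HOME/.sfeed/feeds/"* > "$HOME/.sfeed/feeds.txt"
    sfeed_html "$HOME/.sfeed/feeds/"* > "$HOME/.sfeed/feeds.html"

and a crontab(5) entry to run it hourly:

    0 * * * * /path/to/update_and_format.sh
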
Most protocols are supported because curl(1) is used by default, and proxy
settings from the environment (such as the $http_proxy environment variable)
are also used.

The sfeed(1) program itself is just a parser that parses XML data from stdin
and is therefore network protocol-agnostic. It can be used with HTTP, HTTPS,
Gopher, SSH, etc.

See the section "Usage and examples" below and the man-pages for more
information on how to use sfeed(1) and the additional tools.


Dependencies
------------

- C compiler (C99).
- libc (recommended: C99 and POSIX >= 200809).


Optional dependencies
---------------------

- POSIX make(1) for the Makefile.
- POSIX sh(1),
  used by sfeed_update(1) and sfeed_opml_export(1).
- POSIX utilities such as awk(1) and sort(1),
  used by sfeed_content(1), sfeed_markread(1), sfeed_opml_export(1) and
  sfeed_update(1).
- curl(1) binary: https://curl.haxx.se/ ,
  used by sfeed_update(1), but can be replaced with any tool like wget(1),
  OpenBSD ftp(1) or hurl(1): https://git.codemadness.org/hurl/
- iconv(1) command-line utilities,
  used by sfeed_update(1). If the text in your RSS/Atom feeds is already UTF-8
  encoded then you don't need this. For a minimal iconv implementation:
  https://git.etalabs.net/cgit/noxcuse/tree/src/iconv.c
- xargs with support for the -P and -0 options,
  used by sfeed_update(1).
- mandoc for documentation: https://mdocml.bsd.lv/
- curses (typically ncurses), otherwise see minicurses.h,
  used by sfeed_curses(1).
- a terminal (emulator) supporting UTF-8 and the used capabilities,
  used by sfeed_curses(1).


Optional run-time dependencies for sfeed_curses
-----------------------------------------------

- xclip for yanking the URL or enclosure. See $SFEED_YANKER to change it.
- xdg-open, used as a plumber by default. See $SFEED_PLUMBER to change it.
- awk, used by the sfeed_content and sfeed_markread scripts.
  See the ENVIRONMENT VARIABLES section in the man page to change it.
- lynx, used by the sfeed_content script to convert HTML content.
  See the ENVIRONMENT VARIABLES section in the man page to change it.


Formats supported
-----------------

sfeed supports a subset of XML 1.0 and a subset of:

- Atom 1.0 (RFC 4287): https://datatracker.ietf.org/doc/html/rfc4287
- Atom 0.3 (draft, historic).
- RSS 0.90+.
- RDF (when used with RSS).
- MediaRSS extensions (media:).
- Dublin Core extensions (dc:).

Other formats like JSON Feed, twtxt or certain RSS/Atom extensions are
supported by converting them to RSS/Atom or to the sfeed(5) format directly.


OS tested
---------

- Linux,
  compilers: clang, gcc, chibicc, cproc, lacc, pcc, scc, tcc,
  libc: glibc, musl.
- OpenBSD (clang, gcc).
- NetBSD (with NetBSD curses).
- FreeBSD.
- DragonFlyBSD.
- GNU/Hurd.
- Illumos (OpenIndiana).
- Windows (cygwin gcc + mintty, mingw).
- HaikuOS.
- SerenityOS.
- FreeDOS (djgpp, Open Watcom).
- FUZIX (sdcc -mz80, with the sfeed parser program).


Architectures tested
--------------------

amd64, ARM, aarch64, HPPA, i386, MIPS32-BE, RISCV64, SPARC64, Z80.


Files
-----

sfeed             - Read XML RSS or Atom feed data from stdin. Write feed data
                    in TAB-separated format to stdout.
sfeed_atom        - Format feed data (TSV) to an Atom feed.
sfeed_content     - View item content, for use with sfeed_curses.
sfeed_curses      - Format feed data (TSV) to a curses interface.
sfeed_frames      - Format feed data (TSV) to HTML file(s) with frames.
sfeed_gopher      - Format feed data (TSV) to Gopher files.
sfeed_html        - Format feed data (TSV) to HTML.
sfeed_json        - Format feed data (TSV) to JSON Feed.
sfeed_opml_export - Generate an OPML XML file from a sfeedrc config file.
sfeed_opml_import - Generate a sfeedrc config file from an OPML XML file.
sfeed_markread    - Mark items as read/unread, for use with sfeed_curses.
sfeed_mbox        - Format feed data (TSV) to mbox.
sfeed_plain       - Format feed data (TSV) to a plain-text list.
sfeed_twtxt       - Format feed data (TSV) to a twtxt feed.
sfeed_update      - Update feeds and merge items.
sfeed_web         - Find URLs to RSS/Atom feeds from a webpage.
sfeed_xmlenc      - Detect character-set encoding from an XML stream.
sfeedrc.example   - Example config file. Can be copied to
                    $HOME/.sfeed/sfeedrc.
style.css         - Example stylesheet to use with sfeed_html(1) and
                    sfeed_frames(1).


Files read at runtime by sfeed_update(1)
----------------------------------------

sfeedrc - Config file. This file is evaluated as a shellscript in
          sfeed_update(1).

At least the following functions can be overridden per feed:

- fetch: to use wget(1), OpenBSD ftp(1) or another download program.
- filter: to filter on fields.
- merge: to change the merge logic.
- order: to change the sort order.

See also the sfeedrc(5) man page documentation for more details.

The feeds() function is called to process the feeds. The default feed()
function is executed concurrently as a background job in your sfeedrc(5) config
file to make updating faster. The variable maxjobs can be changed to limit or
increase the number of concurrent jobs (8 by default).
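
A minimal sfeedrc sketch showing this (the feed name, URL and maxjobs value are
just examples):

    # maxjobs: limit the number of concurrent feed() jobs.
    maxjobs=4

    # list of feeds to fetch:
    feeds() {
        # feed <name> <feedurl> [basesiteurl] [encoding]
        feed "codemadness" "https://codemadness.org/atom.xml"
    }
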
Files written at runtime by sfeed_update(1)
-------------------------------------------

feedname - TAB-separated format containing all items per feed. The
           sfeed_update(1) script merges new items with this file.
           The format is documented in sfeed(5).


File format
-----------

man 5 sfeed
man 5 sfeedrc
man 1 sfeed


Usage and examples
------------------

Find URLs to RSS/Atom feeds from a webpage:

    url="https://codemadness.org"; curl -L -s "$url" | sfeed_web "$url"

output example:

    https://codemadness.org/atom.xml	application/atom+xml
    https://codemadness.org/atom_content.xml	application/atom+xml

- - -

Make sure your sfeedrc config file exists, see the sfeedrc.example file. To
update your feeds (the configfile argument is optional):

    sfeed_update "configfile"

Format the feeds files:

    # Plain-text list.
    sfeed_plain $HOME/.sfeed/feeds/* > $HOME/.sfeed/feeds.txt
    # HTML view (no frames), copy style.css for a default style.
    sfeed_html $HOME/.sfeed/feeds/* > $HOME/.sfeed/feeds.html
    # HTML view with the menu as frames, copy style.css for a default style.
    mkdir -p somedir && cd somedir && sfeed_frames $HOME/.sfeed/feeds/*

View the formatted output in your browser:

    $BROWSER "$HOME/.sfeed/feeds.html"

View the formatted output in your editor:

    $EDITOR "$HOME/.sfeed/feeds.txt"

- - -

View the formatted output in a curses interface. The interface has a look
inspired by the mutt mail client. It has a sidebar panel for the feeds, a panel
with a listing of the items and a small statusbar for the selected item/URL.
Some functions like searching and scrolling are integrated in the interface
itself.

Just like the other format programs included in sfeed you can run it like this:

    sfeed_curses ~/.sfeed/feeds/*

... or by reading from stdin:

    sfeed_curses < ~/.sfeed/feeds/xkcd

By default sfeed_curses marks the items of the last day as new/bold. This limit
can be overridden by setting the environment variable $SFEED_NEW_AGE to the
desired maximum age in seconds. To manage read/unread items in a different way,
a plain-text file with a list of the read URLs can be used. To enable this
behaviour, set the environment variable $SFEED_URL_FILE to the path of this URL
file:

    export SFEED_URL_FILE="$HOME/.sfeed/urls"
    [ -f "$SFEED_URL_FILE" ] || touch "$SFEED_URL_FILE"
    sfeed_curses ~/.sfeed/feeds/*

It then uses the shellscript "sfeed_markread" to process the read and unread
items.
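
For example, to mark a single item as read outside the interface (a sketch;
sfeed_markread reads the URLs from stdin, the URL below is made up):

    echo "https://codemadness.org/some-article.html" | \
        sfeed_markread read "$SFEED_URL_FILE"
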
- - -

Example script to view feed items in a vertical list/menu in dmenu(1). It opens
the selected URL in the browser set in $BROWSER:

    #!/bin/sh
    url=$(sfeed_plain "$HOME/.sfeed/feeds/"* | dmenu -l 35 -i | \
        sed -n 's@^.* \([a-zA-Z]*://\)\(.*\)$@\1\2@p')
    test -n "${url}" && $BROWSER "${url}"

dmenu can be found at: https://git.suckless.org/dmenu/

- - -

Generate a sfeedrc config file from your exported list of feeds in OPML
format:

    sfeed_opml_import < opmlfile.xml > $HOME/.sfeed/sfeedrc

- - -

Export an OPML file of your feeds from a sfeedrc config file (the configfile
argument is optional):

    sfeed_opml_export configfile > myfeeds.opml

- - -

The filter function can be overridden in your sfeedrc file. This allows
filtering items per feed. It can be used to shorten URLs, filter away
advertisements, strip tracking parameters and more.

    # filter fields.
    # filter(name, url)
    filter() {
        case "$1" in
        "tweakers")
            awk -F '\t' 'BEGIN { OFS = "\t"; }
            # skip ads.
            $2 ~ /^ADV:/ {
                next;
            }
            # shorten link.
            {
                if (match($3, /^https:\/\/tweakers\.net\/[a-z]+\/[0-9]+\//)) {
                    $3 = substr($3, RSTART, RLENGTH);
                }
                print $0;
            }';;
        "yt BSDNow")
            # filter only BSD Now from channel.
            awk -F '\t' '$2 ~ / \| BSD Now/';;
        *)
            cat;;
        esac | \
        # replace youtube links with embed links.
        sed 's@www.youtube.com/watch?v=@www.youtube.com/embed/@g' | \

        awk -F '\t' 'BEGIN { OFS = "\t"; }
        function filterlink(s) {
            # protocol must start with http, https or gopher.
            if (match(s, /^(http|https|gopher):\/\//) == 0) {
                return "";
            }

            # shorten feedburner links.
            if (match(s, /^(http|https):\/\/[^\/]+\/~r\/.*\/~3\/[^\/]+\//)) {
                s = substr(s, RSTART, RLENGTH);
            }

            # strip tracking parameters:
            # urchin, facebook, piwik, webtrekk and generic.
            gsub(/\?(ad|campaign|fbclid|pk|tm|utm|wt)_([^&]+)/, "?", s);
            gsub(/&(ad|campaign|fbclid|pk|tm|utm|wt)_([^&]+)/, "", s);

            gsub(/\?&/, "?", s);
            gsub(/[\?&]+$/, "", s);

            return s;
        }
        {
            $3 = filterlink($3); # link
            $8 = filterlink($8); # enclosure

            # try to remove tracking pixels: <img/> tags with 1px width or height.
            gsub("<img[^>]*(width|height)[[:space:]]*=[[:space:]]*[\"'"'"' ]?1[\"'"'"' ]?[^0-9>]+[^>]*>", "", $4);

            print $0;
        }'
    }
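
Because sfeedrc is a plain shellscript, a filter like this can also be tested
by hand outside sfeed_update (the feed name and file path below are
assumptions):

    . "$HOME/.sfeed/sfeedrc"
    filter "tweakers" < "$HOME/.sfeed/feeds/tweakers"
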
- - -

Aggregate feeds. This filters new entries (at most one day old), sorts them by
newest first, prefixes the feed name to the title and converts the TSV output
data to an Atom XML feed (again):

    #!/bin/sh
    cd ~/.sfeed/feeds/ || exit 1

    awk -F '\t' -v "old=$(($(date +'%s') - 86400))" '
    BEGIN { OFS = "\t"; }
    int($1) >= old {
        $2 = "[" FILENAME "] " $2;
        print $0;
    }' * | \
    sort -k1,1rn | \
    sfeed_atom

- - -

To have a "tail(1) -f"-like FIFO stream filtering for new unique feed items and
showing them as plain-text per line similar to sfeed_plain(1):

Create a FIFO:

    fifo="/tmp/sfeed_fifo"
    mkfifo "$fifo"

On the reading side:

    # This keeps track of unique lines so it might consume much memory.
    # It tries to reopen the $fifo after 1 second if it fails.
    while :; do cat "$fifo" || sleep 1; done | awk '!x[$0]++'

On the writing side:

    feedsdir="$HOME/.sfeed/feeds/"
    cd "$feedsdir" || exit 1
    test -p "$fifo" || exit 1

    # 1 day is old news, don't write older items.
    awk -F '\t' -v "old=$(($(date +'%s') - 86400))" '
    BEGIN { OFS = "\t"; }
    int($1) >= old {
        $2 = "[" FILENAME "] " $2;
        print $0;
    }' * | sort -k1,1n | sfeed_plain | cut -b 3- > "$fifo"

cut -b is used to trim the "N " prefix of sfeed_plain(1).

- - -

For some podcast feeds the following code can be used to filter the latest
enclosure URL (probably some audio file):

    awk -F '\t' 'BEGIN { latest = 0; }
    length($8) {
        ts = int($1);
        if (ts > latest) {
            url = $8;
            latest = ts;
        }
    }
    END { if (length(url)) { print url; } }'

... or on a file already sorted from newest to oldest:

    awk -F '\t' '$8 { print $8; exit }'

- - -

Over time your feeds file might become quite big. You can archive the items of
a feed from (roughly) the last week by doing for example:

    awk -F '\t' -v "old=$(($(date +'%s') - 604800))" 'int($1) > old' < feed > feed.new
    mv feed feed.bak
    mv feed.new feed

This could also be run weekly in a crontab to archive the feeds, like throwing
away old newspapers. It keeps the feeds list tidy and the formatted output
small.
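
A sketch that applies this to all feed files, suitable for such a weekly
cronjob (the script name and paths are assumptions):

    #!/bin/sh
    # archive_feeds.sh: keep only the items of (roughly) the last week.
    cd "$HOME/.sfeed/feeds" || exit 1
    old=$(($(date +'%s') - 604800))
    for feed in *; do
        # skip backups of previous runs.
        case "${feed}" in *.bak) continue;; esac
        awk -F '\t' -v "old=${old}" 'int($1) > old' < "${feed}" > "${feed}.new" &&
        mv "${feed}" "${feed}.bak" &&
        mv "${feed}.new" "${feed}"
    done

and a crontab(5) entry to run it every Sunday at 04:00:

    0 4 * * 0 /path/to/archive_feeds.sh
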
- - -

Convert the mbox to separate maildirs per feed and filter duplicate messages
using the fdm program.
fdm is available at: https://github.com/nicm/fdm

fdm config file (~/.sfeed/fdm.conf):

    set unmatched-mail keep

    account "sfeed" mbox "%[home]/.sfeed/mbox"
    $cachepath = "%[home]/.sfeed/fdm.cache"
    cache "${cachepath}"
    $maildir = "%[home]/feeds/"

    # Check if the message is in the cache by Message-ID.
    match case "^Message-ID: (.*)" in headers
        action {
            tag "msgid" value "%1"
        }
        continue

    # If it is in the cache, stop.
    match matched and in-cache "${cachepath}" key "%[msgid]"
        action {
            keep
        }

    # Not in the cache, process it and add it to the cache.
    match case "^X-Feedname: (.*)" in headers
        action {
            # Store to the local maildir.
            maildir "${maildir}%1"

            add-to-cache "${cachepath}" key "%[msgid]"
            keep
        }

Now run:

    $ sfeed_mbox ~/.sfeed/feeds/* > ~/.sfeed/mbox
    $ fdm -f ~/.sfeed/fdm.conf fetch

Now you can view the feeds in mutt(1) for example.

- - -

Read from the mbox, filter duplicate messages using the fdm program and deliver
them to an SMTP server. This works similar to the rss2email program.
fdm is available at: https://github.com/nicm/fdm

fdm config file (~/.sfeed/fdm.conf):

    set unmatched-mail keep

    account "sfeed" mbox "%[home]/.sfeed/mbox"
    $cachepath = "%[home]/.sfeed/fdm.cache"
    cache "${cachepath}"

    # Check if the message is in the cache by Message-ID.
    match case "^Message-ID: (.*)" in headers
        action {
            tag "msgid" value "%1"
        }
        continue

    # If it is in the cache, stop.
    match matched and in-cache "${cachepath}" key "%[msgid]"
        action {
            keep
        }

    # Not in the cache, process it and add it to the cache.
    match case "^X-Feedname: (.*)" in headers
        action {
            # Connect to an SMTP server and attempt to deliver the
            # mail to it.
            # Of course change the server and e-mail address below.
            smtp server "codemadness.org" to "hiltjo@codemadness.org"

            add-to-cache "${cachepath}" key "%[msgid]"
            keep
        }

Now run:

    $ sfeed_mbox ~/.sfeed/feeds/* > ~/.sfeed/mbox
    $ fdm -f ~/.sfeed/fdm.conf fetch

Now you can view the feeds in mutt(1) for example.

- - -

Convert the mbox to separate maildirs per feed and filter duplicate messages
using procmail(1).

procmail_maildirs.sh file:

    maildir="$HOME/feeds"
    feedsdir="$HOME/.sfeed/feeds"
    procmailconfig="$HOME/.sfeed/procmailrc"

    # message-id cache to prevent duplicates.
    mkdir -p "${maildir}/.cache"

    if ! test -r "${procmailconfig}"; then
        printf "Procmail configuration file \"%s\" does not exist or is not readable.\n" "${procmailconfig}" >&2
        echo "See procmailrc.example for an example." >&2
        exit 1
    fi

    find "${feedsdir}" -type f -exec printf '%s\n' {} \; | while read -r d; do
        name=$(basename "${d}")
        mkdir -p "${maildir}/${name}/cur"
        mkdir -p "${maildir}/${name}/new"
        mkdir -p "${maildir}/${name}/tmp"
        printf 'Mailbox %s\n' "${name}"
        sfeed_mbox "${d}" | formail -s procmail "${procmailconfig}"
    done

procmailrc(5) file:

    # Example for use with sfeed_mbox(1).
    # The header X-Feedname is used to split into separate maildirs. It is
    # assumed this name is sane.

    MAILDIR="$HOME/feeds/"

    :0
    * ^X-Feedname: \/.*
    {
        FEED="$MATCH"

        :0 Wh: "msgid_$FEED.lock"
        | formail -D 1024000 ".cache/msgid_$FEED.cache"

        :0
        "$FEED"/
    }

Now run:

    $ procmail_maildirs.sh

Now you can view the feeds in mutt(1) for example.

- - -

The fetch function can be overridden in your sfeedrc file. This allows
replacing the default curl(1) client of sfeed_update with any other tool to
fetch the RSS/Atom data, or changing the default curl options:

    # fetch a feed via HTTP/HTTPS etc.
    # fetch(name, url, feedfile)
    fetch() {
        hurl -m 1048576 -t 15 "$2" 2>/dev/null
    }

- - -

Caching, incremental data updates and bandwidth-saving

For servers that support it, some incremental updating and bandwidth-saving can
be done by using the "ETag" HTTP header.

Create a directory for storing the ETags per feed:

    mkdir -p ~/.sfeed/etags/

The curl ETag options (--etag-save and --etag-compare) can be used to store and
send the previous ETag header value. curl version 7.73+ is recommended for it
to work properly.

The curl -z option can be used to send the modification date of a local file as
an HTTP "If-Modified-Since" request header. The server can then respond whether
the data has been modified or respond with only the incremental data.

The curl --compressed option can be used to indicate the client supports
decompression. Because RSS/Atom feeds are textual XML content, this generally
compresses very well.

These options can be set by overriding the fetch() function in the sfeedrc
file:

    # fetch(name, url, feedfile)
    fetch() {
        etag="$HOME/.sfeed/etags/$(basename "$3")"
        curl \
            -L --max-redirs 0 -H "User-Agent:" -f -s -m 15 \
            --compressed \
            --etag-save "${etag}" --etag-compare "${etag}" \
            -z "${etag}" \
            "$2" 2>/dev/null
    }

These options can come at a cost of some privacy, because they expose
additional metadata from the previous request.

- - -

CDNs blocking requests due to a missing HTTP User-Agent request header

sfeed_update will not send the "User-Agent" header by default for privacy
reasons. Some CDNs like Cloudflare or websites like Reddit.com don't like this
and will block such HTTP requests.

A custom User-Agent can be set by using the curl -H option, like so:

    curl -H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Firefox/78.0'

The above example string pretends to be a Windows 10 (x86-64) machine running
Firefox 78.
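
To send it for all feeds, the fetch() function can be overridden in the sfeedrc
file; a sketch that reuses the curl options from the ETag example above:

    # fetch(name, url, feedfile)
    fetch() {
        curl -L --max-redirs 0 -f -s -m 15 \
            -H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Firefox/78.0' \
            "$2" 2>/dev/null
    }
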
- - -

Page redirects

For security and efficiency reasons, redirects are not allowed by default and
are treated as an error.

For example, this prevents hijacking via an unencrypted http:// to https://
redirect and avoids the added latency of an unnecessary page redirect on each
request. It is encouraged to use the final redirected URL in the sfeedrc config
file.

If you want to ignore this advice you can override the fetch() function in the
sfeedrc file and change the curl options "-L --max-redirs 0".
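
For example (a sketch, here allowing up to 3 redirects; the other options
mirror the defaults used in the examples above):

    # fetch(name, url, feedfile)
    fetch() {
        curl -L --max-redirs 3 -H "User-Agent:" -f -s -m 15 \
            "$2" 2>/dev/null
    }
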
cd "${feedname}"; then 763 log "${feedname}" "${msg}: ${feedname}" "DIR FAIL" >&2 764 return 1 765 fi 766 fi 767 768 log "${feedname}" "${msg}" "START" 769 if fetch "${url}" "${feedname}"; then 770 log "${feedname}" "${msg}" "OK" 771 772 # append it safely in parallel to the cachefile on a 773 # successful download. 774 (flock 9 || exit 1 775 printf '%s\n' "${url}" >> "${cachefile}" 776 ) 9>"${lockfile}" 777 else 778 log "${feedname}" "${msg}" "FAIL" >&2 779 return 1 780 fi 781 return 0 782 } 783 784 if [ "${SFEED_DOWNLOAD_CHILD}" = "1" ]; then 785 # Downloader helper for parallel downloading. 786 # Receives arguments: $1 = URL, $2 = title, $3 = feed filename or "-". 787 # It should write the URI to the cachefile if it is successful. 788 downloader "$1" "$2" "$3" 789 exit $? 790 fi 791 792 # ...else parent mode: 793 794 tmp="$(mktemp)" || exit 1 795 trap "rm -f ${tmp}" EXIT 796 797 [ -f "${cachefile}" ] || touch "${cachefile}" 798 cat "${cachefile}" > "${tmp}" 799 echo >> "${tmp}" # force it to have one line for awk. 800 801 LC_ALL=C awk -F '\t' ' 802 # fast prefilter what to download or not. 803 function filter(url, field, feedname) { 804 u = tolower(url); 805 return (match(u, "youtube\\.com") || 806 match(u, "\\.(flac|ogg|m3u|m3u8|m4a|mkv|mp3|mp4|wav|webm)$")); 807 } 808 function download(url, field, title, filename) { 809 if (!length(url) || urls[url] || !filter(url, field, filename)) 810 return; 811 # NUL-separated for xargs -0. 812 printf("%s%c%s%c%s%c", url, 0, title, 0, filename, 0); 813 urls[url] = 1; # print once 814 } 815 { 816 FILENR += (FNR == 1); 817 } 818 # lookup table from cachefile which contains downloaded URLs. 819 FILENR == 1 { 820 urls[$0] = 1; 821 } 822 # feed file(s). 823 FILENR != 1 { 824 download($3, 3, $2, FILENAME); # link 825 download($8, 8, $2, FILENAME); # enclosure 826 } 827 ' "${tmp}" "${@:--}" | \ 828 SFEED_DOWNLOAD_CHILD="1" xargs -r -0 -L 3 -P "${jobs}" "$(readlink -f "$0")" 829 830- - - 831 832Shellscript to export existing newsboat cached items from sqlite3 to the sfeed 833TSV format. 834 835 #!/bin/sh 836 # Export newsbeuter/newsboat cached items from sqlite3 to the sfeed TSV format. 837 # The data is split per file per feed with the name of the newsboat title/url. 838 # It writes the URLs of the read items line by line to a "urls" file. 839 # 840 # Dependencies: sqlite3, awk. 841 # 842 # Usage: create some directory to store the feeds then run this script. 843 844 # newsboat cache.db file. 845 cachefile="$HOME/.newsboat/cache.db" 846 test -n "$1" && cachefile="$1" 847 848 # dump data. 849 # .mode ascii: Columns/rows delimited by 0x1F and 0x1E 850 # get the first fields in the order of the sfeed(5) format. 851 sqlite3 "$cachefile" <<!EOF | 852 .headers off 853 .mode ascii 854 .output 855 SELECT 856 i.pubDate, i.title, i.url, i.content, i.content_mime_type, 857 i.guid, i.author, i.enclosure_url, 858 f.rssurl AS rssurl, f.title AS feedtitle, i.unread 859 -- i.id, i.enclosure_type, i.enqueued, i.flags, i.deleted, i.base 860 FROM rss_feed f 861 INNER JOIN rss_item i ON i.feedurl = f.rssurl 862 ORDER BY 863 i.feedurl ASC, i.pubDate DESC; 864 .quit 865 !EOF 866 # convert to sfeed(5) TSV format. 867 LC_ALL=C awk ' 868 BEGIN { 869 FS = "\x1f"; 870 RS = "\x1e"; 871 } 872 # normal non-content fields. 873 function field(s) { 874 gsub("^[[:space:]]*", "", s); 875 gsub("[[:space:]]*$", "", s); 876 gsub("[[:space:]]", " ", s); 877 gsub("[[:cntrl:]]", "", s); 878 return s; 879 } 880 # content field. 
- - -

Shellscript to export existing newsboat cached items from sqlite3 to the sfeed
TSV format.

    #!/bin/sh
    # Export newsbeuter/newsboat cached items from sqlite3 to the sfeed TSV format.
    # The data is split per file per feed with the name of the newsboat title/url.
    # It writes the URLs of the read items line by line to a "urls" file.
    #
    # Dependencies: sqlite3, awk.
    #
    # Usage: create some directory to store the feeds, then run this script.

    # newsboat cache.db file.
    cachefile="$HOME/.newsboat/cache.db"
    test -n "$1" && cachefile="$1"

    # dump the data.
    # .mode ascii: columns/rows delimited by 0x1F and 0x1E.
    # get the first fields in the order of the sfeed(5) format.
    sqlite3 "$cachefile" <<!EOF |
    .headers off
    .mode ascii
    .output
    SELECT
        i.pubDate, i.title, i.url, i.content, i.content_mime_type,
        i.guid, i.author, i.enclosure_url,
        f.rssurl AS rssurl, f.title AS feedtitle, i.unread
        -- i.id, i.enclosure_type, i.enqueued, i.flags, i.deleted, i.base
    FROM rss_feed f
    INNER JOIN rss_item i ON i.feedurl = f.rssurl
    ORDER BY
        i.feedurl ASC, i.pubDate DESC;
    .quit
    !EOF
    # convert to the sfeed(5) TSV format.
    LC_ALL=C awk '
    BEGIN {
        FS = "\x1f";
        RS = "\x1e";
    }
    # normal non-content fields.
    function field(s) {
        gsub("^[[:space:]]*", "", s);
        gsub("[[:space:]]*$", "", s);
        gsub("[[:space:]]", " ", s);
        gsub("[[:cntrl:]]", "", s);
        return s;
    }
    # content field.
    function content(s) {
        gsub("^[[:space:]]*", "", s);
        gsub("[[:space:]]*$", "", s);
        # escape chars in the content field.
        gsub("\\\\", "\\\\", s);
        gsub("\n", "\\n", s);
        gsub("\t", "\\t", s);
        return s;
    }
    function feedname(feedurl, feedtitle) {
        if (feedtitle == "") {
            gsub("/", "_", feedurl);
            return feedurl;
        }
        gsub("/", "_", feedtitle);
        return feedtitle;
    }
    {
        fname = feedname($9, $10);
        if (!feed[fname]++) {
            print "Writing file: \"" fname "\" (title: " $10 ", url: " $9 ")" > "/dev/stderr";
        }

        contenttype = field($5);
        if (contenttype == "")
            contenttype = "html";
        else if (index(contenttype, "/html") || index(contenttype, "/xhtml"))
            contenttype = "html";
        else
            contenttype = "plain";

        print $1 "\t" field($2) "\t" field($3) "\t" content($4) "\t" \
            contenttype "\t" field($6) "\t" field($7) "\t" field($8) "\t" \
            > fname;

        # write the URLs of the read items to a file line by line.
        if ($11 == "0") {
            print $3 > "urls";
        }
    }'

- - -

Progress indicator
------------------

The below sfeed_update wrapper script counts the number of feeds in a sfeedrc
config. It then calls sfeed_update and pipes the output lines to a function
that tracks the current progress and writes the total progress to stderr.
Alternative: pv -l -s totallines

    #!/bin/sh
    # Progress indicator script.

    # Pass lines as input to stdin and write the progress status to stderr.
    # progress(totallines)
    progress() {
        total="$(($1 + 0))" # must be a number, no divide by zero.
        test "${total}" -le 0 -o "$1" != "${total}" && return
        LC_ALL=C awk -v "total=${total}" '
        {
            counter++;
            percent = (counter * 100) / total;
            printf("\033[K") > "/dev/stderr"; # clear EOL
            print $0;
            printf("[%s/%s] %.0f%%\r", counter, total, percent) > "/dev/stderr";
            fflush(); # flush all buffers per line.
        }
        END {
            printf("\033[K") > "/dev/stderr";
        }'
    }

    # Counts the feeds from the sfeedrc config.
    countfeeds() {
        count=0
        . "$1"
        feed() {
            count=$((count + 1))
        }
        feeds
        echo "${count}"
    }

    config="${1:-$HOME/.sfeed/sfeedrc}"
    total=$(countfeeds "${config}")
    sfeed_update "${config}" 2>&1 | progress "${total}"

- - -

Counting unread and total items
-------------------------------

It can be useful to show the counts of unread items, for example in a
windowmanager or statusbar.

The below example script counts the items of the last day in the same way the
formatting tools do:

    #!/bin/sh
    # Count the new items of the last day.
    LC_ALL=C awk -F '\t' -v "old=$(($(date +'%s') - 86400))" '
    {
        total++;
    }
    int($1) >= old {
        totalnew++;
    }
    END {
        print "New: " totalnew;
        print "Total: " total;
    }' ~/.sfeed/feeds/*
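
For example, in X11 the output could be put in the root window name, which some
windowmanagers like dwm show as the statusbar (assuming the script above is
saved as countnew.sh in $PATH):

    xsetroot -name "$(countnew.sh | tr '\n' ' ')"
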
The below example script counts the unread items using the sfeed_curses URL
file:

    #!/bin/sh
    # Count the unread and total items from feeds using the URL file.
    LC_ALL=C awk -F '\t' '
    # URL file: amount of fields is 1.
    NF == 1 {
        u[$0] = 1; # lookup table of URLs.
        next;
    }
    # feed file: check by URL or id.
    {
        total++;
        if (length($3)) {
            if (u[$3])
                read++;
        } else if (length($6)) {
            if (u[$6])
                read++;
        }
    }
    END {
        print "Unread: " (total - read);
        print "Total: " total;
    }' ~/.sfeed/urls ~/.sfeed/feeds/*

- - -

sfeed.c: adding new XML tags or sfeed(5) fields to the parser
-------------------------------------------------------------

sfeed.c contains definitions to parse XML tags and map them to sfeed(5) TSV
fields. Parsed RSS and Atom tag names are first stored as a TagId, which is a
number. This TagId is then mapped to the output field index.

Steps to modify the code:

* Add a new TagId enum for the tag.

* (optional) Add a new FeedField* enum for the new output field, or map it to
  an existing field.

* Add the new XML tag name to the array variable of parsed RSS or Atom
  tags: rsstags[] or atomtags[].

  These must be defined in alphabetical order, because a binary search is used
  which uses the strcasecmp() function.

* Add the parsed TagId to the output field in the array variable fieldmap[].

  When another tag is also mapped to the same output field then the tag with
  the highest TagId number value overrides the mapped field: the order is from
  least important to most important.

* If this defined tag only uses the inner data of the XML tag, then this
  definition is enough. If it for example has to parse a certain attribute, you
  have to add a check for the TagId to the xmlattr() callback function.

* (optional) Print the new field in the printfields() function.

Below is a patch example to add the MRSS "media:content" tag as a new field:

diff --git a/sfeed.c b/sfeed.c
--- a/sfeed.c
+++ b/sfeed.c
@@ -50,7 +50,7 @@ enum TagId {
 	RSSTagGuidPermalinkTrue,
 	/* must be defined after GUID, because it can be a link (isPermaLink) */
 	RSSTagLink,
-	RSSTagEnclosure,
+	RSSTagMediaContent, RSSTagEnclosure,
 	RSSTagAuthor, RSSTagDccreator,
 	RSSTagCategory,
 	/* Atom */
@@ -81,7 +81,7 @@ typedef struct field {
 enum {
 	FeedFieldTime = 0, FeedFieldTitle, FeedFieldLink, FeedFieldContent,
 	FeedFieldId, FeedFieldAuthor, FeedFieldEnclosure, FeedFieldCategory,
-	FeedFieldLast
+	FeedFieldMediaContent, FeedFieldLast
 };
 
 typedef struct feedcontext {
@@ -137,6 +137,7 @@ static const FeedTag rsstags[] = {
 	{ STRP("enclosure"), RSSTagEnclosure },
 	{ STRP("guid"), RSSTagGuid },
 	{ STRP("link"), RSSTagLink },
+	{ STRP("media:content"), RSSTagMediaContent },
 	{ STRP("media:description"), RSSTagMediaDescription },
 	{ STRP("pubdate"), RSSTagPubdate },
 	{ STRP("title"), RSSTagTitle }
@@ -180,6 +181,7 @@ static const int fieldmap[TagLast] = {
 	[RSSTagGuidPermalinkFalse] = FeedFieldId,
 	[RSSTagGuidPermalinkTrue] = FeedFieldId, /* special-case: both a link and an id */
 	[RSSTagLink] = FeedFieldLink,
+	[RSSTagMediaContent] = FeedFieldMediaContent,
 	[RSSTagEnclosure] = FeedFieldEnclosure,
 	[RSSTagAuthor] = FeedFieldAuthor,
 	[RSSTagDccreator] = FeedFieldAuthor,
@@ -677,6 +679,8 @@ printfields(void)
 	string_print_uri(&ctx.fields[FeedFieldEnclosure].str);
 	putchar(FieldSeparator);
 	string_print_trimmed_multi(&ctx.fields[FeedFieldCategory].str);
+	putchar(FieldSeparator);
+	string_print_trimmed(&ctx.fields[FeedFieldMediaContent].str);
 	putchar('\n');
 
 	if (ferror(stdout)) /* check for errors but do not flush */
@@ -718,7 +722,7 @@ xmlattr(XMLParser *p, const char *t, size_t tl, const char *n, size_t nl,
 	}
 
 	if (ctx.feedtype == FeedTypeRSS) {
-		if (ctx.tag.id == RSSTagEnclosure &&
+		if ((ctx.tag.id == RSSTagEnclosure || ctx.tag.id == RSSTagMediaContent) &&
 		    isattr(n, nl, STRP("url"))) {
 			string_append(&tmpstr, v, vl);
 		} else if (ctx.tag.id == RSSTagGuid &&
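
With this patch applied the new field is appended after the category field (the
9th and last field in the standard sfeed(5) format), so the media:content URL
can be extracted as the 10th column, for example:

    awk -F '\t' '{ print $10 }' ~/.sfeed/feeds/*
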
- - -

Running custom commands inside the sfeed_curses program
-------------------------------------------------------

Running commands inside the sfeed_curses program can be useful, for example to
sync items or to mark all items across all feeds as read. It can be convenient
to have a keybind for this inside the program to perform a scripted action and
then reload the feeds by sending the signal SIGHUP.

In the input handling code you can then add a case:

    case 'M':
        forkexec((char *[]) { "markallread.sh", NULL }, 0);
        break;

or

    case 'S':
        forkexec((char *[]) { "syncnews.sh", NULL }, 1);
        break;

The specified script should be in $PATH or be an absolute path.

Example of a `markallread.sh` shellscript to mark all URLs as read:

    #!/bin/sh
    # mark all items/URLs as read.
    tmp="$(mktemp)" || exit 1
    (cat ~/.sfeed/urls; cut -f 3 ~/.sfeed/feeds/*) | \
    awk '!x[$0]++' > "$tmp" &&
    mv "$tmp" ~/.sfeed/urls &&
    pkill -SIGHUP sfeed_curses # reload feeds.

Example of a `syncnews.sh` shellscript to update the feeds and reload them:

    #!/bin/sh
    sfeed_update
    pkill -SIGHUP sfeed_curses


Running programs in a new session
---------------------------------

By default processes are spawned in the same session and process group as
sfeed_curses. When sfeed_curses is closed this can also close the spawned
process in some cases.

When the setsid command-line program is available the following wrapper command
can be used to run the program in a new session, for a plumb program:

    setsid -f xdg-open "$@"

Alternatively the code can be changed to call setsid() before execvp().
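
A concrete way to use the setsid wrapper above: save it as a small plumb script
and point $SFEED_PLUMBER at it (the name plumb.sh is an assumption; it should
be in $PATH or an absolute path):

    #!/bin/sh
    # plumb.sh: open the URL in a new session, so it is not closed
    # together with sfeed_curses.
    setsid -f xdg-open "$@"

then run:

    SFEED_PLUMBER=plumb.sh sfeed_curses ~/.sfeed/feeds/*
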
Open a URL directly in the same terminal
----------------------------------------

To open a URL directly in the same terminal using the text-mode lynx browser:

    SFEED_PLUMBER=lynx SFEED_PLUMBER_INTERACTIVE=1 sfeed_curses ~/.sfeed/feeds/*


Yank to tmux buffer
-------------------

This changes the yank command to set the tmux buffer instead of using X11
xclip:

    SFEED_YANKER="tmux set-buffer \`cat\`"


Known terminal issues
---------------------

Below is a list of some bugs or missing features in terminals found while
testing sfeed_curses. Some of them might already be fixed upstream:

- cygwin + mintty: the xterm mouse-encoding of the mouse position is broken for
  scrolling.
- HaikuOS terminal: the xterm mouse-encoding of the mouse button number of the
  middle-button and right-button is incorrect / reversed.
- putty: the full reset attribute (ESC c, typically `rs1`) does not reset the
  window title.
- Mouse button encoding for extended buttons (like side-buttons) is unsupported
  in some terminals or maps to the same button: for example side-buttons 7 and
  8 map to the scroll buttons 4 and 5 in urxvt.


License
-------

ISC, see LICENSE file.


Author
------

Hiltjo Posthuma <hiltjo@codemadness.org>