sfeed
-----

RSS and Atom parser (and some format programs).

It converts RSS or Atom feeds from XML to a TAB-separated file. There are
formatting programs included to convert this TAB-separated format to various
other formats. There are also some programs and scripts included to import and
export OPML and to fetch, filter, merge and order feed items.


Build and install
-----------------

$ make
# make install


To build sfeed without sfeed_curses set SFEED_CURSES to an empty string:

$ make SFEED_CURSES=""
# make SFEED_CURSES="" install


To change the theme for sfeed_curses you can set SFEED_THEME. See the themes/
directory for the theme names.

$ make SFEED_THEME="templeos"
# make SFEED_THEME="templeos" install


Usage
-----

Initial setup:

    mkdir -p "$HOME/.sfeed/feeds"
    cp sfeedrc.example "$HOME/.sfeed/sfeedrc"

Edit the sfeedrc(5) configuration file and change any RSS/Atom feeds. This file
is included and evaluated as a shellscript for sfeed_update, so its functions
and behaviour can be overridden:

    $EDITOR "$HOME/.sfeed/sfeedrc"

or you can import existing OPML subscriptions using sfeed_opml_import(1):

    sfeed_opml_import < file.opml > "$HOME/.sfeed/sfeedrc"

An example to export from another RSS/Atom reader called newsboat and import
it for sfeed_update:

    newsboat -e | sfeed_opml_import > "$HOME/.sfeed/sfeedrc"

An example to export from another RSS/Atom reader called rss2email (3.x+) and
import it for sfeed_update:

    r2e opmlexport | sfeed_opml_import > "$HOME/.sfeed/sfeedrc"

Update the feeds; this script merges the new items. See sfeed_update(1) for
more information about what it can do:

    sfeed_update

Format feeds:

Plain-text list:

    sfeed_plain $HOME/.sfeed/feeds/* > "$HOME/.sfeed/feeds.txt"

HTML view (no frames), copy style.css for a default style:

    cp style.css "$HOME/.sfeed/style.css"
    sfeed_html $HOME/.sfeed/feeds/* > "$HOME/.sfeed/feeds.html"

HTML view with the menu as frames, copy style.css for a default style:

    mkdir -p "$HOME/.sfeed/frames"
    cp style.css "$HOME/.sfeed/frames/style.css"
    cd "$HOME/.sfeed/frames" && sfeed_frames $HOME/.sfeed/feeds/*

To automatically update your feeds periodically and format them in a way you
like, you can make a wrapper script and add it as a cronjob.
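
A minimal sketch of such a wrapper, reusing the commands shown above (the
script name and output locations are assumptions, pick your own):

    #!/bin/sh
    # update_and_format.sh: update all feeds, then (re)format the output.
    sfeed_update
    sfeed_plain "$HOME/.sfeed/feeds/"* > "$HOME/.sfeed/feeds.txt"
    sfeed_html "$HOME/.sfeed/feeds/"* > "$HOME/.sfeed/feeds.html"

and a crontab(5) entry to run it hourly:

    0 * * * * /path/to/update_and_format.sh
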
Most protocols are supported because curl(1) is used by default, and proxy
settings from the environment (such as the $http_proxy environment variable)
are also used.

The sfeed(1) program itself is just a parser that parses XML data from stdin
and is therefore network protocol-agnostic. It can be used with HTTP, HTTPS,
Gopher, SSH, etc.

See the section "Usage and examples" below and the man-pages for more
information on how to use sfeed(1) and the additional tools.


Dependencies
------------

- C compiler (C99).
- libc (recommended: C99 and POSIX >= 200809).


Optional dependencies
---------------------

- POSIX make(1) for the Makefile.
- POSIX sh(1),
  used by sfeed_update(1) and sfeed_opml_export(1).
- POSIX utilities such as awk(1) and sort(1),
  used by sfeed_content(1), sfeed_markread(1), sfeed_opml_export(1) and
  sfeed_update(1).
- curl(1) binary: https://curl.haxx.se/ ,
  used by sfeed_update(1), but can be replaced with any tool like wget(1),
  OpenBSD ftp(1) or hurl(1): https://git.codemadness.org/hurl/
- iconv(1) command-line utilities,
  used by sfeed_update(1). If the text in your RSS/Atom feeds is already UTF-8
  encoded then you don't need this. For a minimal iconv implementation:
  https://git.etalabs.net/cgit/noxcuse/tree/src/iconv.c
- xargs with support for the -P and -0 options,
  used by sfeed_update(1).
- mandoc for documentation: https://mdocml.bsd.lv/
- curses (typically ncurses), otherwise see minicurses.h,
  used by sfeed_curses(1).
- a terminal (emulator) supporting UTF-8 and the used capabilities,
  used by sfeed_curses(1).


Optional run-time dependencies for sfeed_curses
-----------------------------------------------

- xclip for yanking the URL or enclosure. See $SFEED_YANKER to change it.
- xdg-open, used as a plumber by default. See $SFEED_PLUMBER to change it.
- awk, used by the sfeed_content and sfeed_markread scripts.
  See the ENVIRONMENT VARIABLES section in the man page to change it.
- lynx, used by the sfeed_content script to convert HTML content.
  See the ENVIRONMENT VARIABLES section in the man page to change it.


Formats supported
-----------------

sfeed supports a subset of XML 1.0 and a subset of:

- Atom 1.0 (RFC 4287): https://datatracker.ietf.org/doc/html/rfc4287
- Atom 0.3 (draft, historic).
- RSS 0.90+.
- RDF (when used with RSS).
- MediaRSS extensions (media:).
- Dublin Core extensions (dc:).

Other formats like JSON Feed, twtxt or certain RSS/Atom extensions are
supported by converting them to RSS/Atom or to the sfeed(5) format directly.


OS tested
---------

- Linux,
  compilers: clang, gcc, chibicc, cproc, lacc, pcc, scc, tcc,
  libc: glibc, musl.
- OpenBSD (clang, gcc).
- NetBSD (with NetBSD curses).
- FreeBSD.
- DragonFlyBSD.
- GNU/Hurd.
- Illumos (OpenIndiana).
- Windows (cygwin gcc + mintty, mingw).
- HaikuOS.
- SerenityOS.
- FreeDOS (djgpp, Open Watcom).
- FUZIX (sdcc -mz80, with the sfeed parser program).


Architectures tested
--------------------

amd64, ARM, aarch64, HPPA, i386, MIPS32-BE, RISCV64, SPARC64, Z80.


Files
-----

sfeed             - Read XML RSS or Atom feed data from stdin. Write feed data
                    in TAB-separated format to stdout.
sfeed_atom        - Format feed data (TSV) to an Atom feed.
sfeed_content     - View item content, for use with sfeed_curses.
sfeed_curses      - Format feed data (TSV) to a curses interface.
sfeed_frames      - Format feed data (TSV) to HTML file(s) with frames.
sfeed_gopher      - Format feed data (TSV) to Gopher files.
sfeed_html        - Format feed data (TSV) to HTML.
sfeed_json        - Format feed data (TSV) to JSON Feed.
sfeed_opml_export - Generate an OPML XML file from a sfeedrc config file.
sfeed_opml_import - Generate a sfeedrc config file from an OPML XML file.
sfeed_markread    - Mark items as read/unread, for use with sfeed_curses.
sfeed_mbox        - Format feed data (TSV) to mbox.
sfeed_plain       - Format feed data (TSV) to a plain-text list.
sfeed_twtxt       - Format feed data (TSV) to a twtxt feed.
sfeed_update      - Update feeds and merge items.
sfeed_web         - Find URLs to RSS/Atom feeds from a webpage.
sfeed_xmlenc      - Detect character-set encoding from an XML stream.
sfeedrc.example   - Example config file. Can be copied to
                    $HOME/.sfeed/sfeedrc.
style.css         - Example stylesheet to use with sfeed_html(1) and
                    sfeed_frames(1).


Files read at runtime by sfeed_update(1)
----------------------------------------

sfeedrc - Config file. This file is evaluated as a shellscript in
          sfeed_update(1).

At least the following functions can be overridden per feed:

- fetch: to use wget(1), OpenBSD ftp(1) or another download program.
- filter: to filter on fields.
- merge: to change the merge logic.
- order: to change the sort order.

See also the sfeedrc(5) man page documentation for more details.

The feeds() function is called to process the feeds. The default feed()
function is executed concurrently as a background job in your sfeedrc(5) config
file to make updating faster. The variable maxjobs can be changed to limit or
increase the number of concurrent jobs (8 by default).
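
A minimal sfeedrc sketch showing this (the feed name, URL and maxjobs value are
just examples):

    # maxjobs: limit the number of concurrent feed() jobs.
    maxjobs=4

    # list of feeds to fetch:
    feeds() {
        # feed <name> <feedurl> [basesiteurl] [encoding]
        feed "codemadness" "https://codemadness.org/atom.xml"
    }
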
Files written at runtime by sfeed_update(1)
-------------------------------------------

feedname - TAB-separated format containing all items per feed. The
           sfeed_update(1) script merges new items with this file.
           The format is documented in sfeed(5).


File format
-----------

man 5 sfeed
man 5 sfeedrc
man 1 sfeed


Usage and examples
------------------

Find URLs to RSS/Atom feeds from a webpage:

    url="https://codemadness.org"; curl -L -s "$url" | sfeed_web "$url"

output example:

    https://codemadness.org/atom.xml	application/atom+xml
    https://codemadness.org/atom_content.xml	application/atom+xml

- - -

Make sure your sfeedrc config file exists, see the sfeedrc.example file. To
update your feeds (the configfile argument is optional):

    sfeed_update "configfile"

Format the feeds files:

    # Plain-text list.
    sfeed_plain $HOME/.sfeed/feeds/* > $HOME/.sfeed/feeds.txt
    # HTML view (no frames), copy style.css for a default style.
    sfeed_html $HOME/.sfeed/feeds/* > $HOME/.sfeed/feeds.html
    # HTML view with the menu as frames, copy style.css for a default style.
    mkdir -p somedir && cd somedir && sfeed_frames $HOME/.sfeed/feeds/*

View the formatted output in your browser:

    $BROWSER "$HOME/.sfeed/feeds.html"

View the formatted output in your editor:

    $EDITOR "$HOME/.sfeed/feeds.txt"

- - -

View the formatted output in a curses interface. The interface has a look
inspired by the mutt mail client. It has a sidebar panel for the feeds, a panel
with a listing of the items and a small statusbar for the selected item/URL.
Some functions like searching and scrolling are integrated in the interface
itself.

Just like the other format programs included in sfeed you can run it like this:

    sfeed_curses ~/.sfeed/feeds/*

... or by reading from stdin:

    sfeed_curses < ~/.sfeed/feeds/xkcd

By default sfeed_curses marks the items of the last day as new/bold. This limit
can be overridden by setting the environment variable $SFEED_NEW_AGE to the
desired maximum age in seconds. To manage read/unread items in a different way,
a plain-text file with a list of the read URLs can be used. To enable this
behaviour, set the environment variable $SFEED_URL_FILE to the path of this URL
file:

    export SFEED_URL_FILE="$HOME/.sfeed/urls"
    [ -f "$SFEED_URL_FILE" ] || touch "$SFEED_URL_FILE"
    sfeed_curses ~/.sfeed/feeds/*

It then uses the shellscript "sfeed_markread" to process the read and unread
items.
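
For example, to mark a single item as read outside the interface (a sketch;
sfeed_markread reads the URLs from stdin, the URL below is made up):

    echo "https://codemadness.org/some-article.html" | \
        sfeed_markread read "$SFEED_URL_FILE"
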
- - -

Example script to view feed items in a vertical list/menu in dmenu(1). It opens
the selected URL in the browser set in $BROWSER:

    #!/bin/sh
    url=$(sfeed_plain "$HOME/.sfeed/feeds/"* | dmenu -l 35 -i | \
        sed -n 's@^.* \([a-zA-Z]*://\)\(.*\)$@\1\2@p')
    test -n "${url}" && $BROWSER "${url}"

dmenu can be found at: https://git.suckless.org/dmenu/

- - -

Generate a sfeedrc config file from your exported list of feeds in OPML
format:

    sfeed_opml_import < opmlfile.xml > $HOME/.sfeed/sfeedrc

- - -

Export an OPML file of your feeds from a sfeedrc config file (the configfile
argument is optional):

    sfeed_opml_export configfile > myfeeds.opml

- - -

The filter function can be overridden in your sfeedrc file. This allows
filtering items per feed. It can be used to shorten URLs, filter away
advertisements, strip tracking parameters and more.

    # filter fields.
    # filter(name, url)
    filter() {
        case "$1" in
        "tweakers")
            awk -F '\t' 'BEGIN { OFS = "\t"; }
            # skip ads.
            $2 ~ /^ADV:/ {
                next;
            }
            # shorten link.
            {
                if (match($3, /^https:\/\/tweakers\.net\/[a-z]+\/[0-9]+\//)) {
                    $3 = substr($3, RSTART, RLENGTH);
                }
                print $0;
            }';;
        "yt BSDNow")
            # filter only BSD Now from channel.
            awk -F '\t' '$2 ~ / \| BSD Now/';;
        *)
            cat;;
        esac | \
        # replace youtube links with embed links.
        sed 's@www.youtube.com/watch?v=@www.youtube.com/embed/@g' | \

        awk -F '\t' 'BEGIN { OFS = "\t"; }
        function filterlink(s) {
            # protocol must start with http, https or gopher.
            if (match(s, /^(http|https|gopher):\/\//) == 0) {
                return "";
            }

            # shorten feedburner links.
            if (match(s, /^(http|https):\/\/[^\/]+\/~r\/.*\/~3\/[^\/]+\//)) {
                s = substr(s, RSTART, RLENGTH);
            }

            # strip tracking parameters:
            # urchin, facebook, piwik, webtrekk and generic.
            gsub(/\?(ad|campaign|fbclid|pk|tm|utm|wt)_([^&]+)/, "?", s);
            gsub(/&(ad|campaign|fbclid|pk|tm|utm|wt)_([^&]+)/, "", s);

            gsub(/\?&/, "?", s);
            gsub(/[\?&]+$/, "", s);

            return s;
        }
        {
            $3 = filterlink($3); # link
            $8 = filterlink($8); # enclosure

            # try to remove tracking pixels: <img/> tags with 1px width or height.
            gsub("<img[^>]*(width|height)[[:space:]]*=[[:space:]]*[\"'"'"' ]?1[\"'"'"' ]?[^0-9>]+[^>]*>", "", $4);

            print $0;
        }'
    }
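
Because sfeedrc is a plain shellscript, a filter like this can also be tested
by hand outside sfeed_update (the feed name and file path below are
assumptions):

    . "$HOME/.sfeed/sfeedrc"
    filter "tweakers" < "$HOME/.sfeed/feeds/tweakers"
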
- - -

Aggregate feeds. This filters new entries (at most one day old), sorts them by
newest first, prefixes the feed name to the title and converts the TSV output
data to an Atom XML feed (again):

    #!/bin/sh
    cd ~/.sfeed/feeds/ || exit 1

    awk -F '\t' -v "old=$(($(date +'%s') - 86400))" '
    BEGIN { OFS = "\t"; }
    int($1) >= old {
        $2 = "[" FILENAME "] " $2;
        print $0;
    }' * | \
    sort -k1,1rn | \
    sfeed_atom

- - -

To have a "tail(1) -f"-like FIFO stream filtering for new unique feed items and
showing them as plain-text per line similar to sfeed_plain(1):

Create a FIFO:

    fifo="/tmp/sfeed_fifo"
    mkfifo "$fifo"

On the reading side:

    # This keeps track of unique lines so it might consume much memory.
    # It tries to reopen the $fifo after 1 second if it fails.
    while :; do cat "$fifo" || sleep 1; done | awk '!x[$0]++'

On the writing side:

    feedsdir="$HOME/.sfeed/feeds/"
    cd "$feedsdir" || exit 1
    test -p "$fifo" || exit 1

    # 1 day is old news, don't write older items.
    awk -F '\t' -v "old=$(($(date +'%s') - 86400))" '
    BEGIN { OFS = "\t"; }
    int($1) >= old {
        $2 = "[" FILENAME "] " $2;
        print $0;
    }' * | sort -k1,1n | sfeed_plain | cut -b 3- > "$fifo"

cut -b is used to trim the "N " prefix of sfeed_plain(1).

- - -

For some podcast feeds the following code can be used to filter the latest
enclosure URL (probably some audio file):

    awk -F '\t' 'BEGIN { latest = 0; }
    length($8) {
        ts = int($1);
        if (ts > latest) {
            url = $8;
            latest = ts;
        }
    }
    END { if (length(url)) { print url; } }'

... or on a file already sorted from newest to oldest:

    awk -F '\t' '$8 { print $8; exit }'

- - -

Over time your feeds file might become quite big. You can archive the items of
a feed from (roughly) the last week by doing for example:

    awk -F '\t' -v "old=$(($(date +'%s') - 604800))" 'int($1) > old' < feed > feed.new
    mv feed feed.bak
    mv feed.new feed

This could also be run weekly in a crontab to archive the feeds, like throwing
away old newspapers. It keeps the feeds list tidy and the formatted output
small.
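
A sketch that applies this to all feed files, suitable for such a weekly
cronjob (the script name and paths are assumptions):

    #!/bin/sh
    # archive_feeds.sh: keep only the items of (roughly) the last week.
    cd "$HOME/.sfeed/feeds" || exit 1
    old=$(($(date +'%s') - 604800))
    for feed in *; do
        # skip backups of previous runs.
        case "${feed}" in *.bak) continue;; esac
        awk -F '\t' -v "old=${old}" 'int($1) > old' < "${feed}" > "${feed}.new" &&
        mv "${feed}" "${feed}.bak" &&
        mv "${feed}.new" "${feed}"
    done

and a crontab(5) entry to run it every Sunday at 04:00:

    0 4 * * 0 /path/to/archive_feeds.sh
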
- - -

Convert the mbox to separate maildirs per feed and filter duplicate messages
using the fdm program.
fdm is available at: https://github.com/nicm/fdm

fdm config file (~/.sfeed/fdm.conf):

    set unmatched-mail keep

    account "sfeed" mbox "%[home]/.sfeed/mbox"
    $cachepath = "%[home]/.sfeed/fdm.cache"
    cache "${cachepath}"
    $maildir = "%[home]/feeds/"

    # Check if the message is in the cache by Message-ID.
    match case "^Message-ID: (.*)" in headers
        action {
            tag "msgid" value "%1"
        }
        continue

    # If it is in the cache, stop.
    match matched and in-cache "${cachepath}" key "%[msgid]"
        action {
            keep
        }

    # Not in the cache, process it and add it to the cache.
    match case "^X-Feedname: (.*)" in headers
        action {
            # Store to the local maildir.
            maildir "${maildir}%1"

            add-to-cache "${cachepath}" key "%[msgid]"
            keep
        }

Now run:

    $ sfeed_mbox ~/.sfeed/feeds/* > ~/.sfeed/mbox
    $ fdm -f ~/.sfeed/fdm.conf fetch

Now you can view the feeds in mutt(1) for example.

- - -

Read from the mbox, filter duplicate messages using the fdm program and deliver
them to an SMTP server. This works similar to the rss2email program.
fdm is available at: https://github.com/nicm/fdm

fdm config file (~/.sfeed/fdm.conf):

    set unmatched-mail keep

    account "sfeed" mbox "%[home]/.sfeed/mbox"
    $cachepath = "%[home]/.sfeed/fdm.cache"
    cache "${cachepath}"

    # Check if the message is in the cache by Message-ID.
    match case "^Message-ID: (.*)" in headers
        action {
            tag "msgid" value "%1"
        }
        continue

    # If it is in the cache, stop.
    match matched and in-cache "${cachepath}" key "%[msgid]"
        action {
            keep
        }

    # Not in the cache, process it and add it to the cache.
    match case "^X-Feedname: (.*)" in headers
        action {
            # Connect to an SMTP server and attempt to deliver the
            # mail to it.
            # Of course change the server and e-mail address below.
            smtp server "codemadness.org" to "hiltjo@codemadness.org"

            add-to-cache "${cachepath}" key "%[msgid]"
            keep
        }

Now run:

    $ sfeed_mbox ~/.sfeed/feeds/* > ~/.sfeed/mbox
    $ fdm -f ~/.sfeed/fdm.conf fetch

Now you can view the feeds in mutt(1) for example.

- - -

Convert the mbox to separate maildirs per feed and filter duplicate messages
using procmail(1).

procmail_maildirs.sh file:

    maildir="$HOME/feeds"
    feedsdir="$HOME/.sfeed/feeds"
    procmailconfig="$HOME/.sfeed/procmailrc"

    # message-id cache to prevent duplicates.
    mkdir -p "${maildir}/.cache"

    if ! test -r "${procmailconfig}"; then
        printf "Procmail configuration file \"%s\" does not exist or is not readable.\n" "${procmailconfig}" >&2
        echo "See procmailrc.example for an example." >&2
        exit 1
    fi

    find "${feedsdir}" -type f -exec printf '%s\n' {} \; | while read -r d; do
        name=$(basename "${d}")
        mkdir -p "${maildir}/${name}/cur"
        mkdir -p "${maildir}/${name}/new"
        mkdir -p "${maildir}/${name}/tmp"
        printf 'Mailbox %s\n' "${name}"
        sfeed_mbox "${d}" | formail -s procmail "${procmailconfig}"
    done

procmailrc(5) file:

    # Example for use with sfeed_mbox(1).
    # The header X-Feedname is used to split into separate maildirs. It is
    # assumed this name is sane.

    MAILDIR="$HOME/feeds/"

    :0
    * ^X-Feedname: \/.*
    {
        FEED="$MATCH"

        :0 Wh: "msgid_$FEED.lock"
        | formail -D 1024000 ".cache/msgid_$FEED.cache"

        :0
        "$FEED"/
    }

Now run:

    $ procmail_maildirs.sh

Now you can view the feeds in mutt(1) for example.

- - -

The fetch function can be overridden in your sfeedrc file. This allows
replacing the default curl(1) client of sfeed_update with any other tool to
fetch the RSS/Atom data, or changing the default curl options:

    # fetch a feed via HTTP/HTTPS etc.
    # fetch(name, url, feedfile)
    fetch() {
        hurl -m 1048576 -t 15 "$2" 2>/dev/null
    }

- - -

Caching, incremental data updates and bandwidth-saving

For servers that support it, some incremental updating and bandwidth-saving can
be done by using the "ETag" HTTP header.

Create a directory for storing the ETags per feed:

    mkdir -p ~/.sfeed/etags/

The curl ETag options (--etag-save and --etag-compare) can be used to store and
send the previous ETag header value. curl version 7.73+ is recommended for it
to work properly.

The curl -z option can be used to send the modification date of a local file as
an HTTP "If-Modified-Since" request header. The server can then respond whether
the data has been modified or respond with only the incremental data.

The curl --compressed option can be used to indicate the client supports
decompression. Because RSS/Atom feeds are textual XML content, this generally
compresses very well.

These options can be set by overriding the fetch() function in the sfeedrc
file:

    # fetch(name, url, feedfile)
    fetch() {
        etag="$HOME/.sfeed/etags/$(basename "$3")"
        curl \
            -L --max-redirs 0 -H "User-Agent:" -f -s -m 15 \
            --compressed \
            --etag-save "${etag}" --etag-compare "${etag}" \
            -z "${etag}" \
            "$2" 2>/dev/null
    }

These options can come at a cost of some privacy, because they expose
additional metadata from the previous request.

- - -

CDNs blocking requests due to a missing HTTP User-Agent request header

sfeed_update will not send the "User-Agent" header by default for privacy
reasons. Some CDNs like Cloudflare or websites like Reddit.com don't like this
and will block such HTTP requests.

A custom User-Agent can be set by using the curl -H option, like so:

    curl -H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Firefox/78.0'

The above example string pretends to be a Windows 10 (x86-64) machine running
Firefox 78.
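
To send it for all feeds, the fetch() function can be overridden in the sfeedrc
file; a sketch that reuses the curl options from the ETag example above:

    # fetch(name, url, feedfile)
    fetch() {
        curl -L --max-redirs 0 -f -s -m 15 \
            -H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Firefox/78.0' \
            "$2" 2>/dev/null
    }
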
- - -

Page redirects

For security and efficiency reasons, redirects are not allowed by default and
are treated as an error.

For example, this prevents hijacking via an unencrypted http:// to https://
redirect and avoids the added latency of an unnecessary page redirect on each
request. It is encouraged to use the final redirected URL in the sfeedrc config
file.

If you want to ignore this advice you can override the fetch() function in the
sfeedrc file and change the curl options "-L --max-redirs 0".
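
For example (a sketch, here allowing up to 3 redirects; the other options
mirror the defaults used in the examples above):

    # fetch(name, url, feedfile)
    fetch() {
        curl -L --max-redirs 3 -H "User-Agent:" -f -s -m 15 \
            "$2" 2>/dev/null
    }
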
cd "${feedname}"; then 763 log "${feedname}" "${msg}: ${feedname}" "DIR FAIL" >&2 764 return 1 765 fi 766 fi 767 768 log "${feedname}" "${msg}" "START" 769 if fetch "${url}" "${feedname}"; then 770 log "${feedname}" "${msg}" "OK" 771 772 # append it safely in parallel to the cachefile on a 773 # successful download. 774 (flock 9 || exit 1 775 printf '%s\n' "${url}" >> "${cachefile}" 776 ) 9>"${lockfile}" 777 else 778 log "${feedname}" "${msg}" "FAIL" >&2 779 return 1 780 fi 781 return 0 782 } 783 784 if [ "${SFEED_DOWNLOAD_CHILD}" = "1" ]; then 785 # Downloader helper for parallel downloading. 786 # Receives arguments: $1 = URL, $2 = title, $3 = feed filename or "-". 787 # It should write the URI to the cachefile if it is successful. 788 downloader "$1" "$2" "$3" 789 exit $? 790 fi 791 792 # ...else parent mode: 793 794 tmp="$(mktemp)" || exit 1 795 trap "rm -f ${tmp}" EXIT 796 797 [ -f "${cachefile}" ] || touch "${cachefile}" 798 cat "${cachefile}" > "${tmp}" 799 echo >> "${tmp}" # force it to have one line for awk. 800 801 LC_ALL=C awk -F '\t' ' 802 # fast prefilter what to download or not. 803 function filter(url, field, feedname) { 804 u = tolower(url); 805 return (match(u, "youtube\\.com") || 806 match(u, "\\.(flac|ogg|m3u|m3u8|m4a|mkv|mp3|mp4|wav|webm)$")); 807 } 808 function download(url, field, title, filename) { 809 if (!length(url) || urls[url] || !filter(url, field, filename)) 810 return; 811 # NUL-separated for xargs -0. 812 printf("%s%c%s%c%s%c", url, 0, title, 0, filename, 0); 813 urls[url] = 1; # print once 814 } 815 { 816 FILENR += (FNR == 1); 817 } 818 # lookup table from cachefile which contains downloaded URLs. 819 FILENR == 1 { 820 urls[$0] = 1; 821 } 822 # feed file(s). 823 FILENR != 1 { 824 download($3, 3, $2, FILENAME); # link 825 download($8, 8, $2, FILENAME); # enclosure 826 } 827 ' "${tmp}" "${@:--}" | \ 828 SFEED_DOWNLOAD_CHILD="1" xargs -r -0 -L 3 -P "${jobs}" "$(readlink -f "$0")" 829 830- - - 831 832Shellscript to export existing newsboat cached items from sqlite3 to the sfeed 833TSV format. 834 835 #!/bin/sh 836 # Export newsbeuter/newsboat cached items from sqlite3 to the sfeed TSV format. 837 # The data is split per file per feed with the name of the newsboat title/url. 838 # It writes the URLs of the read items line by line to a "urls" file. 839 # 840 # Dependencies: sqlite3, awk. 841 # 842 # Usage: create some directory to store the feeds then run this script. 843 844 # newsboat cache.db file. 845 cachefile="$HOME/.newsboat/cache.db" 846 test -n "$1" && cachefile="$1" 847 848 # dump data. 849 # .mode ascii: Columns/rows delimited by 0x1F and 0x1E 850 # get the first fields in the order of the sfeed(5) format. 851 sqlite3 "$cachefile" <<!EOF | 852 .headers off 853 .mode ascii 854 .output 855 SELECT 856 i.pubDate, i.title, i.url, i.content, i.content_mime_type, 857 i.guid, i.author, i.enclosure_url, 858 f.rssurl AS rssurl, f.title AS feedtitle, i.unread 859 -- i.id, i.enclosure_type, i.enqueued, i.flags, i.deleted, i.base 860 FROM rss_feed f 861 INNER JOIN rss_item i ON i.feedurl = f.rssurl 862 ORDER BY 863 i.feedurl ASC, i.pubDate DESC; 864 .quit 865 !EOF 866 # convert to sfeed(5) TSV format. 867 LC_ALL=C awk ' 868 BEGIN { 869 FS = "\x1f"; 870 RS = "\x1e"; 871 } 872 # normal non-content fields. 873 function field(s) { 874 gsub("^[[:space:]]*", "", s); 875 gsub("[[:space:]]*$", "", s); 876 gsub("[[:space:]]", " ", s); 877 gsub("[[:cntrl:]]", "", s); 878 return s; 879 } 880 # content field. 
- - -

Shellscript to export existing newsboat cached items from sqlite3 to the sfeed
TSV format.

    #!/bin/sh
    # Export newsbeuter/newsboat cached items from sqlite3 to the sfeed TSV format.
    # The data is split per file per feed with the name of the newsboat title/url.
    # It writes the URLs of the read items line by line to a "urls" file.
    #
    # Dependencies: sqlite3, awk.
    #
    # Usage: create some directory to store the feeds, then run this script.

    # newsboat cache.db file.
    cachefile="$HOME/.newsboat/cache.db"
    test -n "$1" && cachefile="$1"

    # dump the data.
    # .mode ascii: columns/rows delimited by 0x1F and 0x1E.
    # get the first fields in the order of the sfeed(5) format.
    sqlite3 "$cachefile" <<!EOF |
    .headers off
    .mode ascii
    .output
    SELECT
        i.pubDate, i.title, i.url, i.content, i.content_mime_type,
        i.guid, i.author, i.enclosure_url,
        f.rssurl AS rssurl, f.title AS feedtitle, i.unread
        -- i.id, i.enclosure_type, i.enqueued, i.flags, i.deleted, i.base
    FROM rss_feed f
    INNER JOIN rss_item i ON i.feedurl = f.rssurl
    ORDER BY
        i.feedurl ASC, i.pubDate DESC;
    .quit
    !EOF
    # convert to the sfeed(5) TSV format.
    LC_ALL=C awk '
    BEGIN {
        FS = "\x1f";
        RS = "\x1e";
    }
    # normal non-content fields.
    function field(s) {
        gsub("^[[:space:]]*", "", s);
        gsub("[[:space:]]*$", "", s);
        gsub("[[:space:]]", " ", s);
        gsub("[[:cntrl:]]", "", s);
        return s;
    }
    # content field.
    function content(s) {
        gsub("^[[:space:]]*", "", s);
        gsub("[[:space:]]*$", "", s);
        # escape chars in the content field.
        gsub("\\\\", "\\\\", s);
        gsub("\n", "\\n", s);
        gsub("\t", "\\t", s);
        return s;
    }
    function feedname(feedurl, feedtitle) {
        if (feedtitle == "") {
            gsub("/", "_", feedurl);
            return feedurl;
        }
        gsub("/", "_", feedtitle);
        return feedtitle;
    }
    {
        fname = feedname($9, $10);
        if (!feed[fname]++) {
            print "Writing file: \"" fname "\" (title: " $10 ", url: " $9 ")" > "/dev/stderr";
        }

        contenttype = field($5);
        if (contenttype == "")
            contenttype = "html";
        else if (index(contenttype, "/html") || index(contenttype, "/xhtml"))
            contenttype = "html";
        else
            contenttype = "plain";

        print $1 "\t" field($2) "\t" field($3) "\t" content($4) "\t" \
            contenttype "\t" field($6) "\t" field($7) "\t" field($8) "\t" \
            > fname;

        # write the URLs of the read items to a file line by line.
        if ($11 == "0") {
            print $3 > "urls";
        }
    }'

- - -

Progress indicator
------------------

The below sfeed_update wrapper script counts the number of feeds in a sfeedrc
config. It then calls sfeed_update and pipes the output lines to a function
that tracks the current progress and writes the total progress to stderr.
Alternative: pv -l -s totallines

    #!/bin/sh
    # Progress indicator script.

    # Pass lines as input to stdin and write the progress status to stderr.
    # progress(totallines)
    progress() {
        total="$(($1 + 0))" # must be a number, no divide by zero.
        test "${total}" -le 0 -o "$1" != "${total}" && return
        LC_ALL=C awk -v "total=${total}" '
        {
            counter++;
            percent = (counter * 100) / total;
            printf("\033[K") > "/dev/stderr"; # clear EOL
            print $0;
            printf("[%s/%s] %.0f%%\r", counter, total, percent) > "/dev/stderr";
            fflush(); # flush all buffers per line.
        }
        END {
            printf("\033[K") > "/dev/stderr";
        }'
    }

    # Counts the feeds from the sfeedrc config.
    countfeeds() {
        count=0
        . "$1"
        feed() {
            count=$((count + 1))
        }
        feeds
        echo "${count}"
    }

    config="${1:-$HOME/.sfeed/sfeedrc}"
    total=$(countfeeds "${config}")
    sfeed_update "${config}" 2>&1 | progress "${total}"

- - -

Counting unread and total items
-------------------------------

It can be useful to show the counts of unread items, for example in a
windowmanager or statusbar.

The below example script counts the items of the last day in the same way the
formatting tools do:

    #!/bin/sh
    # Count the new items of the last day.
    LC_ALL=C awk -F '\t' -v "old=$(($(date +'%s') - 86400))" '
    {
        total++;
    }
    int($1) >= old {
        totalnew++;
    }
    END {
        print "New: " totalnew;
        print "Total: " total;
    }' ~/.sfeed/feeds/*
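
For example, in X11 the output could be put in the root window name, which some
windowmanagers like dwm show as the statusbar (assuming the script above is
saved as countnew.sh in $PATH):

    xsetroot -name "$(countnew.sh | tr '\n' ' ')"
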
The below example script counts the unread items using the sfeed_curses URL
file:

    #!/bin/sh
    # Count the unread and total items from feeds using the URL file.
    LC_ALL=C awk -F '\t' '
    # URL file: amount of fields is 1.
    NF == 1 {
        u[$0] = 1; # lookup table of URLs.
        next;
    }
    # feed file: check by URL or id.
    {
        total++;
        if (length($3)) {
            if (u[$3])
                read++;
        } else if (length($6)) {
            if (u[$6])
                read++;
        }
    }
    END {
        print "Unread: " (total - read);
        print "Total: " total;
    }' ~/.sfeed/urls ~/.sfeed/feeds/*

- - -

sfeed.c: adding new XML tags or sfeed(5) fields to the parser
-------------------------------------------------------------

sfeed.c contains definitions to parse XML tags and map them to sfeed(5) TSV
fields. Parsed RSS and Atom tag names are first stored as a TagId, which is a
number. This TagId is then mapped to the output field index.

Steps to modify the code:

* Add a new TagId enum for the tag.

* (optional) Add a new FeedField* enum for the new output field, or map it to
  an existing field.

* Add the new XML tag name to the array variable of parsed RSS or Atom
  tags: rsstags[] or atomtags[].

  These must be defined in alphabetical order, because a binary search is used
  which uses the strcasecmp() function.

* Add the parsed TagId to the output field in the array variable fieldmap[].

  When another tag is also mapped to the same output field then the tag with
  the highest TagId number value overrides the mapped field: the order is from
  least important to most important.

* If this defined tag only uses the inner data of the XML tag, then this
  definition is enough. If it for example has to parse a certain attribute, you
  have to add a check for the TagId to the xmlattr() callback function.

* (optional) Print the new field in the printfields() function.

Below is a patch example to add the MRSS "media:content" tag as a new field:

diff --git a/sfeed.c b/sfeed.c
--- a/sfeed.c
+++ b/sfeed.c
@@ -50,7 +50,7 @@ enum TagId {
 	RSSTagGuidPermalinkTrue,
 	/* must be defined after GUID, because it can be a link (isPermaLink) */
 	RSSTagLink,
-	RSSTagEnclosure,
+	RSSTagMediaContent, RSSTagEnclosure,
 	RSSTagAuthor, RSSTagDccreator,
 	RSSTagCategory,
 	/* Atom */
@@ -81,7 +81,7 @@ typedef struct field {
 enum {
 	FeedFieldTime = 0, FeedFieldTitle, FeedFieldLink, FeedFieldContent,
 	FeedFieldId, FeedFieldAuthor, FeedFieldEnclosure, FeedFieldCategory,
-	FeedFieldLast
+	FeedFieldMediaContent, FeedFieldLast
 };
 
 typedef struct feedcontext {
@@ -137,6 +137,7 @@ static const FeedTag rsstags[] = {
 	{ STRP("enclosure"), RSSTagEnclosure },
 	{ STRP("guid"), RSSTagGuid },
 	{ STRP("link"), RSSTagLink },
+	{ STRP("media:content"), RSSTagMediaContent },
 	{ STRP("media:description"), RSSTagMediaDescription },
 	{ STRP("pubdate"), RSSTagPubdate },
 	{ STRP("title"), RSSTagTitle }
@@ -180,6 +181,7 @@ static const int fieldmap[TagLast] = {
 	[RSSTagGuidPermalinkFalse] = FeedFieldId,
 	[RSSTagGuidPermalinkTrue] = FeedFieldId, /* special-case: both a link and an id */
 	[RSSTagLink] = FeedFieldLink,
+	[RSSTagMediaContent] = FeedFieldMediaContent,
 	[RSSTagEnclosure] = FeedFieldEnclosure,
 	[RSSTagAuthor] = FeedFieldAuthor,
 	[RSSTagDccreator] = FeedFieldAuthor,
@@ -677,6 +679,8 @@ printfields(void)
 	string_print_uri(&ctx.fields[FeedFieldEnclosure].str);
 	putchar(FieldSeparator);
 	string_print_trimmed_multi(&ctx.fields[FeedFieldCategory].str);
+	putchar(FieldSeparator);
+	string_print_trimmed(&ctx.fields[FeedFieldMediaContent].str);
 	putchar('\n');
 
 	if (ferror(stdout)) /* check for errors but do not flush */
@@ -718,7 +722,7 @@ xmlattr(XMLParser *p, const char *t, size_t tl, const char *n, size_t nl,
 	}
 
 	if (ctx.feedtype == FeedTypeRSS) {
-		if (ctx.tag.id == RSSTagEnclosure &&
+		if ((ctx.tag.id == RSSTagEnclosure || ctx.tag.id == RSSTagMediaContent) &&
 		    isattr(n, nl, STRP("url"))) {
 			string_append(&tmpstr, v, vl);
 		} else if (ctx.tag.id == RSSTagGuid &&
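
With this patch applied the new field is appended after the category field (the
9th and last field in the standard sfeed(5) format), so the media:content URL
can be extracted as the 10th column, for example:

    awk -F '\t' '{ print $10 }' ~/.sfeed/feeds/*
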
- - -

Running custom commands inside the sfeed_curses program
-------------------------------------------------------

Running commands inside the sfeed_curses program can be useful, for example to
sync items or to mark all items across all feeds as read. It can be convenient
to have a keybind for this inside the program to perform a scripted action and
then reload the feeds by sending the signal SIGHUP.

In the input handling code you can then add a case:

    case 'M':
        forkexec((char *[]) { "markallread.sh", NULL }, 0);
        break;

or

    case 'S':
        forkexec((char *[]) { "syncnews.sh", NULL }, 1);
        break;

The specified script should be in $PATH or be an absolute path.

Example of a `markallread.sh` shellscript to mark all URLs as read:

    #!/bin/sh
    # mark all items/URLs as read.
    tmp="$(mktemp)" || exit 1
    (cat ~/.sfeed/urls; cut -f 3 ~/.sfeed/feeds/*) | \
    awk '!x[$0]++' > "$tmp" &&
    mv "$tmp" ~/.sfeed/urls &&
    pkill -SIGHUP sfeed_curses # reload feeds.

Example of a `syncnews.sh` shellscript to update the feeds and reload them:

    #!/bin/sh
    sfeed_update
    pkill -SIGHUP sfeed_curses


Running programs in a new session
---------------------------------

By default processes are spawned in the same session and process group as
sfeed_curses. When sfeed_curses is closed this can also close the spawned
process in some cases.

When the setsid command-line program is available the following wrapper command
can be used to run the program in a new session, for a plumb program:

    setsid -f xdg-open "$@"

Alternatively the code can be changed to call setsid() before execvp().
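
A concrete way to use the setsid wrapper above: save it as a small plumb script
and point $SFEED_PLUMBER at it (the name plumb.sh is an assumption; it should
be in $PATH or an absolute path):

    #!/bin/sh
    # plumb.sh: open the URL in a new session, so it is not closed
    # together with sfeed_curses.
    setsid -f xdg-open "$@"

then run:

    SFEED_PLUMBER=plumb.sh sfeed_curses ~/.sfeed/feeds/*
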
Open a URL directly in the same terminal
----------------------------------------

To open a URL directly in the same terminal using the text-mode lynx browser:

    SFEED_PLUMBER=lynx SFEED_PLUMBER_INTERACTIVE=1 sfeed_curses ~/.sfeed/feeds/*


Yank to tmux buffer
-------------------

This changes the yank command to set the tmux buffer instead of using X11
xclip:

    SFEED_YANKER="tmux set-buffer \`cat\`"


Known terminal issues
---------------------

Below is a list of some bugs or missing features in terminals found while
testing sfeed_curses. Some of them might already be fixed upstream:

- cygwin + mintty: the xterm mouse-encoding of the mouse position is broken for
  scrolling.
- HaikuOS terminal: the xterm mouse-encoding of the mouse button number of the
  middle-button and right-button is incorrect / reversed.
- putty: the full reset attribute (ESC c, typically `rs1`) does not reset the
  window title.
- Mouse button encoding for extended buttons (like side-buttons) is unsupported
  in some terminals or maps to the same button: for example side-buttons 7 and
  8 map to the scroll buttons 4 and 5 in urxvt.


License
-------

ISC, see LICENSE file.


Author
------

Hiltjo Posthuma <hiltjo@codemadness.org>