Feed detection for Tiny Tiny RSS
Until recently I used newsbeuter, a neat command-line tool, to read news feeds. It is fast, runs in the terminal, and lets you navigate easily with keyboard shortcuts.
It has two downsides. First, startup takes too long with an archive of ~20,000 feed entries. The second issue is more serious: reading news at work is not really possible, since newsbeuter is installed on my laptop. I could install it on my server, but then opening articles with a key press would no longer work.
So after pondering a bit, I decided to install Tiny Tiny RSS on my server. It is a web-based feed reader that promised keyboard navigation, multi-user support, categorization and a bunch of other features.
It turned out to be as promised, even if the installation was a bit hard - for example, TT-RSS uses /icons/ as its feed icon directory, which Apache already uses for its own icons. Also, a manual is missing and there are a great many configuration options.
Feed subscription woes
After importing my OPML file, I began to add new feeds manually. I did this by copying the URL of the website I was visiting and pasting it into the "add feed" URL field - or typing it myself. (The settings contain an "add to Firefox" option which registers TT-RSS as a feed reader, but I still use Opera and was out of luck.)
Unfortunately, TT-RSS did not recognize that the URL I entered pointed to an HTML page, nor did it automatically prepend http:// when I typed the URL by hand without it (I never type it, since I'm inherently lazy). I opened a bug report and got the not-so-encouraging response that I should fix it myself, since the author was not willing to support "terminally lazy people".
Feed subscription enhancements
Since I really needed this feature and did not want to use another tool, I chose to implement it myself. After getting an overview of the code, I met Andrew Dolgov, the author, on IRC and talked with him about how the feature would best be implemented - how to detect HTML pages, how to extract the feed URLs, how to handle it in the frontend. My plans were pretty close to his ideas, so I started working on it.
Now, after three days of hacking on the train, the feature is finally implemented and merged into Andrew's master branch!
It works as follows:
- User types a URL to subscribe to
- URL gets completed (http:// prepended) if incomplete
- Contents of the URL are fetched
- If the content is not HTML, Tiny Tiny RSS directly subscribes to it
- Otherwise, all rel="alternate" links are extracted from the page
- If only one link is found, TT-RSS subscribes to it without asking a question
- If several links are found, the user gets a list with them and may choose which to subscribe to
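The steps above can be sketched roughly like this. It is a minimal illustration in Python, not the PHP code that went into TT-RSS; the function names complete_url and decide are hypothetical, and the "none" result for an HTML page without any feed links is an assumption, since the list above does not cover that case.

```python
def complete_url(url):
    """Prepend http:// if the user left out the scheme."""
    url = url.strip()
    if "://" not in url:
        return "http://" + url
    return url


def decide(url, is_html, feed_links):
    """Return (action, urls) following the steps listed above.

    is_html    -- whether the fetched content looked like an HTML page
    feed_links -- rel="alternate" URLs found in that page (empty if not HTML)
    """
    if not is_html:
        # Not a web page, so treat the URL itself as the feed.
        return ("subscribe", [url])
    if len(feed_links) == 1:
        # Exactly one candidate: subscribe without asking.
        return ("subscribe", feed_links)
    if len(feed_links) > 1:
        # Several candidates: let the user pick.
        return ("ask", feed_links)
    # Assumption: an HTML page without feed links yields nothing to subscribe to.
    return ("none", [])


url = complete_url("example.org/blog")
print(url)
print(decide(url, True, [url + "/rss.xml", url + "/atom.xml"]))
```

Keeping the decision separate from fetching makes it easy to try out each branch without any network access.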
Extracting the feed URLs is done with DOMDocument::loadHTML() and a bit of XPath. The only hard part was getting the link URLs right - feed links can be absolute URLs, absolute paths or relative paths.
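To illustrate the three kinds of link, here is a small Python sketch of the extraction step. It swaps PHP's DOMDocument and XPath for Python's standard html.parser and resolves each href against the page URL with urllib.parse.urljoin. The names FeedLinkParser and extract_feed_urls are made up for this example, and the sample page is invented.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FeedLinkParser(HTMLParser):
    """Collect the href of every <link rel="alternate"> element."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        # rel may hold several space-separated values.
        rel = (attr.get("rel") or "").lower().split()
        if "alternate" in rel and attr.get("href"):
            self.hrefs.append(attr["href"])


def extract_feed_urls(html, page_url):
    """Return absolute feed URLs found in html, resolved against page_url."""
    parser = FeedLinkParser()
    parser.feed(html)
    # urljoin copes with absolute URLs, absolute paths and relative paths.
    return [urljoin(page_url, href) for href in parser.hrefs]


page = """
<html><head>
<link rel="alternate" type="application/rss+xml" href="http://example.org/a.xml">
<link rel="alternate" type="application/atom+xml" href="/feeds/b.xml">
<link rel="alternate" type="application/rss+xml" href="c.xml">
<link rel="stylesheet" href="style.css">
</head><body></body></html>
"""

for feed_url in extract_feed_urls(page, "http://example.com/blog/post.html"):
    print(feed_url)
```

For the sample page this yields http://example.org/a.xml, http://example.com/feeds/b.xml and http://example.com/blog/c.xml, one for each kind of link mentioned above.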
Here is a screenshot of TT-RSS feed subscription in action: