Counting RSS Subscribers
How many people read RSS4Lib? I’ve asked this question before, but I keep coming back to it. Each time I do, I realize that the answer is even less straightforward than I previously thought.
Looking at my server log files, I think it’s clear that the vast majority of hits come from the RSS and Atom feeds: they account for a whopping 43,255 requests, or 68.9% of files delivered from www.rss4lib.com, for April 2007. This number, as impressive as it sounds to me, does not really mean much. Feed readers and aggregators, by their very nature, check the feed frequently (many times a day) to see if it has been updated, so that the application can tell the user there’s something new to read. A well-behaved feed reader or aggregator will only download the feed again if it has changed, but it still pings the server regularly to find out. A very large portion of these requests, well over 50%, result in nothing being downloaded from the server.
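A minimal sketch of how that share could be measured, written in Python for illustration. It assumes the server writes the Apache/NCSA Combined Log Format and that a "nothing downloaded" check shows up as an HTTP 304 (Not Modified) response; the feed paths and the sample log lines are invented for the example.

```python
import re

# Apache/NCSA Combined Log Format (an assumption about the server's log layout):
# ip ident user [time] "METHOD path protocol" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"$'
)

# Hypothetical feed paths; substitute whatever the site actually serves.
FEED_PATHS = {"/index.xml", "/atom.xml"}


def not_modified_share(lines):
    """Return the fraction of feed requests answered with 304 Not Modified."""
    feed_hits = 0
    not_modified = 0
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None or match.group("path") not in FEED_PATHS:
            continue  # skip unparseable lines and non-feed requests
        feed_hits += 1
        if match.group("status") == "304":
            not_modified += 1
    return not_modified / feed_hits if feed_hits else 0.0


# Invented sample lines, for illustration only.
sample = [
    '192.0.2.10 - - [30/Apr/2007:08:00:00 -0400] "GET /index.xml HTTP/1.1" 200 18234 "-" "FeedDemon/2.5"',
    '192.0.2.10 - - [30/Apr/2007:09:00:00 -0400] "GET /index.xml HTTP/1.1" 304 - "-" "FeedDemon/2.5"',
    '198.51.100.4 - - [30/Apr/2007:09:05:00 -0400] "GET /atom.xml HTTP/1.1" 304 - "-" "Particls/1.0"',
    '203.0.113.9 - - [30/Apr/2007:09:06:00 -0400] "GET /about.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
]
share = not_modified_share(sample)
print(f"{share:.0%} of feed requests transferred nothing")
```

Readers that ignore conditional requests always get a 200 and the full file, so this ratio only describes the well-behaved ones.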
Where Are Feeds Read?
So how can one tell how many people are actually subscribed to an RSS feed? There are at least four significant sources of subscribers to an RSS feed:
1) Web-based aggregators (for example, Bloglines and Google Reader). These download the same feed many times a day to check for updates and then reproduce it for multiple subscribers.
2) PC-based feed aggregators (for example, FeedDemon or Radio UserLand). Like Web-based aggregators, these applications check the feed periodically, but do so for each user individually. Ten users running these applications would result in ten downloads of an RSS feed per time period (every hour, twice a day, every day, depending on how each user has configured the application).
3) Browser-based “live bookmarks” (for example, Firefox, Internet Explorer 7, and Safari). Newer web browsers allow a user to bookmark an RSS feed and display, variously, headlines or the full RSS feed as items are updated. Like other aggregators, they check the RSS file periodically for updates.
4) Web applications (such as Feed2JS or RSS2HTML) that plug RSS feeds into web pages that are, in turn, read by one or more people.
So as I looked through my server log files, I began to get increasingly curious about what, if anything, I could tease out of the data that are there. There are, of course, commercial products, like FeedBurner or GetClicky, that provide nice reports if you route your feeds through them. I’ve opted to do things myself, though. So I got adventurous and started writing some Perl code to parse the log files and make a best-guess effort to estimate the number of subscribers to RSS4Lib’s feeds.
The subscriber count will be off for several reasons, the most significant being that a single person might be subscribed to the same feed in several places. I’m not overstating my self-importance; how many of us ever delete our subscriptions at one service when we move to a new one? Not I. When I try a new aggregator, I’m likely to export my entire subscription list from my current favorite aggregator as an OPML file and import it into the new one. Whichever ends up being my current favorite I use; the has-been just sits there, but I’m still subscribed to things there. And unless I’ve made a live bookmark in my Firefox toolbar, I may not notice that a feed I subscribed to has been merrily updating itself for weeks.
On the other hand, in many cases it’s very difficult to determine whether a given user-agent is a web browser or PC-based aggregator (with, presumably, one subscriber) or an aggregator or web application with many subscribers. Most user-agents (the user-agent is how a web browser or application identifies itself to the web server) simply give their name, their platform (Mac, Windows, etc.), and what kind of browser they’re most like (Mozilla, Gecko, etc.). A very few actually include the number of readers subscribed to the feed in the user-agent string; fortunately for my purposes, the most popular aggregators are among this elite few. For example, Bloglines tells the server that it’s “Bloglines/3.1 (http://www.bloglines.com; 602 subscribers)”: a name, a URL for more information, and the number of subscribers. Other user-agents are less informative; an example of this type is “FeedOnFeeds/0.1.9 (+http://minutillo.com/steve/feedonfeeds/)”. And still others are downright terse: “Particls/1.0”.
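Pulling the count out of a user-agent string like the Bloglines one could look roughly like this. It is a Python sketch, not the Perl script described in this post; the function name and the pattern are assumptions, and the pattern only recognizes the “N subscribers” wording shown above. Aggregators that phrase it differently would need their own patterns.

```python
import re

# Matches "602 subscribers" or "1 subscriber"; other wordings are not handled.
SUBSCRIBERS = re.compile(r"(\d+)\s+subscribers?\b", re.IGNORECASE)


def reported_subscribers(user_agent):
    """Return the subscriber count a user-agent reports, or None if it reports none."""
    match = SUBSCRIBERS.search(user_agent)
    return int(match.group(1)) if match else None


# The three user-agent strings quoted in the text.
agents = [
    "Bloglines/3.1 (http://www.bloglines.com; 602 subscribers)",
    "FeedOnFeeds/0.1.9 (+http://minutillo.com/steve/feedonfeeds/)",
    "Particls/1.0",
]
for agent in agents:
    print(agent, "->", reported_subscribers(agent))
```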
How to count these various types? It depends. The good guys (Bloglines, Google, Yahoo, and a few others) make life easy; it’s simply a matter of looking at the log file and pulling out the number of subscribers. There’s not much to be done with the “bad” web aggregators, the ones that do not provide any subscriber data, except to count each of them as one aggregator, one user.
For PC- and browser-based readers, it’s possible to make some good guesses. The log file includes the IP address of the computer requesting the RSS file. By combining the user-agent and the IP address and counting the unique pairs, it’s possible to come up with a good guess at the number of unique users who are receiving the RSS file through this channel. Since each user-agent/IP address pair likely downloads the file multiple times in a day, the number of user-agent/IP address pairs stands as a proxy for subscribers.
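The pair-counting idea can be sketched in a few lines of Python. This is an illustration under assumptions, not the actual script: the request list stands in for (IP address, user-agent) values already extracted from one day of feed requests, and the addresses and agent strings are invented.

```python
from collections import Counter

# Invented (IP address, user-agent) values for one day of feed requests.
requests = [
    ("192.0.2.10", "FeedDemon/2.5"),
    ("192.0.2.10", "FeedDemon/2.5"),    # same reader polling again
    ("192.0.2.10", "FeedDemon/2.5"),
    ("192.0.2.10", "ExampleBrowser/1.0"),  # different program, same address
    ("198.51.100.4", "FeedDemon/2.5"),  # same program, different address
]

# Each distinct pair is treated as one presumed subscriber,
# no matter how many times it fetched the feed.
hits_per_pair = Counter(requests)
presumed_subscribers = len(hits_per_pair)

print(presumed_subscribers)
for (ip, agent), hits in hits_per_pair.most_common():
    print(f"{ip} {agent}: {hits} requests")
```

The proxy errs in both directions: several people behind one shared address and the same reader collapse into one pair, while one person whose address changes during the day shows up as several.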
And then there are web-based applications like Feed2JS that simply convert an RSS feed into HTML and place it on a web page. The user-agent does not know how many people read the web page and does not provide the URL of the page where the feed will appear. (I don’t think this latter point is even feasible; thanks to caching of RSS feeds on the web server where the content is being reused, it’s impossible to know at the time the feed is downloaded from here where it will end up.) This sort of activity gets boiled down to the number of unique user-agent/IP address pairs, which is almost certainly lower than the number of people who see the web page the feed is on.
When I processed all of these sources and added them together, I found that there might have been 1,096 subscribers on April 30, 2007. “Might have been?” Yes: this number is a guess and is almost certainly not the real number of subscribers, let alone the real number of readers. The number of readers could be lower (people who subscribe in more than one place but read in one; readers who simply have not bothered to delete the subscription in a no-longer-favored aggregator) or higher (people who read the feed on web pages, or through syndicators that use aggregators as the source of their information and then redistribute it elsewhere, via tools such as ZapTXT and Feed2JS). But this number will have to do for now.
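Putting the counting rules together, a one-day estimate could be assembled along these lines. Again this is a hedged Python sketch rather than the Perl code behind the figure above: it assumes a single feed, takes the largest count each self-reporting aggregator announced during the day, and counts every other IP address and user-agent pair as one subscriber. All names and sample values are invented.

```python
import re

SUBSCRIBERS = re.compile(r"(\d+)\s+subscribers?\b", re.IGNORECASE)


def estimate_subscribers(requests):
    """Estimate subscribers from one day of (ip, user_agent) feed requests."""
    reported = {}      # aggregator identity -> largest count it reported
    anonymous = set()  # (ip, user_agent) pairs that report no count
    for ip, agent in requests:
        match = SUBSCRIBERS.search(agent)
        if match:
            # Strip the number so the same aggregator is counted once even if
            # it polls from several addresses or its count changes during the day.
            identity = SUBSCRIBERS.sub("", agent)
            reported[identity] = max(reported.get(identity, 0), int(match.group(1)))
        else:
            anonymous.add((ip, agent))
    return sum(reported.values()) + len(anonymous)


# Invented sample data for illustration.
day = [
    ("192.0.2.1", "Bloglines/3.1 (http://www.bloglines.com; 600 subscribers)"),
    ("192.0.2.2", "Bloglines/3.1 (http://www.bloglines.com; 602 subscribers)"),
    ("198.51.100.7", "FeedDemon/2.5"),
    ("198.51.100.7", "FeedDemon/2.5"),
    ("198.51.100.8", "Particls/1.0"),
]
estimate = estimate_subscribers(day)
print(estimate)  # 602 reported + 2 anonymous pairs = 604
```

Taking the maximum rather than the sum of an aggregator's reported counts avoids multiplying its subscribers by the number of times it polled.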
One thing is clear to me from this exploration of my server’s log files: RSS allows my content to go places I never thought possible and to be read by an audience far broader than I would have guessed reasonable. That’s something to keep in mind when you’re writing for your library’s site — or for yourself.
Please feel free to experiment with the application I wrote — it’s at http://www.rss4lib.com/feedstats/ — and let me know what you think.