Academic Uses of RSS

The RSS Specifications blog has an article listing 15 uses for RSS in an educational environment. While many of these items fit within a broader academic context, several could be easily ported to the library. For example, a library could use RSS to publish a study guide. It might be general (e.g., one word each day aimed at a certain reading level) or specific to a user group (daily historical facts for a particular K-12 or higher-ed course).
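As a sketch, a single entry in such a word-of-the-day study-guide feed might look like the following RSS 2.0 item. The feed name, URLs, and wording here are invented for illustration:

```xml
<rss version="2.0">
  <channel>
    <title>Anytown Library Word of the Day</title>
    <link>http://www.example.org/word-of-the-day/</link>
    <description>One vocabulary word each day, aimed at middle-school readers</description>
    <item>
      <title>Word of the Day: ephemeral</title>
      <link>http://www.example.org/word-of-the-day/ephemeral</link>
      <description>ephemeral (adjective): lasting a very short time.</description>
      <pubDate>Mon, 30 Apr 2007 06:00:00 EST</pubDate>
    </item>
  </channel>
</rss>
```

Anything that can generate this file on a schedule — a blog platform, a content management system, or a short script — turns the study guide into a subscribable feed.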

How is your library using RSS to communicate with your community?

MyFeedz

MyFeedz is an interesting tool from Adobe Labs — from Adobe’s Romanian office, to be precise — that aspires to build a personalized news service based on your RSS reading habits.

MyFeedz has two ways to learn about your interests. One is by simply watching you read the titles and first paragraphs of text and noting when you click through to read the full text. The other involves jump-starting the system (at your choice) by uploading your RSS reading list via an OPML file (OPML files can be output from most popular RSS aggregators, including Bloglines and Google Reader). The process behind the scenes is not quick — I suggest getting a cup of coffee, walking the dog, or reading your day’s RSS feeds while MyFeedz works.
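For reference, the OPML file those aggregators export is just a simple XML outline of feed titles and URLs — something like this (the entries and URLs below are placeholders, not a real subscription list):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<opml version="1.1">
  <head>
    <title>My feed subscriptions</title>
  </head>
  <body>
    <outline title="Example Library News" type="rss"
             xmlUrl="http://www.example.org/news/rss.xml" />
    <outline title="Another Blog I Read" type="rss"
             xmlUrl="http://www.example.com/feed.xml" />
  </body>
</opml>
```

Because the format is this simple, moving a reading list between aggregators is usually a two-click export-and-import.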

When MyFeedz is done doing its magic — described somewhat cryptically on the site as a process involving an analysis of “its source, tags, popularity, rating, language and more” — it has built a profile of the subjects that you generally prefer to read. It then goes out and looks for more blog entries that match this general profile.

Here is where I’ll speculate a bit on what is going on behind the scenes. My guess is that MyFeedz is using some form of vector analysis to model the items in which you have shown an interest. Vector analysis (see this Wikipedia article for more of the math than I understand) compares texts and scores them with a probability that they are about the same thing, even if the same exact phrases do not appear in both documents.
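If my guess is right, the core computation resembles the classic vector-space model: turn each text into a vector of term counts and score pairs of texts by the cosine of the angle between the vectors. Here is a minimal Python sketch of that idea — my illustration, not MyFeedz’s actual code (the sample sentences are invented):

```python
import math
import re
from collections import Counter

def vectorize(text):
    """Build a term-frequency vector (word -> count) from a text."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine_similarity(a, b):
    """Score two term vectors from 0 (no shared terms) to 1 (identical)."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# An article the user read, a candidate on the same topic, and an
# off-topic candidate (all made up for this example).
read = "New tools let libraries publish book lists as RSS feeds"
candidate = "A library can syndicate its new-book list via an RSS feed"
unrelated = "The Dodgers won last night's baseball game in extra innings"

print(cosine_similarity(vectorize(read), vectorize(candidate)))  # higher score
print(cosine_similarity(vectorize(read), vectorize(unrelated)))  # lower score
```

A real system would add refinements — weighting rare terms more heavily (TF-IDF), stemming so “feed” and “feeds” match, and averaging many read articles into one profile vector — but the comparison at the heart of it is this cheap.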

If I am correct, that is the time-consuming part. For my aggregated list, MyFeedz took several hours to be ready to show me new articles based on the profile it generated for me. But now, it is showing me articles within the sphere of what I’ve already expressed interest in knowing about. As I write this, three of its top five recommendations are squarely within my professional interests.

The other two are less clearly germane (one about the Dodgers, and I’m a member of Red Sox Nation; the other about NJ Governor Corzine’s political difficulties). I think batting .600 is pretty good for an automated recommendation engine that I just recently started using. However, as I’ve written before (see “Serendipity at Risk”), I like a bit of surprise in my reading, as long as it is tangentially related to what I’m focusing on. The items I catch out of the corner of my eye while I’m looking in one direction are often fascinating.

I will be curious to see whether, as I work with MyFeedz, it continues to narrow in on my core interests while still providing some “ah-ha!” moments. MyFeedz will not replace my aggregator — that is not its purpose — but it makes for an interesting discovery tool.

LibraryThing and the Danbury Library

This really has little to do with RSS, but it is such a useful and clever service that I can’t resist writing about it.

Tim Spalding of LibraryThing today announced LibraryThing for Libraries with its first implementation, the Danbury Library (in Connecticut).
Tim explains the whys and wherefores in great detail in his post, but the upshot of it all is that when you search for a book in the Danbury library’s catalog, in addition to the catalog and holdings data, you also see:

  • Tags from LibraryThing’s 200,000 members and 13 million books;
  • Other editions and translations of the book you are looking at;
  • Tags entered by LibraryThing’s users describing the item you are viewing; and
  • Similar titles.

The last three items only show books held by Danbury’s library. And LibraryThing has restricted the tags that appear in the Danbury catalog so that tags that describe location of the book or the tagger’s intent (for example, “at the beach house” or “to read”) are not included.

RSS New Books Screen Saver

Here’s a very clever use of the Macintosh’s RSS screen saver and the library’s new book list: a new books RSS screen saver. If you have a Mac, follow this link: screen saver (there are instructions on the blog for customizing it). Windows users are out of luck, I’m afraid.

What a great idea — computers in the library could advertise the new materials. With a bit of effort, the computer in the children’s area could show new children’s books, the one in the mystery section new mysteries, and the one near the biographies new biographies. And, of course, patrons could use it at home, too.

Public Schools and RSS

The Colonel Mitchell Paige Middle School in La Quinta, California, is using RSS and podcasts to keep parents in touch with the day’s activities. There’s a podcast of the morning announcements. Some teachers are recording information about tests and how-to tips for students and parents. And other teachers are using RSS to let parents know about their child’s homework assignments.
I wonder how many public school libraries could help — or already are helping — their school by providing this sort of infrastructure?

Counting RSS Subscribers

UPDATE (8 June 2008)
Find out how many subscribers your RSS feed has using YourStats, an RSS4Lib tool. Upload your own blog’s server access log files and get a count of how many readers your feed has.

How many people read RSS4Lib? I’ve asked this question before, but I keep coming back to it. Each time I do, I realize that the answer is even less straightforward than I previously thought.

Looking at my server log files, I think it’s clear that the vast majority of hits come from the RSS and Atom feeds — they account for a whopping 43,255 requests, or 68.9% of files delivered from www.rss4lib.com, for April 2007. This number — as impressive as it sounds to me — does not really mean much. Feed readers and aggregators, by their very nature, check the feed frequently (many times a day) to see if the feed has been updated so that the application can tell the user there’s something new to read. A well-behaved feed reader or aggregator makes a conditional request and only downloads the feed again if it has changed; the server answers the rest of these checks with an empty “304 Not Modified” response. So the application is still pinging the server regularly, but a very large portion — well over 50% — of these requests result in nothing being downloaded from the server.
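Splitting feed requests by HTTP status code makes that visible: a 200 response is a full download, while a 304 is a reader checking in and getting nothing. A rough Python sketch (the author’s own scripts were Perl), assuming an Apache combined-format log and a hypothetical feed path of /index.xml — the sample log lines are invented:

```python
import re

# One Apache "combined" log line: IP, two dashes, [date], "request",
# status, bytes, "referer", "user-agent".
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def feed_hit_stats(lines, feed_path="/index.xml"):
    """Count full downloads (200) vs. conditional-GET checks (304)."""
    full, not_modified = 0, 0
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or feed_path not in m.group("request"):
            continue
        if m.group("status") == "200":
            full += 1
        elif m.group("status") == "304":
            not_modified += 1
    return full, not_modified

sample = [
    '1.2.3.4 - - [30/Apr/2007:06:00:00 -0400] "GET /index.xml HTTP/1.1" 200 5120 "-" "Bloglines/3.1 (http://www.bloglines.com; 602 subscribers)"',
    '1.2.3.4 - - [30/Apr/2007:07:00:00 -0400] "GET /index.xml HTTP/1.1" 304 - "-" "Bloglines/3.1 (http://www.bloglines.com; 602 subscribers)"',
    '5.6.7.8 - - [30/Apr/2007:07:05:00 -0400] "GET /about.html HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
]
print(feed_hit_stats(sample))  # → (1, 1)
```

Run against a month of real log data, the second number dwarfs the first — which is exactly why raw hit counts say so little about readership.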

Where Are Feeds Read?

So how can one tell how many people are actually subscribed to an RSS feed? There are at least four significant sources of subscribers to an RSS feed:

  1. Web-based aggregators like Bloglines and Google Reader, which download the same feed many times a day to check for new items and then redistribute it to multiple subscribers.
  2. PC-based feed aggregators (for example, FeedDemon or Radio UserLand). Like Web-based aggregators, these applications also check the feed periodically, but do so for each user individually. Ten users running these applications would result in ten downloads of an RSS feed per time period (every hour, every day, twice a day — depending on how each user has configured the application).
  3. Browser-based “live bookmarks” (for example, Firefox, Internet Explorer 7, and Safari). Newer web browsers allow a user to bookmark an RSS feed and display, variously, headlines or the full RSS feed as items are updated. Like other aggregators, they check the RSS file periodically for updates.
  4. Web applications (such as Feed2JS or RSS2HTML) that plug RSS feeds into web pages that are, in turn, read by one or more people.

Methodology

So as I looked through my server log files, I began to get increasingly curious about what, if anything, I could tease out of the data that are there. There are, of course, commercial products, like FeedBurner or GetClicky, that provide nice reports if you route your feeds through them. I’ve opted to do things myself, though. So I got adventurous and started writing some Perl code to parse the log files and make a best-guess estimate of the number of subscribers to RSS4Lib’s feeds.

The subscriber count will be off for several reasons, the most significant being that a single person might be subscribed to the same feed in several places. I’m not overstating my self-importance; how many of us ever delete our subscriptions at one service when we move to a new one? Not I. When I try a new aggregator, I’m likely to export my entire subscription list from my current favorite as an OPML file and upload it into the new one. Whichever ends up being my favorite, I use; the has-been just sits there, but I’m still subscribed. And unless I’ve made a live bookmark in my Firefox toolbar, I may not notice that a feed I subscribed to has been merrily updating itself, unread, for weeks.

On the other hand, in many cases it’s very difficult to determine whether a given user-agent is a web browser or PC-based aggregator (with, presumably, one subscriber) or an aggregator or web application with many subscribers. Most user-agents — how a web browser or application identifies itself to the web server — simply give their name, their platform (Mac, Windows, etc.), and what kind of browser they’re most like (Mozilla, Gecko, etc.). A very few — and fortunately for my purposes, the most popular aggregators are included in this elite few — actually include the number of readers subscribed to the feed in the user-agent statement (for example, Bloglines tells the server that it’s “Bloglines/3.1 (http://www.bloglines.com; 602 subscribers)”). It gives a name, a URL for more information, and the number of subscribers. Other user-agents are less informative; an example of this type is “FeedOnFeeds/0.1.9 (+http://minutillo.com/steve/feedonfeeds/)”. And still others are downright terse: “Particls/1.0”.
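Pulling the advertised count out of a cooperative user-agent string is a simple pattern match. A small Python sketch (again, the author’s actual script was Perl; this is an illustration):

```python
import re

# Some aggregators advertise their subscriber count in the user-agent,
# e.g. "Bloglines/3.1 (http://www.bloglines.com; 602 subscribers)".
SUBSCRIBERS = re.compile(r"(\d+)\s+(?:subscriber|reader)s?", re.IGNORECASE)

def subscriber_count(user_agent):
    """Return the advertised subscriber count, or None if absent."""
    m = SUBSCRIBERS.search(user_agent)
    return int(m.group(1)) if m else None

print(subscriber_count("Bloglines/3.1 (http://www.bloglines.com; 602 subscribers)"))  # → 602
print(subscriber_count("Particls/1.0"))  # → None
```

A user-agent that returns None falls through to the “one aggregator, one user” rule described below.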

How to count these various types? It depends. The good guys (Bloglines, Google, Yahoo, and a few others) make life easy; it’s simply a matter of looking at the log file and pulling out the number of subscribers. There’s not much to be done with the “bad” web aggregators, the ones that do not provide any subscriber data, except to count each as one aggregator, one user.

For PC- and browser-based readers, it’s possible to make some good guesses. The log file includes the IP address of the computer requesting the RSS file. By combining the user-agent and the IP address and counting the unique pairs, it’s possible to come up with a good guess at the number of unique users who are receiving the RSS file through this channel. Since each user-agent/IP address pair likely downloads the file multiple times in a day, the number of user-agent/IP address pairs stands as a proxy for subscribers.
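In Python (once more standing in for the author’s Perl), counting those unique pairs might look like this — the IP addresses and agent strings below are made up:

```python
def estimate_individual_subscribers(requests):
    """requests: iterable of (ip, user_agent) pairs for hits on the feed.
    Each unique pair is counted once, since a single desktop reader or
    browser polls many times a day from the same address."""
    return len({(ip, agent) for ip, agent in requests})

hits = [
    ("10.0.0.1", "NetNewsWire/2.1"),           # one reader...
    ("10.0.0.1", "NetNewsWire/2.1"),           # ...polling again later
    ("10.0.0.2", "NetNewsWire/2.1"),           # a second person, same software
    ("10.0.0.1", "ExampleBrowser/2.0 LiveBookmark"),  # invented agent string
]
print(estimate_individual_subscribers(hits))  # → 3
```

The obvious weakness is dynamic IP addresses: one person whose address changes mid-day counts twice, and several people behind one office proxy count once. It is a proxy for subscribers, not a census.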

And then there are web-based applications like Feed2JS that simply convert an RSS feed into HTML and place it on a web page. The user-agent does not know how many people read the web page and does not provide the URL of the page where the feed will appear. (I don’t think this latter point is even feasible; thanks to caching of RSS feeds on the web server where the content is being reused, it’s impossible to know at the time the feed is downloaded from here where it will end up.) This sort of activity gets boiled down to the number of unique user-agent/IP address pairs, which is almost certainly lower than the number of people who see the web page the feed is on.

Results

When all of these things are processed and added together, I find that there might have been 1096 subscribers on April 30, 2007. “Might have been?” Yes — this number is a guess and is almost certainly not the real number of subscribers, let alone the real number of readers. The real count could be lower (people who subscribe in more than one place but read in only one, or who simply have not bothered to delete a subscription in a no-longer-favored aggregator) or higher (people who read the feed through web pages, or through syndicators that pull from aggregators and redistribute the content elsewhere via tools such as ZapTXT and Feed2JS). But this number will have to do for now.

One thing is clear to me from this exploration of my server’s log files: RSS allows my content to go places I never thought possible and to be read by an audience far broader than I would have guessed reasonable. That’s something to keep in mind when you’re writing for your library’s site — or for yourself.

Please feel free to experiment with the application I wrote — it’s at http://www.rss4lib.com/feedstats/ — and let me know what you think.