I just noticed that the number of my subscribers at Bloglines recently fell sharply. (I use the RSS4Lib Feedstats tool to track my blog’s readership via RSS.) On September 27, Bloglines reported a total of 872 subscribers to the RSS and Atom feeds from my blog (see details). That number fit the trend: over the past months, Bloglines had reported a gradually growing number of readers, adding a few each week. On September 28, according to Bloglines, there were only 565 subscribers to the two feeds (see details).
The Bloglines crawler reports the number of users subscribed to each feed in its user-agent string, which ends up in the server log files. For example, the Bloglines crawler passed through earlier today and left this log line:
65.214.44.28 - - [06/Oct/2008:08:12:38 -0700] "GET /index.xml HTTP/1.1" 304 - "-" "Bloglines/3.1 (http://www.bloglines.com; 527 subscribers)"
The number of Bloglines subscribers went down for both RSS and Atom (from 796 to 527 and from 56 to 38, respectively). These lower numbers have held steady since 9/28/2008, which makes me think it’s not just a transient error. Interestingly, the numbers reported for each feed on the Bloglines web site have not changed. (See Bloglines’ list of RSS and Atom subscribers, neither of which has been updated.) The web site’s numbers have always lagged the Bloglines crawler’s numbers by a week or more, so the discrepancy itself is probably not significant.
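For anyone who wants to pull these numbers out of their own logs, here is a minimal Python sketch. It assumes the user-agent format shown above (“Bloglines/3.1 (…; N subscribers)”) stays stable; the second log line and its Atom feed path, /atom.xml, are hypothetical.

```python
import re

# Pulls the count out of a user agent such as
# "Bloglines/3.1 (http://www.bloglines.com; 527 subscribers)".
SUBSCRIBERS_RE = re.compile(r"Bloglines/[\d.]+ \([^;]*; (\d+) subscribers?\)")
# Pulls the requested path out of the quoted request, e.g. "GET /index.xml HTTP/1.1".
REQUEST_RE = re.compile(r'"GET (\S+) HTTP/[\d.]+"')


def bloglines_subscribers(log_line):
    """Return the subscriber count reported in a Bloglines log line, or None."""
    match = SUBSCRIBERS_RE.search(log_line)
    return int(match.group(1)) if match else None


def latest_counts(log_lines):
    """Map each feed path to the most recent Bloglines subscriber count seen for it."""
    counts = {}
    for line in log_lines:
        subscribers = bloglines_subscribers(line)
        request = REQUEST_RE.search(line)
        if subscribers is not None and request:
            counts[request.group(1)] = subscribers  # later lines overwrite earlier ones
    return counts


LOG = [
    '65.214.44.28 - - [06/Oct/2008:08:12:38 -0700] "GET /index.xml HTTP/1.1" 304 - '
    '"-" "Bloglines/3.1 (http://www.bloglines.com; 527 subscribers)"',
    # Hypothetical line for the Atom feed; the path and timestamp are made up.
    '65.214.44.28 - - [06/Oct/2008:08:13:02 -0700] "GET /atom.xml HTTP/1.1" 304 - '
    '"-" "Bloglines/3.1 (http://www.bloglines.com; 38 subscribers)"',
]

counts = latest_counts(LOG)
print(counts)                # {'/index.xml': 527, '/atom.xml': 38}
print(sum(counts.values()))  # 565
```

Keeping only the latest count per feed path matters because the crawler visits repeatedly; summing every line would overcount.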
Has anyone else noticed that Bloglines subscriber numbers took a dive a couple weeks ago? Can anyone with a long-dormant Bloglines account confirm that it has been purged?
Category: Syndication
Putting a Creative Commons License in Your Feeds
Did you know that it’s easy to add a Creative Commons license to your RSS and Atom feeds, not just to your blog’s web site? Here are brief instructions for adding your Creative Commons license to RSS and Atom feeds:
RSS 2.0
You need to make two small edits to the RSS 2.0 template your blog software uses.
- Change the line that reads <rss version = "2.0"> to <rss version="2.0" xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule">. This is probably the second line in the RSS file. The addition of the “xmlns…” bit sets up the second item you’ll edit, pointing to the web page that defines an extension to the standard RSS 2.0 field set.
- Then add the URL of the Creative Commons license you’ve selected at the Creative Commons web site. This bit goes anywhere between the <channel> and </channel> tags. For example, RSS4Lib has an “attribution non-commercial” license, version 3.0, so I’ve added this code to my RSS feed: <creativeCommons:license>http://creativecommons.org/licenses/by-nc/3.0/us/</creativeCommons:license>
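If you would rather not touch the template, the same two edits can be applied to a feed after it is generated. Here is a minimal Python sketch using the standard library’s xml.etree.ElementTree; post-processing the output is a swapped-in approach (the steps above edit the template directly), and the sample feed is hypothetical.

```python
import xml.etree.ElementTree as ET

CC_NS = "http://backend.userland.com/creativeCommonsRssModule"
LICENSE_URL = "http://creativecommons.org/licenses/by-nc/3.0/us/"


def add_cc_license_rss(rss_xml, license_url=LICENSE_URL):
    """Return the RSS 2.0 document with a <creativeCommons:license> in its channel."""
    # Edit 1: makes ElementTree emit xmlns:creativeCommons="..." on the <rss> element.
    ET.register_namespace("creativeCommons", CC_NS)
    root = ET.fromstring(rss_xml)
    channel = root.find("channel")
    if channel is None:
        raise ValueError("no <channel> element; is this RSS 2.0?")
    # Edit 2: the license URL may go anywhere inside <channel>; here, at the end.
    license_el = ET.SubElement(channel, f"{{{CC_NS}}}license")
    license_el.text = license_url
    return ET.tostring(root, encoding="unicode")


SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <link>http://example.org/</link>
    <description>A hypothetical feed</description>
  </channel>
</rss>"""

print(add_cc_license_rss(SAMPLE_RSS))
```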
Atom
It is even easier to add a Creative Commons license to an Atom feed. There’s just one line to add to the Atom template. For RSS4Lib, this is: <link rel="license" type="application/rdf+xml" href="http://creativecommons.org/licenses/by-nc/3.0/us/rdf" />. Again, this assumes you’ve picked an “attribution non-commercial” license. Whatever the Creative Commons license URL, add “rdf” to the end. And that’s it.
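As with any template edit, the Atom change can also be made to a feed after it is generated. A minimal Python sketch, assuming a hypothetical sample feed and a license URL that ends in “/” (so appending “rdf” yields …/us/rdf):

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
LICENSE_URL = "http://creativecommons.org/licenses/by-nc/3.0/us/"


def add_cc_license_atom(atom_xml, license_url=LICENSE_URL):
    """Return the Atom feed with a <link rel="license"> pointing at the license RDF."""
    ET.register_namespace("", ATOM_NS)  # keep Atom as the default namespace on output
    root = ET.fromstring(atom_xml)
    link = ET.Element(f"{{{ATOM_NS}}}link")
    link.set("rel", "license")
    link.set("type", "application/rdf+xml")
    link.set("href", license_url + "rdf")  # the license URL with "rdf" appended
    # Place it with the other feed-level metadata, before the first <entry>.
    children = list(root)
    first_entry = next(
        (i for i, child in enumerate(children) if child.tag == f"{{{ATOM_NS}}}entry"),
        len(children),
    )
    root.insert(first_entry, link)
    return ET.tostring(root, encoding="unicode")


SAMPLE_ATOM = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example blog</title>
  <id>urn:example:feed</id>
  <updated>2008-10-06T00:00:00Z</updated>
  <entry>
    <title>A post</title>
    <id>urn:example:post-1</id>
    <updated>2008-10-06T00:00:00Z</updated>
  </entry>
</feed>"""

print(add_cc_license_atom(SAMPLE_ATOM))
```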
So What?
I suppose that putting the license on the web site alone is enough, from a strict legal standpoint. However, as we all know, RSS feeds have a habit of wandering off almost on their own power. Adding the license to the feeds themselves gives you an extra bit of protection — the consumer of the feed cannot say they were unaware the content was licensed.
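On the receiving end, a feed reader or aggregator could check for either declaration before reusing content. A minimal Python sketch, assuming the two markup forms described above; find_feed_license is a hypothetical helper, not part of any feed library, and the sample feeds are illustrative.

```python
import xml.etree.ElementTree as ET

CC_NS = "http://backend.userland.com/creativeCommonsRssModule"
ATOM_NS = "http://www.w3.org/2005/Atom"


def find_feed_license(feed_xml):
    """Return the license URL declared in an RSS 2.0 or Atom feed, or None."""
    root = ET.fromstring(feed_xml)
    # RSS 2.0: <creativeCommons:license> inside <channel>.
    license_el = root.find(f"channel/{{{CC_NS}}}license")
    if license_el is not None and license_el.text:
        return license_el.text.strip()
    # Atom: a feed-level <link rel="license" href="..."/>.
    for link in root.findall(f"{{{ATOM_NS}}}link"):
        if link.get("rel") == "license":
            return link.get("href")
    return None


RSS_WITH_LICENSE = """<rss version="2.0"
     xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule">
  <channel>
    <title>Example blog</title>
    <creativeCommons:license>http://creativecommons.org/licenses/by-nc/3.0/us/</creativeCommons:license>
  </channel>
</rss>"""

ATOM_WITH_LICENSE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example blog</title>
  <link rel="license" type="application/rdf+xml"
        href="http://creativecommons.org/licenses/by-nc/3.0/us/rdf"/>
</feed>"""

print(find_feed_license(RSS_WITH_LICENSE))
print(find_feed_license(ATOM_WITH_LICENSE))
```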
Creative Commons and Blogging
Copyright and RSS frequently appear to be ill-suited bedfellows. On one side we have the author’s desire to have one’s content distributed as widely as possible. On the other, we have the publisher’s desire to control the way one’s content is used, out of concern about losing control over one’s work, perceived or real financial loss, or a simple desire to be properly attributed. Whereas in traditional media publisher and author are usually different people (the most common place the two roles intersect is the vanity press), in “new media” the same person frequently takes on both roles.
Copyright is often seen as complicated, and for good reason. In the United States, at least, a work is copyrighted at the moment it is created and may not be reproduced without explicit permission. (The legal concept of “fair use” in the United States is murky at best. It’s a right that does not readily extend to other jurisdictions. And it almost certainly does not apply to the wholesale reproduction of items from an RSS feed. But I’m no lawyer.) At the other extreme, the author can explicitly waive copyright, a choice that few authors or publishers would make. In between lies licensing content for particular uses. This is the sensible middle ground for most bloggers: some uses of my content are fine while others are not.
The challenge, however, lies in writing the language of that license and defining the kinds of use to allow. Doing so in a legally defensible way is complicated (again, I’m no lawyer). So what should the blogger do? Use Creative Commons. Creative Commons (CC) is a non-profit organization that has written legally valid, clearly understandable licenses that anyone may use. By applying one of CC’s licenses to blog content, the blogger can state clearly what uses of that content are allowed. Can it be reused wholesale? Reused only if the person using it does not make any money from it? Reused only if attribution is given and no changes are made to the original? There are many permutations. (Unfortunately, there’s no standard way to license content in an RSS feed.)
RSS4Lib is now licensed under an Attribution-Noncommercial 3.0 United States license. (Look toward the bottom of the sidebar.)
If you want to learn more about using content — including RSS4Lib — that has a Creative Commons license, I highly recommend CC HowTo #1: How to Attribute a Creative Commons licensed work at my colleague Molly Kleinman’s blog. The first in her planned series of posts is excellent, and I look forward to future installments.
RSS: The Shipping Container of the Internet
Not too long ago, I read a fascinating book about international shipping. No, I’m serious: Marc Levinson’s The Box: How the Shipping Container Made the World Smaller and the World Economy Bigger, published in 2006 (which happens to have been the 50th anniversary of that ubiquitous part of the global economy, the shipping container).
In a nutshell, the standardized shipping container revolutionized international trade by vastly speeding up the loading and unloading of ships. Before containers, cargo was brought to the wharf, unloaded from a truck into a pile on the dock, moved piece by piece into cargo netting, hoisted by crane into the hold of a ship, and then removed from the net and shoved into a corner of the hold. With containers, loading a ship became about as complicated as building a stack of bricks. OK, a bit more complicated, since loading and unloading containers is really an art, the ship needs to be properly balanced, and so forth; but basically, a crane operator and a few others can load a ship. Turnaround times at the pier, when large, expensive freighters were just sitting there, were reduced dramatically.
What does this have to do with RSS? Quite a bit, actually. RSS is the box into which any old thing can be packed, for uniform shipping from producer to consumer. A paragraph of text, an audio podcast, a video podcast, a Word document… If it can be put online, it can be shoved into a container (the RSS item), given a bill of lading (the RSS channel), pre-cleared for customs (tags, authors, keywords, etc.), and sent on its merry way on a conveyance (the RSS feed). Nobody has to touch the contents between shipper and receiver — just once to pack it, once to unpack it.
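To make the analogy concrete, here is a minimal Python sketch that packs two kinds of cargo into an RSS 2.0 feed: a text post and a podcast episode carried as an <enclosure>. The build_feed helper, its dictionary layout, and the example.org URLs are all hypothetical.

```python
import xml.etree.ElementTree as ET


def build_feed(channel_info, items):
    """Serialize a minimal RSS 2.0 feed from plain dictionaries."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")  # the "bill of lading"
    for field in ("title", "link", "description"):
        ET.SubElement(channel, field).text = channel_info[field]
    for item in items:  # each <item> is one "container"
        item_el = ET.SubElement(channel, "item")
        ET.SubElement(item_el, "title").text = item["title"]
        for category in item.get("categories", []):  # the "customs" labels
            ET.SubElement(item_el, "category").text = category
        if "enclosure" in item:  # audio, video, documents, ...
            url, length, mime_type = item["enclosure"]
            ET.SubElement(item_el, "enclosure",
                          url=url, length=str(length), type=mime_type)
    return ET.tostring(rss, encoding="unicode")


feed = build_feed(
    {"title": "Example blog", "link": "http://example.org/",
     "description": "A hypothetical feed"},
    [
        {"title": "A paragraph of text", "categories": ["rss", "libraries"]},
        {"title": "Episode 1", "categories": ["podcast"],
         "enclosure": ("http://example.org/ep1.mp3", 1048576, "audio/mpeg")},
    ],
)
print(feed)
```

The reader’s software only needs to understand the container format, not the cargo: the same item structure carries text or an MP3.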
The feed is empty…. Fill it!
Addendum (10AM 5 August 08): Another similarity pointed out to me (thanks Cindi) is that RSS and shipping containers both lack security and authentication. The ramifications of this problem are a bit more serious for shipping containers than for feeds. Still, not really knowing who might have mucked with a feed between origin and destination, or having any real knowledge of who published it in the first place once the feed items are scattered around the Internet, can be a problem. Feeds, once set free, can have a life of their own.
New Pew Survey on Blogging and Blog Readership
The Pew Internet & American Life Project released a summary of a spring survey on bloggers and blog readers: New Numbers for Blogging and Blog Readership.
Although the full report is not presented, some summary information is. These points are of note in the report’s discussion about blog readership:
- “33% of internet users (the equivalent of 24% of all adults) say they read blogs, with 11% of internet users doing so on a typical day.”
- “42% of internet users (representing 32% of all adults)” say they have, at some time, read a blog or online journal.
- Men and women in this study are equally likely to say that they currently read other people’s blogs (35% for men, 32% for women).
- Men are more likely than women to say that they have read other people’s blogs at some point in the past (48% vs. 38%). Pew speculates that this difference is because men “are generally heavily represented among the early adopters for most technologies, but women catch up over time.”
Items of note in the discussion about blog authorship:
- “12% of internet users (representing 9% of all adults) say they ever create or work on their own online journal or blog.”
- “For a majority of bloggers, working on their blog is not an every-day activity: 5% of internet users blog on a typical day.”
If a quarter of all adults say they read blogs, I wonder what additional percentage read blogs without knowing it. I also wonder what percentage of the currently active blog-reading population does so via RSS, and whether they realize they’re reading a blog when they go to Google Reader or Bloglines.
AP, Bloggers, and Fair Use
The Associated Press has stepped back from its original position on copyright and the blogosphere and will be developing a (hopefully) more nuanced policy. According to an article in the June 16 issue of The New York Times, “The Associated Press … said that it will, for the first time, attempt to define clear standards as to how much of its articles and broadcasts bloggers and Web sites can excerpt without infringing on The A.P.’s copyright.”
The recent controversy arose when AP requested that the Drudge Retort (a left-leaning response to Matt Drudge’s conservative Drudge Report) remove seven portions of its syndicated news stories from its web site. (I should note that the excerpts varied in length from 39 to 79 words; the excerpt I have in the previous paragraph is a hopefully safe 37 words.)
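Counting excerpt length is simple arithmetic, but the result depends on what counts as a word. A minimal Python sketch, assuming whitespace-separated tokens and skipping punctuation-only tokens such as the ellipsis; other counting conventions would give slightly different numbers.

```python
import re

EXCERPT = ("The Associated Press … said that it will, for the first time, attempt to "
           "define clear standards as to how much of its articles and broadcasts bloggers "
           "and Web sites can excerpt without infringing on The A.P.’s copyright.")


def word_count(text):
    """Count whitespace-separated tokens that contain at least one letter or digit."""
    return sum(1 for token in text.split() if re.search(r"\w", token))


print(word_count(EXCERPT))       # 37
print(word_count(EXCERPT) < 39)  # True: shorter than the shortest excerpt AP objected to
```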
The AP’s move to better define fair use when it comes to blogging about news is a welcome one. As I discussed last month, there is a vast gap between what publishers desire and what common practice defines in the realm of copyright. The doctrine of “Fair Use” is “unclear and not easily defined.” (This according to the U.S. Copyright office itself!) Fair use is usually decided in the courts, after the fact. Bloggers have taken their stand through their actions — for better or worse, a significant portion of bloggers view fair use liberally. Publishers have, as fits their economic interest, taken a more restrictive view. It is refreshing to see a major publisher declare its interest in finding a middle ground that it can endorse.
RSS Feeds & Copyright
Copyright and fair use are poorly understood in the population at large (just ask high school teachers or college professors how much time they spend vetting submitted papers for flagrant, let alone subtle, plagiarism). Syndication technologies such as RSS and Atom compound the problem: they make it so easy to repurpose works that what’s proper, morally or legally, is often overlooked. After all, feeds are purpose-built to make content portable, and the reasoning goes that if the author did not want others to copy the content, the author would not send it out in a format designed for simple syndication.
The Australian magazine PC World runs an interesting article by Larry Borsato: “Who owns ‘public’ content? RSS feed ownership brought into question.” In the article, Borsato recounts a recent incident in which a commercial entity reproduced his blog posts, in toto, on its web site via RSS. Borsato’s blog carries a Creative Commons Attribution-Noncommercial license, and he felt the company had violated it: it was, after all, a commercial entity. Although the question was resolved amicably, it highlights once again how differently copyright is often viewed in the syndicated environment than in the print world. Borsato concludes:
Like many other facets of life in the Internet age, technological possibility is outstripping common practice — and often outstripping common sense. Some of this particular misconception, about what can legitimately be done with online content, can be cleared up through experience and training. Some of it will inevitably be resolved through better technological solutions. But when it comes down to it, we as bloggers must take greater responsibility for tracking how our content is used.
RSS Awareness Day
Thursday, 1 May 2008, is RSS Awareness Day. There’s a grassroots effort to increase the awareness and use of RSS (and syndication tools in general). On the RSS Awareness Day site, it is claimed that “Feedburner recently reported that they track around 60 million RSS subscribers.”
Of course, there are a lot more Internet users today than there were in 2005 (one estimate puts the total at 1.3 billion at the end of December 2007). I would go so far as to triple Feedburner’s estimate to 180 million RSS subscribers, to account for all the users that Feedburner does not know about. And there have to be millions of them: people who “use RSS” without being actively aware of it, such as through “live bookmarks” in Firefox, Safari, and IE, or from web sites that themselves are amalgamations of feeds from other publications. People do not need to know what RSS is to use it.
Still, even if we triple the number of users Feedburner thinks there are to 180 million, that is only 13.8% of the 1.3 billion Internet users out there. That’s not a particularly overwhelming market penetration figure for something as gosh-darned handy as RSS.
So talk about RSS on May 1, especially if you can do so without preaching to the converted. You and I probably do not need to be sold on the benefits, but our patrons do, and so, probably, do our parents. Take advantage of the first RSS Awareness Day to spread the word.
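The market-penetration arithmetic above, as a small Python check. The figures come from the post; the factor of three is the post’s rough allowance for subscribers Feedburner cannot see, not a measured number.

```python
feedburner_estimate = 60_000_000
adjusted_estimate = 3 * feedburner_estimate  # the post's rough tripling
internet_users = 1_300_000_000

raw_share = feedburner_estimate / internet_users
share = adjusted_estimate / internet_users
print(f"Feedburner alone: {raw_share:.1%}")  # Feedburner alone: 4.6%
print(f"{adjusted_estimate:,} of {internet_users:,} users = {share:.1%}")
# 180,000,000 of 1,300,000,000 users = 13.8%
```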
RSS and Legal Liability
A French court has held web site publishers liable for invasion of privacy because their sites republished, via RSS feeds, rumors that were themselves libelous. See French Websites liable for story in RSS reader (Out-Law.com). The publishers of the third-party sites had to pay fines of between 500 and 1,000 euros. Out-Law.com, a British legal news site, notes that, “while there has not been a test case in the UK on link liability,” there is a legal precedent that could be relevant in English common law: “A Court of Appeal ruling … found that a man who stood by a roadside placard drawing the attention of passers by to it was liable for its defamatory content, even though he did not create or erect the placard.”
This French case may not have any relevance in the U.S., where the legal concepts of freedom of speech and privacy are construed differently. Still, I find it interesting that one publisher could be held liable for reproducing, without any conscious effort, an RSS feed from another source. One of the strengths of RSS is also one of its drawbacks: you subscribe to the feed, come what may.
Do any RSS4Lib readers have opinions on this? Fire away in the comments.
Tagging and Taggers
A recent research paper, “Can Social Bookmarking Improve Web Search?” by Paul Heymann, Georgia Koutrika, and Hector Garcia-Molina, draws numerous interesting conclusions about the effect of taggers and tagging on findability. The authors used del.icio.us as the source for tags.
Several of the results they found:
- “Tags are present in the page text of 50% of the pages they annotate and in the titles of 16% of the pages they annotate” (p. 8). It seems that taggers are not particularly original in their tagging.
- “Pages posted to del.icio.us are often recently modified” (p. 4) and “approximately 25% of URLs posted by users [of del.icio.us] are new, unindexed pages” (p. 5). By monitoring tags of interest to you, you can find out what’s new more effectively than you can by setting up standard search queries.
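Here is a rough Python sketch of one way to measure the kind of overlap the first finding describes. It assumes the statistic is computed over (page, tag) pairs with exact, case-insensitive word matching; the paper’s actual method may differ, and the sample bookmarks are hypothetical.

```python
import re


def tag_text_overlap(bookmarks):
    """Fraction of (page, tag) pairs where the tag appears as a word in the page text.

    `bookmarks` is a list of (page_text, tags) pairs.
    """
    pairs = hits = 0
    for text, tags in bookmarks:
        words = set(re.findall(r"\w+", text.lower()))
        for tag in tags:
            pairs += 1
            hits += tag.lower() in words  # True counts as 1
    return hits / pairs if pairs else 0.0


sample = [
    ("RSS feeds for libraries", ["rss", "library"]),  # "library" is not in the text
    ("Atom syndication format", ["atom", "xml"]),     # "xml" is not in the text
]
print(tag_text_overlap(sample))  # 0.5
```

Exact word matching is deliberately strict: “library” does not match “libraries,” so a stemming step would raise the overlap figure.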
Their closing section, in which they discuss how tagging could be improved in the long run, bears quoting at length: