Category Archives: Syndication

Getting in Their Face[book]s

Facebook Librarian is an extremely useful application for bringing Facebook users (for many of us academic librarians, that translates to the overwhelming majority of our user population) and librarians together. Facebook, as we all know, is a social networking site. Its audience is largely college and post-college people, although there are both younger and older members. Heck, I'm one of those older Facebook users, having graduated from school several years before the Web was born.

Facebook Librarian is an application that any Facebook member can add to their profile. Once added, it provides links to a range of resources, including WorldCat, Google Scholar, Internet Archive, Amazon, and so on. But that's not the really interesting thing -- this is: there's an "Ask a Librarian" link in the application that will either link to a librarian at the user's school (if one has signed up through Facebook Librarian) or elsewhere (another school's librarian who volunteered to take questions from all comers). Plus, if a library creates a very simple HTML page and provides the URL of that page to Facebook Librarian, that "widget" is displayed within the application.

This works because college-aged Facebook members are generally associated with an educational institution and Facebook makes that association available to application developers. So if anyone at the University of Michigan with a Facebook account goes to the Facebook Librarian application, they will be able to "Ask a Librarian" (me, in this case) or search our library catalog, journals collection, database collection, or web site from within Facebook. Any library can register to be a contact for a particular school, provide a library widget, or both.

This is a very useful and truly wonderful example of putting the library where the users are.

Kudos to Brad Czerniak, a student in the Library and Information Science program at Wayne State University, who developed this application. You can read more about Facebook Librarian on his blog: Hawidu. Or, to try it yourself, go to http://apps.facebook.com/fblibrarian and click "Add to my profile."

Welcome to the Cut and Paste Web

Content, having reached the age of majority, has left home and is out trying to make its own way in the world. Some "digital parents" are reflexively clutching at their wayward bits, trying to keep them on the home site. Others are preparing for the all-but-inevitable day, right around the corner, when content grows up and lives on its own, occasionally calling home to say hello and see if there are any updates.

We are on the cusp of what Steve Rubel terms the Cut and Paste Web. In this version of the web -- the building blocks are already there -- "you can take any piece of online content that you care about - a news feed, an image, a box score, multimedia, a stream of updates from your friends - and easily pin it wherever you want."

Rubel, who writes for Advertising Age, offers three strategies for thriving in this new era where content is consumed in places far removed from the web site:

  1. Think web services, not websites
  2. Connect people
  3. Make everything portable

As our profession evolves from being gatekeepers to publishers of information, we need to work more actively to expand the ways our patrons use what we have. Or would use it, if only it were offered. Any online tool we build or buy for our library's patrons should be able to provide the same functionality in another venue. Our databases should be searchable (with authentication, of course, where required) from anywhere our patrons want. If someone is building a wiki on a subject, relevant search results should be included right there, live from the database. Ditto for the library catalog, without the authentication. And the same is true for any other tool we offer our patrons in an online environment. Of course, these tools should be as accessible on a cell phone as on a full PC-based web browser. And the output of patron research should be available in open formats -- so it can be reused and republished. Licensing of content needs to reflect the realities of use, not the other way around.

Rubel concludes as follows: "In the very near future portals including iGoogle, My Yahoo and Netvibes as well as social networks will be able to easily inhale the smallest pieces of content from across the web. Don't wait. Start now to make everything on your website embeddable. Traffic is becoming something that happens elsewhere, not just on your site." Syndication is the next wave of innovation.

Copyright, RSS, and Common Sense

Of the many interesting cans of worms that content syndication tools -- RSS feeds in particular -- open, one of the most significant is copyright. The issue becomes particularly interesting when the RSS feed is the same as the site -- that is, when the blog's author chooses to republish the entire content of an article via RSS.

I think many people assume that, by making content available through RSS or other syndication tools, the content's author has implicitly permitted that content to be used by others. Common practice shows this to be a frequent interpretation. I'm sure many of my fellow bloggers have been as annoyed as I am when I discover that RSS4Lib's content is being reproduced, in its entirety, on another web site whose sole purpose appears to be selling advertisements.

Common practice notwithstanding, reproducing blog content wholesale is wrong, barring a license explicitly granted in the feed or on the originating web site. RSS feeds are protected by copyright just as much as any other work.

There are several mechanisms, of course, for stating your licensing terms. While copyright law (in the United States, at least) does not require an explicit statement of copyright for the item to be protected, it's common sense to do so. You can put a statement on your blog -- and it's probably wise to do so on each post or page, using your weblog software's templates. It's also possible, and advisable, to put copyright statements in your feeds:

  • The RSS 2.0 specification includes a copyright statement for the entire feed, in the channel's <copyright> field, but not for a particular entry.
  • The Atom draft specification has a <rights> field for both the feed and individual entries.
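As a rough illustration of where those two fields live, here is a minimal Python sketch using the standard library's ElementTree. The function names, the placeholder notice, and the example.org feed details are all made up for this example, and the namespace is the one from the published Atom 1.0 format rather than the draft, so treat it as a sketch rather than a recipe for any particular blog tool.

```python
import xml.etree.ElementTree as ET

# Atom 1.0 namespace (an assumption here; the post refers to the draft spec).
ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("atom", ATOM_NS)


def rss_channel_with_copyright(title, link, description, notice):
    """Return an <rss> element whose channel carries a feed-wide <copyright>."""
    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")
    for tag, text in (("title", title), ("link", link),
                      ("description", description), ("copyright", notice)):
        ET.SubElement(channel, tag).text = text
    return rss


def atom_feed_with_rights(feed_notice, entry_notice):
    """Return a skeletal Atom <feed> with <rights> on the feed and on one entry.

    Other elements Atom requires (id, title, updated) are left out for brevity.
    """
    feed = ET.Element(f"{{{ATOM_NS}}}feed")
    ET.SubElement(feed, f"{{{ATOM_NS}}}rights").text = feed_notice
    entry = ET.SubElement(feed, f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}rights").text = entry_notice
    return feed


NOTICE = "Copyright 2008 Example Author"  # placeholder wording, not a real notice
rss = rss_channel_with_copyright(
    "Example Blog", "http://example.org/", "An example feed", NOTICE)
atom = atom_feed_with_rights(NOTICE, NOTICE)
print(ET.tostring(rss, encoding="unicode"))
print(ET.tostring(atom, encoding="unicode"))
```

In practice you would edit your weblog software's feed templates rather than build feeds by hand; the sketch only shows which element goes where.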

In practical terms, of course, whatever the rights are and however they are declared, they're hard to enforce.

I suspect many of us are happy to have our content included in services like Google Reader, Bloglines, and the like -- after all, we're writing to be read. Short or long excerpts from our posts being used in the context of another blogger's post are also fine with most of us -- that's how discussion happens. At the other end of the scale, I would bet that most of us are less sanguine about our content being reproduced, in whole, for financial gain, by someone else.

Somewhere in the middle is a potential Google project -- described at TechCrunch in a post titled Google May Add Comment Feature On Shared Reader Feeds -- in which users could comment on blog posts within the context of Google Reader. Such a project, if implemented, would move the conversations and discussions about our blog posts from our blogs into "Googlespace," which all too often is akin to a black hole: things go in, but don't come out. I'm not knowledgeable enough about copyright to weigh in on the legality of appropriating bloggers' content, reproducing it, and fostering interaction around it without explicit permission, but to me, it's questionable. If this project comes to fruition, it could seriously infringe on the way we as bloggers -- librarian or otherwise -- interact with our users and our patrons.

Best Practices for Building RSS 2.0 Feeds

Like many standards, the RSS 2.0 Specification provides detailed instructions for what elements must or may be in an RSS feed and, in broad terms, how to format them. However, the specification does not -- nor should it -- provide detailed guidance on what to put in the various elements.

That's where the Really Simple Syndication Best Practices Profile comes in. Published by the RSS Advisory Board, the group that has responsibility for maintaining the RSS Specification, the RSS Best Practices Profile offers guidelines on how to format an RSS document for the widest possible audience of aggregators, feed readers, and other tools. The Board tested feeds against a range of aggregators: Bloglines, BottomFeeder 4.4, FeedDemon 2.5 (2.5.0.10), Google Reader, Microsoft Internet Explorer 7, Mozilla Firefox 2.0 (2.0.9), My Yahoo, NewsGator Online and Opera 9 (9.22).

This document is aimed at developers more than at bloggers -- the blog tools we all use already create RSS feeds -- but when we build systems that generate RSS for our users, doing so in the format that has the best chance of providing users with the same experience, regardless of where they consume the feed, is a good idea. For each required or optional element in the RSS specification, this document says what the specification requires and how best to implement that requirement in practice. Some selected recommendations from the guide:

  1. Author: The Board suggests that, for individually authored blogs (where everything is written by the same person), the item's author element be omitted in favor of the channel's managingEditor or webMaster element.
  2. Category: The Board recommends that the category element provide the full hierarchy of the category term, not just the term itself. In other words, a category of "dogs" would be better as (and I'm making this up) "animals/canines/dogs".
  3. Description: The Board makes the common-sense suggestion that, when an item's description element links to other pages on the same site as the blog, the links be fully qualified URLs (for example, http://www.rss4lib.com/index.html), not relative URLs (/index.html).
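Two of these recommendations are easy to sketch in code. The Python below is only an illustration under my own assumptions: the helper names are invented, and the regular expression handles just quoted href and src attributes, which is cruder than what a real feed generator would want.

```python
import re
from urllib.parse import urljoin

# Matches quoted href="..." or src="..." attributes (a simplification).
LINK_ATTR = re.compile(r'(href|src)=(["\'])(.*?)\2', re.IGNORECASE)


def absolutize_links(description_html, base_url):
    """Rewrite relative links in an item description as fully qualified URLs."""
    def fix(match):
        attr, quote, target = match.groups()
        return f"{attr}={quote}{urljoin(base_url, target)}{quote}"
    return LINK_ATTR.sub(fix, description_html)


def category_path(terms):
    """Join a category hierarchy, broadest term first, into a single string."""
    return "/".join(terms)


print(absolutize_links('<a href="/index.html">home</a>', "http://www.rss4lib.com/"))
print(category_path(["animals", "canines", "dogs"]))
```

Links that are already fully qualified pass through unchanged, since urljoin leaves an absolute URL alone.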

By taking some simple steps to generate RSS feeds so they will be read and understood by the most common feed readers and aggregators, you can broaden the audience for your content and help ensure that your readers have a uniform experience regardless of where they consume your RSS content.

Directory of Experimental Library Tools Sites

I've started compiling a list of "library labs" -- web sites where libraries of all kinds publicize their experimental, "beta," or trial services. The pages linked below offer a wealth of ideas and innovations.

The full list will be maintained as the Directory of Experimental Library Tools.

Please contribute your own library's site if it's not listed already.

New Tagging Tool at University of Michigan Library

I'd like to talk about a tagging project we just launched at my workplace. MTagger is a social bookmarking tool that we've integrated into several University of Michigan library resources. A tag cloud now appears:

Like del.icio.us and many other social bookmarking tools available on the Internet, MTagger allows users to bookmark and tag web pages using language that makes sense to them. Anyone can see tag clouds on pages and search MTagger; only users with valid U-M network logins can apply tags. (Individuals can, of course, opt out of sharing their tags with others if they choose.)

Unlike these other tools, MTagger offers the concept of "Collections" -- letting users restrict their searches for similarly tagged items to a specific collection (library catalog records, images, web pages, etc.). While tags themselves would allow people to serendipitously find items in other collections, the "Collections" metaphor will, we expect, help drive home that the library offers more than books, electronic journals, and databases.

More important than the tagging functionality itself is what MTagger will allow our faculty, staff, and students to do. MTagger brings a social component to research that we have not previously had. It will allow users to share knowledge about library resources with each other, to produce quick-and-dirty subject guides, and -- we hope -- to come together via their individual tag clouds. As research moves online, chance meetings in the stacks between researchers with overlapping interests become ever rarer. Through tagging, we hope to be able to recreate some of those synergistic interactions as one researcher finds a tag of interest, and through that, the other researcher.

Oh, and just to keep this in the realm of libraries and RSS, anything that can be searched within MTagger can be accessed via an RSS feed.

Tagging and Taggers

A recent research paper, "Can Social Bookmarking Improve Web Search?" by Paul Heymann, Georgia Koutrika, and Hector Garcia-Molina, draws numerous interesting conclusions about the effect of taggers and tagging on findability. The authors used del.icio.us as the source for tags.

Several of the results they found:

  • "Tags are present in the page text of 50% of the pages they annotate and in the titles of 16% of the pages they annotate" (p. 8). It seems that taggers are not particularly original in their tagging.
  • "Pages posted to del.icio.us are often recently modified" (p. 4) and "approximately 25% of URLs posted by users [of del.icio.us] are new, unindexed pages" (p. 5). By monitoring tags of interest to you, you can find out what's new more effectively than you can by setting up standard search queries.

Their closing section, in which they discuss how tagging could be improved in the long run, bears quoting at length:

In terms of tags, we believe that user interface features could have a large impact on improving the quality of tags for search. For instance, interfaces that recommended tags not in the page, or not common for the given domain, might help alleviate those two problems. Another approach might be to have domain-specific sites (e.g., photography) which might have higher quality tags due to the shared context of the users.

RSS and Legal Liability

A French court has found that the publisher of a web site is liable for invasion of privacy because it republished rumors, via RSS feeds, that were themselves libelous. See French Websites liable for story in RSS reader (Out-Law.com). The publishers of the 3rd-party sites had to pay fines of between 500 and 1,000 Euro. Out-Law.com, a British legal news site, notes that, "while there has not been a test case in the UK on link liability," there is a legal precedent that could be relevant in English common law: "A Court of Appeal ruling ... found that a man who stood by a roadside placard drawing the attention of passers by to it was liable for its defamatory content, even though he did not create or erect the placard."

This French case may not have any relevance in the U.S., where the legal concepts of freedom of speech and privacy are differently construed. I find it interesting that one publisher could be guilty of libel by reproducing, without any conscious effort, an RSS feed from another source. One of the strengths of RSS is one of the drawbacks -- you subscribe to the feed, come what may.

Do any RSS4Lib readers have opinions on this? Fire away in the comments.

RSS Awareness Day

Thursday, 1 May 2008, is RSS Awareness Day. There's a grassroots effort to increase the awareness and use of RSS (and syndication tools in general). On the RSS Awareness Day site, it is claimed that "Feedburner recently reported that they track around 60 million RSS subscribers."

Of course, there are a lot more Internet users today than there were in 2005 (one estimate puts the total at 1.3 billion at the end of December 2007). I would go so far as to triple Feedburner's estimate to 180 million RSS subscribers, to account for all the users that Feedburner does not know about. And there have to be millions of them: people who "use RSS" without being actively aware of it, such as through "live bookmarks" in Firefox, Safari, and IE, or from web sites that themselves are amalgamations of feeds from other publications. People do not need to know what RSS is to use it.

Still, even if we triple the number of users Feedburner thinks there are to 180 million, that is only 13.8% of the 1.3 billion users out there. That's not a particularly overwhelming market penetration figure for something as gosh-darned handy as RSS.
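For anyone who wants to check the arithmetic, here it is as a few lines of Python. The variable names are mine; the figures are the ones quoted above.

```python
feedburner_subscribers = 60_000_000   # Feedburner's reported figure
internet_users = 1_300_000_000        # estimate for the end of December 2007

# Triple Feedburner's number to allow for subscribers it cannot see.
estimated_subscribers = 3 * feedburner_subscribers
share = estimated_subscribers / internet_users

print(f"{estimated_subscribers:,} subscribers is {share:.1%} of {internet_users:,} users")
```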

So -- talk about RSS on May 1, especially if you can do so without preaching to the converted. You and I probably do not need to be sold on the benefits. But our patrons do. But our parents probably don't. Take advantage of the first RSS Awareness Day to spread the word.

RSS Feeds & Copyright

Copyright and fair use are poorly understood in the population at large (just ask high school teachers or college professors how much time they spend vetting submitted papers for flagrant -- let alone subtle -- plagiarism). Syndication technologies such as RSS and Atom compound the problem: they make it so easy to repurpose works that what's proper -- morally or legally -- is often overlooked. After all, feeds are purpose-built to make content portable. The common reasoning goes that if the author did not want others to copy the content, the author would not send it out in a format designed for simple syndication.

The Australian magazine PC World runs an interesting article by Larry Borsato: "Who owns 'public' content? RSS feed ownership brought into question." In the article, Borsato recounts a recent incident in which a commercial entity reproduced, in toto, his blog posts via RSS on its web site. Because Borsato publishes under a Creative Commons non-commercial attribution license, he felt the commercial entity had violated it; they were, after all, a commercial entity. While the question was resolved amicably, it highlights, once again, the difference between how copyright is frequently viewed in the syndicated environment and how it is often seen in the print world. Borsato concludes:

In the same way that I can't reprint a Harry Potter book and start selling it for my own gain, we need to realize that we can't do that with RSS feeds or other Web content either. While Fair Use is OK, you can't just start lifting and reusing entire bodies of work without permission.

Like many other facets of life in the Internet age, technological possibility is outstripping common practice -- and often outstripping common sense. Some of this particular misconception, about what can legitimately be done with online content, can be cleared up through experience and training. Some of it will inevitably be resolved through better technological solutions. But when it comes down to it, we as bloggers must take greater responsibility for tracking how our content is used.

AP, Bloggers, and Fair Use

The Associated Press has stepped back from its original position on copyright and the blogosphere and will be developing a (hopefully) more nuanced policy. According to an article in the June 16 issue of The New York Times, "The Associated Press ... said that it will, for the first time, attempt to define clear standards as to how much of its articles and broadcasts bloggers and Web sites can excerpt without infringing on The A.P.'s copyright."

The recent controversy arose when AP requested that the Drudge Retort (a left-leaning response to Matt Drudge's conservative Drudge Report) remove seven portions of its syndicated news stories from its web site. (I should note that the excerpts varied in length from 39 to 79 words; the excerpt I have in the previous paragraph is a hopefully safe 37 words.)

The AP's move to better define fair use when it comes to blogging about news is a welcome one. As I discussed last month, there is a vast gap between what publishers desire and what common practice defines in the realm of copyright. The doctrine of "Fair Use" is "unclear and not easily defined." (This according to the U.S. Copyright Office itself!) Fair use is usually decided in the courts, after the fact. Bloggers have taken their stand through their actions -- for better or worse, a significant portion of bloggers view fair use liberally. Publishers have, as fits their economic interest, taken a more restrictive view. It is refreshing to see a major publisher declare its interest in finding a middle ground that it can endorse.

New Pew Survey on Blogging and Blog Readership

The Pew Internet & American Life Project released a summary of a spring survey on bloggers and blog readers: New Numbers for Blogging and Blog Readership.

Although the full report is not presented, some summary information is. These points are of note in the report's discussion about blog readership:


  • "33% of internet users (the equivalent of 24% of all adults) say they read blogs, with 11% of internet users doing so on a typical day."

  • "42% of internet users (representing 32% of all adults)" say they have, at some time, read a blog or online journal.

  • Men and women in this study are equally likely to say that they currently read other people's blogs (35% for men, 32% for women)

  • Men are more likely than women to say that they have read other people's blogs at some point in the past (48% vs. 38%). Pew speculates that this difference is because men "are generally heavily represented among the early adopters for most technologies, but women catch up over time."

Items of note in the discussion about blog authorship:


  • "12% of internet users (representing 9% of all adults) say they ever create or work on their own online journal or blog."

  • "For a majority of bloggers, working on their blog is not an every-day activity: 5% of internet users blog on a typical day."

If a quarter of all adults say they read blogs, I wonder what additional percentage reads blogs without knowing it. I also wonder what percentage of the currently active blog-reading population does so via RSS, and if they realize they're reading a blog when they go to Google Reader or Bloglines.

Via Rich at J's Scratchpad

RSS: The Shipping Container of the Internet

Not too long ago, I read a fascinating book about international shipping. No, I'm serious: Marc Levinson's The Box: How the Shipping Container Made the World Smaller and the World Economy Bigger, published in 2006 (which happens to have been the 50th anniversary of that ubiquitous part of the global economy, the shipping container).

Shipping Containers
Photo by NeonMan

In a nutshell, the standardized shipping container revolutionized international trade by vastly speeding up the loading and unloading of ships. Cargo had once been brought to the wharf, unloaded from a truck into a pile on the dock, moved piece by piece into cargo netting, hoisted by crane into the hold of a ship, removed from the cargo net, and then shoved into a corner of the hold. Now loading was about as complicated as building a stack of bricks. OK, a bit more complicated, since loading and unloading containers is really an art, the ship needs to be properly balanced, and so forth -- but basically, a crane operator and a few others can load a ship. Turnaround times at the pier -- when large, expensive freighters were just sitting there -- were reduced dramatically.

Wharf, Crane, and Containers
Photo by anaulin

What does this have to do with RSS? Quite a bit, actually. RSS is the box into which any old thing can be packed, for uniform shipping from producer to consumer. A paragraph of text, an audio podcast, a video podcast, a Word document... If you can put it online, you can shove it in a container (the RSS item), give it a bill of lading (the RSS channel), pre-clear it for customs (tags, authors, keywords, etc.), and send it on its merry way on a conveyance (the RSS feed). Nobody has to touch the contents between shipper and receiver -- just once to pack it, once to unpack it.

The feed is empty.... Fill it!

Empty Container
Photo by James Good

Addendum (10AM 5 August 08): Another similarity pointed out to me (thanks Cindi) is that RSS and shipping containers both lack security and authentication. The ramifications of this problem are a bit more serious for shipping containers than for feeds. Still, not really knowing who might have mucked with a feed between origin and destination, or having any real knowledge of who published it in the first place once the feed items are scattered around the Internet, can be a problem. Feeds, once set free, can have a life of their own.

Creative Commons and Blogging

Copyright and RSS frequently appear to be ill-suited bedfellows. On one side we have the author's desire to have one's content distributed as widely as possible. On the other, we have the publisher's desire to control the way one's content is used -- out of the concern for losing control over one's work, perceived or real financial loss, or simple desire to be properly attributed. Where in traditional media, publisher and author are usually different (and the most common place those two roles intersected was the vanity press), in "new media," the same person frequently takes on both roles.

Copyright is often seen as complicated, and for good reason. In the United States, anyway, a work is copyrighted at the moment it is created and may not be reproduced without explicit permission. (The legal concept of "Fair use," in the United States, is at best murky. It's a right that does not readily extend to other legal domains. And, it almost certainly does not apply to the wholesale reproduction of items from an RSS feed. But I'm no lawyer.) At the other extreme, the author can explicitly waive copyright -- a choice that few authors or publishers would opt for. In between is licensing the content for various uses. This is the sensible middle ground for most bloggers: some uses of my content are fine while others are not.

However, the challenge arises in setting the language of that license and defining the kinds of use to allow. Doing so in a legally defensible way is complicated (again, I'm no lawyer). So what should the blogger do? Use Creative Commons. Creative Commons (CC) is a non-profit foundation that has written legally valid and clearly understandable licenses that anyone may use. By applying one of CC's licenses to blog content, the blogger can state clearly what uses of that content are allowed. Can it be reused wholesale? Reused only if the person using it does not make any money from it? Reused only if attribution is given and no changes are made to the original? There are many permutations. (Unfortunately, there's no standard way to license content in an RSS feed.)

RSS4Lib is now licensed under an Attribution-Noncommercial 3.0 United States license. (Look toward the bottom of the sidebar.)

If you want to learn more about using content -- including RSS4Lib -- that has a Creative Commons license, I highly recommend CC HowTo #1: How to Attribute a Creative Commons licensed work at my colleague Molly Kleinman's blog. The first in her planned series of posts is excellent, and I look forward to future installments.

Putting a Creative Commons License in Your Feeds

Did you know that it's easy to add a Creative Commons license to your RSS and Atom feeds -- not just to your blog's web site? Here are brief instructions for adding your Creative Commons license to RSS and Atom feeds:

RSS 2.0

You need to make two small edits to the RSS 2.0 template your blog software uses.

  1. Change the line that reads <rss version = "2.0"> to <rss version="2.0" xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule">. This is probably the second line in the RSS file. The addition of the "xmlns..." bit sets up the second item you'll edit, pointing to the web page that defines an extension to the standard RSS 2.0 field set.
  2. Then add the URL to the Creative Commons license you've selected at the Creative Commons web site. This bit goes anywhere between the <channel> and the </channel> tags. For example, RSS4Lib has an "attribution non-commercial" license, version 3.0. So I've added this code to my RSS feed: <creativeCommons:license>http://creativecommons.org/licenses/by-nc/3.0/us/</creativeCommons:license>
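If you generate feeds from a script rather than a blog template, the two edits above might look something like this Python sketch using the standard library's ElementTree. The helper name add_cc_license is my own invention; the namespace and license URLs are the ones given in the steps.

```python
import xml.etree.ElementTree as ET

CC_NS = "http://backend.userland.com/creativeCommonsRssModule"
LICENSE_URL = "http://creativecommons.org/licenses/by-nc/3.0/us/"

# Step 1: make the serializer declare the namespace under the creativeCommons prefix.
ET.register_namespace("creativeCommons", CC_NS)


def add_cc_license(channel, license_url):
    """Step 2: append a <creativeCommons:license> element to the channel."""
    license_el = ET.SubElement(channel, f"{{{CC_NS}}}license")
    license_el.text = license_url
    return license_el


rss = ET.Element("rss", {"version": "2.0"})
channel = ET.SubElement(rss, "channel")
ET.SubElement(channel, "title").text = "RSS4Lib"
add_cc_license(channel, LICENSE_URL)

xml_text = ET.tostring(rss, encoding="unicode")
print(xml_text)
```

ElementTree writes the xmlns:creativeCommons declaration on the root element for you once the prefix is registered, which is the scripted equivalent of editing the <rss> line by hand.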

Atom

It is even easier to add a Creative Commons license to an Atom feed. There's just one line to add to the Atom template. For RSS4Lib, this is: <link rel="license" type="application/rdf+xml" href="http://creativecommons.org/licenses/by-nc/3.0/us/rdf" />. Again, this assumes you've picked an "attribution non-commercial" license. Whatever the Creative Commons license URL, add "rdf" to the end. And that's it.
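The same idea for Atom, again as a hedged Python sketch: the function name is hypothetical, and the namespace is the Atom 1.0 one, which I am assuming your template already uses.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"  # assumed Atom 1.0 namespace


def license_link(license_url):
    """Build the <link rel="license"> element, appending "rdf" to the license URL."""
    rdf_url = license_url.rstrip("/") + "/rdf"
    return ET.Element(
        f"{{{ATOM_NS}}}link",
        {"rel": "license", "type": "application/rdf+xml", "href": rdf_url},
    )


link = license_link("http://creativecommons.org/licenses/by-nc/3.0/us/")
print(link.get("href"))
```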

So What?

I suppose that putting the license on the web site alone is enough, from a strict legal standpoint. However, as we all know, RSS feeds have a habit of wandering off almost on their own power. Adding the license to the feeds themselves gives you an extra bit of protection -- the consumer of the feed cannot say they were unaware the content was licensed.

Did Bloglines Purge Its Subscription Rolls?

I just noticed that the number of subscribers at Bloglines recently fell sharply. (I use the RSS4Lib Feedstats tool to track my blog's readership via RSS.) On September 27, Bloglines was reporting a total of 872 subscribers to the RSS and Atom feeds from my blog (see details). This number is consistent -- Bloglines has reported a gradually growing number of readers, adding a few each week, over the past months. On September 28, there were only 565 subscribers (see details) to the two feeds, according to Bloglines.

Bloglines reports the number of subscribers to each feed in its crawler's user-agent string, which is recorded in the server log files. For example, the Bloglines crawler passed through earlier today and left this log line:

65.214.44.28 - - [06/Oct/2008:08:12:38 -0700] "GET /index.xml HTTP/1.1" 304 - "-" "Bloglines/3.1 (http://www.bloglines.com; 527 subscribers)"

The number of Bloglines subscribers went down for both RSS and Atom (from 796 to 527 and 56 to 38, respectively). These lower numbers have stayed consistent since 9/27/2008, which makes me think it's not just a transient error. Interestingly, the numbers reported for each feed within the Bloglines web site have not changed. (See Bloglines' list of RSS and Atom subscribers, neither of which has been updated.) The web site's numbers have always lagged the Bloglines crawler's numbers by a week or more, so the discrepancy itself is probably not significant.
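Pulling the subscriber count out of a log line like the one above takes only a few lines of Python. This is a sketch, not the code behind the Feedstats tool: the regular expression and function name are my own, and they assume the user-agent string keeps the shape shown in that one example.

```python
import re

# Assumes the user agent looks like: "Bloglines/3.1 (http://...; 527 subscribers)"
SUBSCRIBERS = re.compile(r'"Bloglines/[\d.]+ \([^;]*; (\d+) subscribers?\)"')


def bloglines_subscribers(log_line):
    """Return the subscriber count reported by the Bloglines crawler, or None."""
    match = SUBSCRIBERS.search(log_line)
    return int(match.group(1)) if match else None


line = ('65.214.44.28 - - [06/Oct/2008:08:12:38 -0700] "GET /index.xml HTTP/1.1" '
        '304 - "-" "Bloglines/3.1 (http://www.bloglines.com; 527 subscribers)"')
print(bloglines_subscribers(line))
```

Lines from other crawlers simply return None, so the function can be run over a whole log file and the non-matches filtered out.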

Has anyone else noticed that Bloglines subscriber numbers took a dive a couple weeks ago? Can anyone with a long-dormant Bloglines account confirm that it has been purged?

Liability Insurance for Bloggers

With the rise of blogging as a recognized form of journalism "has come greater scrutiny and the inevitable rise in legal threats facing bloggers," says David Cox of the Media Bloggers Association (MBA), a not-for-profit, non-partisan organization supporting the development of blogging and citizen journalism. The MBA has recently announced a "liability insurance program for bloggers which provides coverage for all forms of defamation, invasion of privacy and copyright infringement or similar allegations arising out of blogging activities."

As bloggers take up larger roles in journalism, public commentary, and social discourse, the individuals and organizations they write about are increasingly paying attention. The risk of being accused of libel, defamation, or the like is the same as in any other medium; the difference is that the world is now paying more attention. The MBA now offers an online course, "Online Media Law: The Basics for Bloggers and Other Online Publishers," without charge. Upon completing the course, students are offered the opportunity to join the MBA and then to purchase (at a discount) the liability insurance. Anyone who has taken the course has access to directories of attorneys specializing in online libel cases.

Is it worth the cost of an MBA membership ($25/year) and insurance (I could not find details of the insurance cost on the MBA site) to guard against what I suspect is a small risk for me? Probably not, in my case. The more controversial a blogger's posts, though, the more likely it is that someone might find them legally troublesome (and not just annoying).

Related Post

RSS and Legal Liability (4/24/2008)

Disclaimer: I have no affiliation whatsoever with the Media Bloggers Association.

Bloglines Update

There's been a lot of discussion (see TechCrunch, What I Learned Today, and Law.Librarians, among others) about Bloglines and the problems they've been having. I noticed today that Bloglines has returned to the status quo ante -- RSS4Lib's subscriber numbers, as inaccurate and wonky as they may be, have returned to where they were before I noticed the precipitous drop. Looking at my log files, I see that the Bloglines crawler now reports that I have 799 subscribers to the RSS feed (and 56 to the Atom). That actually represents growth over the last consistent numbers Bloglines provided via the crawler and, not unusually, a few more than Bloglines reports on its web site (its Beta site is more up-to-date, reporting the same 799 RSS feed subscribers as its crawler does).
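For anyone who wants to pull the same figure out of their own logs, here is a minimal Python sketch. It assumes the crawler embeds the count in its User-Agent string as "N subscribers" and that the server writes combined-format access logs; the sample lines, addresses, and function name are hypothetical, so adjust the pattern to whatever your logs actually show.

```python
import re

# Assumed pattern: the crawler's User-Agent contains "<number> subscribers".
SUBSCRIBERS = re.compile(r'Bloglines[^"]*?(\d+) subscribers?')
REQUEST = re.compile(r'"GET (\S+)')


def bloglines_counts(log_lines):
    """Return {requested path: most recent subscriber count reported for it}."""
    counts = {}
    for line in log_lines:
        count = SUBSCRIBERS.search(line)
        request = REQUEST.search(line)
        if count and request:
            # Later lines overwrite earlier ones, so the newest report wins.
            counts[request.group(1)] = int(count.group(1))
    return counts


# Hypothetical log lines (combined log format, made-up address and times).
sample = [
    '192.0.2.1 - - [10/Oct/2008:06:00:00 -0400] "GET /index.xml HTTP/1.1" '
    '200 5120 "-" "Bloglines/3.1 (http://www.bloglines.com; 799 subscribers)"',
    '192.0.2.1 - - [10/Oct/2008:06:05:00 -0400] "GET /atom.xml HTTP/1.1" '
    '200 6144 "-" "Bloglines/3.1 (http://www.bloglines.com; 56 subscribers)"',
]
print(bloglines_counts(sample))
```

Keeping only the latest count per feed path makes a sudden drop, like the one described above, easy to spot when the script is run daily.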

Bloglines posted a brief note about the outage on its technical blog yesterday. It was Apple-esque in its level of detail -- it offered none -- other than to say that the problem was fixed:

Some folks might have noticed that specific feeds were not updating recently on Bloglines, and we wanted to update you and fill you in on what’s been going on. We have figured out what the glitch has been. Over the weekend, a fix was released on Bloglines to resolve the issue. All feeds should now be updating and back to normal. If you're still experiencing problems you can report a stuck feed.

I still prefer Bloglines to Google Reader (call me old-fashioned), but was about to make the leap. I'm pleased that Bloglines is still alive and keeping its crawlers and index going. Google needs the competition -- even if it's not as serious as it could be.

TicTOCs: It's about Time

The JISC ticTOCs service has been formally launched after a significant trial period. (I first wrote about this service in July 2007.) The ticTOCs service aggregates the tables of contents (TOCs) from 11,470 scholarly journals from 422 publishers, for a total of 296,186 full-text articles. (Of course, you or your institution must have access to the full text of these journals to view them; the table of contents, though, is free.)

The idea behind ticTOCs is to make finding and subscribing to table of contents RSS feeds a simple process. This free service is long overdue. Getting lists of tables of contents from journal publishers is time-consuming, if it is possible at all. Being able to pull together feeds across journals in one OPML file will prove helpful to libraries wanting to deliver current awareness services, maintain more up-to-date subject guides (with a list of recent articles in that topic's 'hot' journals), or augment catalog records.

The site lets you identify journals of interest by topic, by title, or by publisher, subscribe to their tables of contents (the "TOCs") by checking ("ticking") a box, and then get an aggregated feed of articles and abstracts to review. You can add all journals matching a search (subject, title word, or publisher) to your profile with a single click, or add individual titles.

You can export the subscription list of tables of contents as an OPML file to add to your favorite reader. For example, here is an OPML file of the 25 journals in the Computers — Internet category.

TicTOCs displays each journal with its title, the standard icon for the RSS feed, and a menu to add that feed to either Bloglines or Google Reader. Articles are shown by title; a link at the top of the display allows you to show the full abstract provided by the publisher. Each article also includes a link to add the citation to RefWorks.

TicTOCs also opens the door to other advanced services. For one example, once you have an OPML file of the RSS feeds for a group of journals, that list of feeds could be run through Yahoo! Pipes or another similar tool to filter for keywords. For another, the OPML file from ticTOCs could be edited to redirect all full-text links through the library's proxy server, allowing that library's users to get to the full-text articles without any hassle at all.
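As a rough illustration of the proxy idea, here is a minimal Python sketch that prefixes the site link of every feed in an OPML file with a proxy login URL. The prefix, the sample OPML, and the function name are all hypothetical, and I am assuming the export uses the common htmlUrl and xmlUrl attributes on each outline. Note that this only touches URLs stored in the OPML itself; article links inside each feed would need the same treatment applied to the feed items.

```python
import xml.etree.ElementTree as ET

# Hypothetical EZproxy-style login prefix; substitute your library's own.
PROXY_PREFIX = "https://proxy.example.edu/login?url="


def proxy_opml(opml_text, prefix=PROXY_PREFIX, attrs=("htmlUrl",)):
    """Return OPML text with the chosen outline attributes routed via the proxy."""
    root = ET.fromstring(opml_text)
    for outline in root.iter("outline"):
        for attr in attrs:
            url = outline.get(attr)
            # Skip missing attributes and URLs that are already proxied.
            if url and not url.startswith(prefix):
                outline.set(attr, prefix + url)
    return ET.tostring(root, encoding="unicode")


# Hypothetical one-journal OPML file.
sample = """<opml version="1.1">
  <body>
    <outline text="Example Journal" type="rss"
             xmlUrl="http://publisher.example.com/journal/rss"
             htmlUrl="http://publisher.example.com/journal"/>
  </body>
</opml>"""
print(proxy_opml(sample))
```

The feed address (xmlUrl) is left alone by default, on the assumption that a feed reader cannot log in to the proxy on the user's behalf.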

Future developments I, for one, would like to see include (of course) more publishers -- where's Elsevier? -- and a simple way to query ticTOCs with a journal's ISSN or EISSN and get back the canonical RSS feed. Such a service would let libraries more easily add an RSS feed for a journal to that journal's entry in the local library catalog. It would also be helpful, at an institutional level, to have full-text URLs in table of contents feeds automatically rewritten to include the library's proxy server.

This service will save librarians time and, more importantly, save patrons time.

Correction

There are, in fact, 1870 Elsevier journal titles in ticTOCs -- thanks to Roddy MacLeod for pointing out my error.

Updated 12 Feb 2009

For you programmers out there, ticTOCs now offers a downloadable file of journal titles, ISSNs, and RSS feed URLs. Not quite an API, but a good start. See the ticTOCs news site for details, or get the ticTOCs data set for yourself.
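With a file like that in hand, the ISSN lookup I wished for is only a few lines of Python. This is a sketch under assumptions: I am guessing at a tab-separated layout of title, ISSN, and feed URL (check the real file before relying on it), and the journal names, ISSNs, and URLs below are made up.

```python
import csv
import io


def normalize(issn):
    """Strip spaces and the hyphen so '1234-5678' and '12345678' compare equal."""
    return issn.replace("-", "").strip().upper()


def feeds_by_issn(rows):
    """Map each normalized ISSN to its feed URL.

    Assumes each row is (title, ISSN, feed URL); that column order is a guess.
    """
    return {normalize(issn): feed_url for _title, issn, feed_url in rows}


# Hypothetical data standing in for the downloaded file.
sample = (
    "Example Journal of Testing\t1234-5678\thttp://publisher.example.com/ejt/rss\n"
    "Sample Review\t2345-678X\thttp://publisher.example.com/sr/rss\n"
)
index = feeds_by_issn(csv.reader(io.StringIO(sample), delimiter="\t"))
print(index.get(normalize("2345-678x")))
```

For a real file, swap the `io.StringIO(sample)` for an open file handle. A catalog record's ISSN can then be passed through `normalize` and looked up directly.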

Research Blogging -- Connecting the Blogosphere to "The Literature" a Link at a Time

The Research Blogging web site is a nexus for peer-reviewed literature and serious academic blogging about it. (No, that's not a contradiction in terms.) Research Blogging helps readers find critical analysis of scientific reporting by pulling these scholarly blog posts together. Many of these blog posts in reaction to published articles are written by experts in the field and can help you put the article in context. From the Research Blogging site:

Do you like to read about new developments in science and other fields? Are you tired of "science by press release"? ResearchBlogging.org is your place. Research Blogging allows readers to easily find blog posts about serious peer-reviewed research, instead of just news reports and press releases.

While the majority of posts indexed by Research Blogging are in the hard sciences, there are a reasonable number in the area of information and library science, broadly construed.

To join the commentary, you must register. Once you've done that, you get a code snippet to include in your posts that are about peer-reviewed articles. Research Blogging then adds your post to its index. Because this is a scholarly effort, the posts indexed by the site are themselves peer reviewed -- if other registered research bloggers feel your post does not follow the site's guidelines, it is removed from the index. This keeps the content relevant to the site's mission.

Fair Use and Quoting Commercial Content

Say you're blogging about a topic and you want to quote an excerpt of a commercial publication in your post. How much can you quote, verbatim, under the doctrine of "fair use"?

This is the subject of an article in Sunday's New York Times: "Copyright Challenge for Sites That Excerpt." Here's how the Times article sums up the problem:

The legal disputes are emblematic of a larger question that has emerged from the Internet’s link economy. The editors of many Web sites, including ones operated by the Times Company, post excerpts from competitors’ content from time to time. At what point does excerpting from an article become illegal copying?

Courts have not provided much of an answer. In the United States, the copyright law provides a four-point definition of fair use, which takes into consideration the purpose (commercial vs. educational) and the substantiality of the excerpt.

[Yes, I'm aware of the irony of quoting an 88-word passage from an article about the dangers of quoting lengthy passages from commercial publications in a blog post.]

The concept of Fair Use is a tricky and slippery one. The courts usually determine what is “fair” after the fact, when someone complains. A short excerpt that doesn't reproduce the heart of the original almost certainly is fair use; a significant excerpt that reproduces the main point of the article probably is not. Reproducing the entire original clearly fails the test.

The culture of the blogosphere has been to define fair use quite broadly; the courts have rarely, if ever, had a voice in the matter. Yet. I suspect that, as the economic woes facing the corporate world drag on, commercial sites will increasingly perceive extensive quoting of their content on other sites as an economic problem. Whether such extensive quoting of published content has a real or imagined effect on revenue -- and whether that effect is positive or negative -- remains to be seen.


Facebook Notes Redirects Your Feeds

I jumped on the Facebook bandwagon as it was pulling out of town and created a Facebook page for RSS4Lib (become a fan!). In the process, as I was adding the RSS feed for this blog using the Notes tool, I noticed something more than a little annoying: RSS feeds added to a Facebook page using Facebook's Notes application are rewritten to drive all traffic from that version of the feed to Facebook, not your own site. While it is clearly in Facebook's financial interest to bring more traffic to Facebook, they do so without explicit permission.

When you set up an RSS feed in Facebook Notes, you are asked to agree to a brief terms-and-conditions statement that says, in its entirety, "By entering a URL, you represent that you have the right to permit us to reproduce this content on the Facebook site and that the content is not obscene or illegal."

Facebook notes terms and conditions

However, Facebook's concept of "reproduce on the Facebook site" and mine are somewhat different. While I fully understood that my blog posts would be presented inside Facebook -- as they are on the RSS4Lib Notes page -- I was surprised to find that the associated RSS feed includes rewritten channel and item data. As an example, take a look at the feed's channel data:

  <channel>
    <title>RSS4Lib: Innovative Ways Libraries Use RSS's Facebook Notes</title>
    <link>http://www.facebook.com/notes.php?id=81126379633</link>
    <description>RSS4Lib: Innovative Ways Libraries Use RSS's Facebook Notes</description>
    <language>en-us</language>
    <category domain="Facebook">NotesFeed</category>
    <generator>Facebook Syndication</generator>
    <docs>http://www.rssboard.org/rss-specification</docs>
    <managingEditor>http://www.facebook.com/pages/RSS4Lib-Innovative-Ways-Libraries-Use-RSS/81126379633</managingEditor>
    <webMaster>webmaster@facebook.com</webMaster>
    ...
  </channel>

The feed's link goes to Facebook (http://www.facebook.com/notes.php?id=81126379633). That page provides reproductions of recent posts. Clicking on a post title, within Facebook, brings up that page in another Facebook page. There is a tiny link at the bottom of the page to "View original post".

The individual items in the RSS feed are likewise rewritten:

    <item>
      <guid>http://www.rss4lib.com/2009/05/feedmil_finds_feeds.html</guid>
      <title>Feedmil Finds Feeds</title>
      <link>http://www.facebook.com/note.php?note_id=82829822943</link>
      <description>Full Text of Post Goes Here</description>
      <pubDate>Wed, 13 May 2009 16:39:36 +0000</pubDate>
      <author>RSS4Lib: Innovative Ways Libraries Use RSS</author>
      <dc:creator>RSS4Lib: Innovative Ways Libraries Use RSS</dc:creator>
      <source url="http://www.rss4lib.com/index.xml">http://www.rss4lib.com/2009/05/feedmil_finds_feeds.html</source>
    </item>


They rewrite the link. They change the author from what it is in the original post, "rss4lib@gmail.com (Ken Varnum)", assign a creator that is not the author cited in the original post, and link to an RSS feed as the source. (Facebook does display the URL of the post, but clicking it goes to the feed. Depending on your web browser, being served a raw XML file may not be helpful.) They don't provide attribution for individual posts on the site.
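If you want to check whether some other service is doing the same thing to your feed, one simple test is to compare the host in each item's link with the host in its guid. Here is a minimal Python sketch of that check. It assumes the guid is the original permalink (true of the item above, but not of every feed), and the function name is my own.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Trimmed copy of the rewritten item shown above.
ITEM = """<item>
  <guid>http://www.rss4lib.com/2009/05/feedmil_finds_feeds.html</guid>
  <title>Feedmil Finds Feeds</title>
  <link>http://www.facebook.com/note.php?note_id=82829822943</link>
  <author>RSS4Lib: Innovative Ways Libraries Use RSS</author>
</item>"""


def link_was_rewritten(item_xml):
    """True when <link> and <guid> are both URLs but point at different hosts."""
    item = ET.fromstring(item_xml)
    link_host = urlparse(item.findtext("link", "")).netloc
    guid_host = urlparse(item.findtext("guid", "")).netloc
    return bool(link_host and guid_host) and link_host != guid_host


print(link_was_rewritten(ITEM))  # True: www.facebook.com vs. www.rss4lib.com
```

A feed whose guid is an opaque identifier rather than a URL would always return False here, so treat a negative result as "unknown" rather than "untouched."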

The way Facebook is using my content does not fit my understanding of the Creative Commons "Attribution Non Commercial License" I have applied. Among other things, it states that:


  1. You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work)

  2. You may not use this work for commercial purposes.

I'm willing to give on point 2 -- yes, I understand that by reproducing my blog on Facebook's site I'm contributing to their commercial gain -- but on point 1, I did not waive my right to appropriate attribution, as specified in the license on the blog, by agreeing to let them "reproduce" the blog on their site. If this counts as "remixing," which the Attribution Non Commercial license allows, the license requires that the licensee "takes reasonable steps to clearly label, demarcate or otherwise identify that changes were made to the original Work." This has not been done.

This sort of misuse of content happens all the time, of course, but rarely so blatantly.

RSS and Atom Comparison

What's the difference between RSS and Atom? Both are XML formats, both are in common use, and most people who read RSS feeds don't need to know the technical differences between them. Atom was designed to resolve the incompatibilities among the various versions of RSS (0.92, 1.0, and 2.0) and is a bit more complex. It is also an Internet Engineering Task Force (IETF) specification, RFC 4287. RSS 2.0's specification has been less formally reviewed and approved, but it is still a de facto standard.

If you're curious about how they're structured, here's a page for you: Comparison of RSS 2.0 and Atom. The top of this page shows a schematic of each data format -- showing each element and its children -- and a sample file. This page goes on to a more technical discussion about generating each of these formats with JavaScript, but the schematics are handy and helpful.
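To make the structural difference concrete, here is a small Python sketch that reads either format and returns the same list of titles and links. The two sample feeds are made up and heavily trimmed (a valid Atom feed also needs id and updated elements, for instance), and the function name is my own; it is meant to show where the two formats keep the same information, not to be a full parser.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"  # Atom elements live in this namespace

RSS_SAMPLE = """<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <item>
      <title>First post</title>
      <link>http://example.com/first</link>
    </item>
  </channel>
</rss>"""

ATOM_SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example Feed</title>
  <entry>
    <title>First post</title>
    <link href="http://example.com/first"/>
  </entry>
</feed>"""


def entries(feed_xml):
    """Return (format, [(title, link), ...]) for an RSS 2.0 or Atom feed."""
    root = ET.fromstring(feed_xml)
    if root.tag == "rss":
        # RSS 2.0: items sit inside <channel>; the link is element text.
        items = root.findall("channel/item")
        return "rss", [(i.findtext("title"), i.findtext("link")) for i in items]
    if root.tag == ATOM_NS + "feed":
        # Atom: entries sit directly under <feed>; the link is an href attribute.
        found = []
        for entry in root.findall(ATOM_NS + "entry"):
            link = entry.find(ATOM_NS + "link")
            href = link.get("href") if link is not None else None
            found.append((entry.findtext(ATOM_NS + "title"), href))
        return "atom", found
    raise ValueError("not an RSS 2.0 or Atom feed")


print(entries(RSS_SAMPLE))
print(entries(ATOM_SAMPLE))
```

The namespace on every Atom element and the href attribute on the link are the two differences most likely to trip up code written with only RSS in mind.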

RSS Readers Not Dead Yet

ReadWriteWeb says, "5 Reasons Why RSS Readers Still Rock." To summarize the post, here are the five reasons RSS readers are still relevant, according to RWW:

  1. Control over Information Flow
  2. Evolving User Interfaces
  3. Tracking Twitter
  4. Mobile News
  5. Categorized News

This post is in response to an earlier RWW post, "RSS Reader Market in Disarray, Continues to Decline," which engendered a lively discussion in the comments.

Dave Winer, a pioneer of RSS, noted in the comments to the more recent post that RSS readers get one thing fundamentally wrong: they treat feeds like email by telling you how many unread messages you have and encouraging you to read each one. (I'm one of those weirdos who cannot stand having messages, especially unread ones, hanging around in my inbox. Having a growing tally of unread RSS items pushes me right over the edge and is the main reason I stopped consuming my feeds in my mail application.)

A number of automated tools offer filters for RSS feeds (many have been reviewed or discussed here). Most of them rely on explicit, user-defined keywords. Others, like Twitter, rely on one's peers to identify the interesting stuff. However, I have yet to find a tool that offers the best of keyword filtering (letting through articles on topics that are of likely interest) while still surprising and delighting me with nearly, but not quite, on-topic posts. That's an incredibly delicate, arbitrary, and undefinable balance to strike.
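The explicit-keyword approach is easy to sketch, which is part of why it is so common. Here is a minimal Python version; the items and keywords are hypothetical, and real tools add stemming, phrase matching, and much more. It also shows the limitation: anything that doesn't contain a listed word is dropped, so the pleasant surprises never get through.

```python
import re


def matches_keywords(text, keywords):
    """True if any keyword appears in text as a whole word, ignoring case."""
    return any(
        re.search(r"\b" + re.escape(word) + r"\b", text, re.IGNORECASE)
        for word in keywords
    )


def filter_items(items, keywords):
    """Keep the (title, summary) pairs that mention at least one keyword."""
    return [item for item in items if matches_keywords(" ".join(item), keywords)]


# Hypothetical feed items as (title, summary) pairs.
items = [
    ("Feedmil Finds Feeds", "A search engine for RSS feeds."),
    ("Liability Insurance for Bloggers", "Coverage for defamation claims."),
]
print(filter_items(items, ["rss", "atom"]))
```

Only the first item survives with these keywords; the second, however interesting, is never shown.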

Farewell to Full-Text Feeds?

I've noticed that the number of people who 'consume' RSS4Lib on RSS4Lib.com has declined steadily over the years. Yet the number of feed subscribers is still steadily increasing and has recently broken 2000 (see today's subscriber report).

At the same time, few articles I post are read on RSS4Lib.com more than 100 times the day they are published, and most are viewed only a few times a day after that. (Selected items in the backfile, thanks to Google and Bing, get more traffic than recently published items once the new posts have aged a few days.)

I suspect this trend holds true across many blogs, whether they're produced for love or money. (This one, to be clear, is not produced as a moneymaking venture.)

Some are suggesting that the days of full-text feeds are numbered (see "Say Bye Bye to Full RSS Feeds" and "RSS: What's the Deal in 2010?" as examples). I'm curious to know if these commercial prognosticators are correct -- will bloggers tend to pull people toward the richness of their sites, even if there is no particularly strong monetary incentive to do so? Or will full-text feeds continue to be the way to go? I suspect a trend toward full-text feeds (for blogs that are works of avocation) and snippet feeds (for those that are more vocational). And I'll wager that this will break down (to oversimplify greatly) into an academic/commercial divide.

Meta

This page lists (oldest to newest) items posted to the Syndication category.

RSS4Lib is written by Ken Varnum. Contact Ken.

Creative Commons License
This weblog is licensed under a Creative Commons Attribution Non-Commercial License.