Clever Use of Tag Clouds

The University of Western Ontario’s Office of Communications and Public Affairs offers a directory of blogs published by members of the UWO community. Not only is the directory a good idea, but they also provide a tag cloud that covers the content of all these different blogs. To see the cloud, look at the top of their main weblog page.
The tag cloud shows a summary of all of the blogs under this umbrella. Some are personal, some are project-based, some are from academic departments — they run the gamut. The blogs may be on a UWO server or they may be hosted elsewhere — as long as the blogger has let UWO know they’re part of that community, they’re included. Very clever.
The tag cloud itself is generated through TagCloud.com. I’d be curious to know if anyone has developed an open source tag cloud generator — something that would take a set of RSS feeds and generate clouds… I can see that being very handy for a number of libraries, or other institutions. If you know of one, let me know in the comments.

6 thoughts on “Clever Use of Tag Clouds”

  1. Paul,
    Thanks for the pointer. That’s conceptually close to what I’m looking for, but not quite the right focus.
    What I’d like to do is take RSS feeds from selected sources and create a tag cloud from them based on word frequency appearing in the feed text (and probably dropping common words that don’t lend much meaning). I see this as a way to provide a quick visual index to a thematically similar set of feeds…

  2. Too Many Tabs

    I have a bajillion and one tabs open in my browser, all things I wish I’d had time to blog about during the last few months. I have to start making room for new tabs I don’t have time to…

  3. Are you talking about something along the lines of a visual concordance? I’ve not seen anything like this out there, but it wouldn’t be too difficult to create. Seems to me the only hard part would be building the indexing system. Many tag clouds I’ve seen (and created) cache the tags to a database and then do simple counts of their frequencies. Some of these algorithms account for the “long-tail” power-law distribution of tag frequencies and use logs to balance them out a bit.
    I haven’t looked at Lucene in a while, but you should be able to use it to spider a blog (or a set of them) and return the most frequent terms. After that, it’s just a matter of resizing and coloring the text.
    We’ve also been aggregating RSS feeds for student papers here for the last couple of years using what was Macromedia’s MXNA (http://swem.wm.edu/resources/blogs/). What I found is that different folks use different portions of RSS and RDF to expose their content. A lot of them publish feeds with no description element, so there’s no good way to index the actual content; a search only looks through headings (at least for feeds exposed that way). Exposing the words as tags then becomes a problem, though you can get around it by leaving those feeds out.
    Anyway, now that you got me thinking about it, I’ll try to throw something together this week or next to test it out.

  4. Any progress on this front? I’d love to see an open source script that did this. I too think it would be extremely useful. Cheers!
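The word-frequency approach described in the first comment can be sketched in a few lines of Python. This is my own illustration, not code from the post: the sample feed snippets, the stopword list, and the `tag_counts` function are all made up for the example, and it assumes the feed text has already been fetched.

```python
import re
from collections import Counter

# A tiny stopword list for illustration; a real generator
# would use a much fuller one.
STOPWORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
    "that", "this", "for", "on", "with", "as", "was", "are", "be",
}

def tag_counts(feed_texts, min_length=3):
    """Count word frequencies across the text of several feeds,
    dropping common words that don't lend much meaning."""
    counts = Counter()
    for text in feed_texts:
        # Strip any HTML tags embedded in feed descriptions.
        text = re.sub(r"<[^>]+>", " ", text)
        for word in re.findall(r"[a-z']+", text.lower()):
            if len(word) >= min_length and word not in STOPWORDS:
                counts[word] += 1
    return counts

feeds = [
    "A post about tag clouds and RSS feeds",
    "Another post on tag clouds for libraries",
]
print(tag_counts(feeds).most_common(3))
```

The same counting would work on titles alone for feeds that expose no description element, as the third commenter notes, just with much less text to index.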
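The log scaling mentioned in the third comment (using logs to balance out the long tail) can likewise be sketched briefly. The function name, pixel range, and sample counts below are my own assumptions for illustration:

```python
import math

def font_sizes(counts, min_px=10, max_px=36):
    """Map raw tag counts to font sizes on a log scale, so a few
    very frequent tags don't dwarf everything else in the cloud."""
    lo = math.log(min(counts.values()))
    hi = math.log(max(counts.values()))
    span = (hi - lo) or 1.0  # avoid divide-by-zero when all counts match
    return {
        tag: round(min_px + (math.log(n) - lo) / span * (max_px - min_px))
        for tag, n in counts.items()
    }

sizes = font_sizes({"library": 50, "rss": 10, "tag": 2})
```

On a linear scale, a tag appearing 50 times would be 25 times larger than one appearing twice; on the log scale it simply gets the top of the range while the rest stay legible.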

Comments are closed.