Discovering Discovery at LITA Forum

Notes from a talk by Annette Bailey of Virginia Tech at the LITA National Forum, “Discovering Discovery.”

Virginia Tech has been a Summon customer since 2010. They have leveraged Summon to change cataloging practices locally. Still using original Summon (1.0) interface.

Library users are shifting behaviors. Increasing usage of online resources and physical spaces — but not physical resources. Discovery largely happens through Summon. How can VT know what its users are doing? COUNTER provides some information, but it’s delayed and hard to process. Summon provides aggregate data on search terms and click data. How can we know what users are doing in real time? And share it with other members of the community, show visually what research is happening, live?

Discovery Visualization

That is the heart of Discovering Discovery: what are users clicking on in Summon, in real time. Can’t tell if they use the item, but can tell that they accessed it.

This tool helps everyone — librarians, the public, students — to understand what is being done in the library. User does a search. There’s some custom JavaScript in the Summon interface that sends a record of the click to the visualization server, which stores it in a database. A visualization tool then makes a display on demand. It grabs the Summon record ID, unique for each item. They then use the Summon API to grab the metadata for that record — because Summon IDs are not persistent over the long term. All of that is stored in an SQLite database.
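The client-side half of that flow might look something like the following sketch. This is not VT’s actual code — the endpoint URL, function names, and payload fields are all assumptions — but it shows the shape of the idea: capture the Summon record ID at click time and ship it off before the page navigates away.

```javascript
// Hypothetical sketch of the click capture described above.
// VIZ_ENDPOINT, buildClickRecord, and trackClick are illustrative
// names, not part of Summon or VT's implementation.
const VIZ_ENDPOINT = "https://viz.example.edu/clicks";

// Build a minimal record of a result click: the Summon record ID
// plus a timestamp. The server can resolve full metadata later via
// the Summon API, before the ID goes stale.
function buildClickRecord(summonId, clickedAt = new Date()) {
  return {
    id: summonId,
    ts: clickedAt.toISOString(),
  };
}

// Attach to a result link; sendBeacon delivers the payload even as
// the browser follows the link away from the results page.
function trackClick(linkEl, summonId) {
  linkEl.addEventListener("click", () => {
    const payload = JSON.stringify(buildClickRecord(summonId));
    navigator.sendBeacon(VIZ_ENDPOINT, payload);
  });
}
```

On the server side, each received record would simply be appended to the SQLite database for later tabulation.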

As a side note, they can tell how many unique items were clicked on over time — hard to do otherwise.
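Once every click is logged with its record ID, that unique-items count falls out almost for free. A minimal sketch (the `id` field name is an assumption):

```javascript
// Count distinct items clicked within some window of stored click
// records, deduplicating on the record ID.
function uniqueItemCount(recordsInWindow) {
  return new Set(recordsInWindow.map((r) => r.id)).size;
}
```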

Current log analysis extracts and tabulates data at 1 minute, 5 minute, 1 day, 1 week intervals. Tabulates by discipline, content type, source of record, publication year. All comes from Summon, which means data are problematic. Does word frequencies for abstract, title, and abstract & title combined, and keywords & subject terms.
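The word-frequency step could be sketched as below. This is an illustration under stated assumptions — the field names and the stopword list are mine, not VT’s — but it shows how frequencies over titles or abstracts within one time window might be tabulated.

```javascript
// Assumed minimal stopword list; a real implementation would use a
// fuller one.
const STOPWORDS = new Set(["the", "of", "and", "a", "an", "in", "to", "for"]);

// Tabulate word frequencies across one metadata field (e.g. "title"
// or "abstract") of the clicked records in a time window.
function wordFrequencies(records, field) {
  const counts = new Map();
  for (const rec of records) {
    const text = (rec[field] || "").toLowerCase();
    for (const word of text.match(/[a-z]+/g) || []) {
      if (STOPWORDS.has(word)) continue;
      counts.set(word, (counts.get(word) || 0) + 1);
    }
  }
  return counts;
}
```

Running the same function over a combined title-plus-abstract string, or over keywords and subject terms, covers the other tabulations mentioned above.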

Use the d3.js library to do visualizations. It’s a powerful tool, but hard to work with. Follows jQuery in style. Also uses a variety of server-side technologies.

Summon 2.0 — not there yet. Unlike Summon 1.0, there is now an officially sanctioned way to include JavaScript (it’s a hack in 1.0). It now includes d3.js in Summon — they do not appear to be using it yet, but it’s there. Look out for visualizations at some point… But they need to reverse engineer Summon 2.0 to achieve the same effect as in Summon 1.0.

Using this with other discovery services. You need to be able to record clicks, in real time. You need an API to get the machine data. If you use a different discovery service and want to try adapting this code, VT would like to work with you.

The visualization is the hard part; getting the data was the relatively easy part. The code needs to be consolidated into a cloud solution so you can build a version for your own use (like the LibX edition builder).

Internet Archive Tries to 404 the 404

The Internet Archive announced today a new service — creating a permalink for a web page that leads to a copy of the page at the Internet Archive. So, for example, I just created a permanent snapshot of this blog’s homepage as of 25 October 2013 at 19:35:43, preserved forever and fully citable:

This blog probably doesn’t deserve that sort of immortality. But what about more significant things? Rather than citing a web page with a note “accessed on 25 October 2013”, let the Internet Archive grab a snapshot of it and link to that. It would be lovely if this service could be extended into licensed content so that citations to academic content (all too often behind a paywall, accessible based on one’s affiliation with the library’s parent institution) could be equally persistent.

Scholarly content, as a rule, is provided through a non-persistent URL, if we ignore DOIs and Handles. Those valuable tools, of course, are only good as long as the owner of the content maintains their persistence. The owner of the content is responsible for updating destination links. That may not be the highest priority in a bankruptcy or other sudden and unexpected cessation of operations.

This new service makes possible better back-references to the historical record.

Perspective on Discovery

I’ve been reading with interest the items that have been written in the past few weeks about library discovery by Lorcan Dempsey, Dale Askey, Aaron Tay, and Carl Grant, among others. Library discovery, of course, is the capability to search effectively across a wide range of online materials available through a given library (whether owned, licensed, leased, open source, locally digitized, or what have you) through a single search box. There are vendor products and homegrown solutions, and hybrids of the two.

Is discovery dead already? Is it still the hot new thing, the Holy Grail of disintermediated patron interaction?

No, and no.

Askey makes great points about the serious challenge we libraries face in digitizing our materials for access (not to mention preservation). I’ll call this the “last shelf” challenge. Just as incredibly high-speed internet is within the reach of just about every urban home, it’s the “last mile” that’s the kicker. Getting fiber to the door of every abode is an expensive, slow process. Getting the “last shelf” digitized is similarly expensive and slow. We’ve done the easy stuff — non-unique, commodity items — already. Digitizing the “last shelf” should rightly be a significant goal for all libraries holding unique materials.

A discovery tool is only as good as its content for the intended use by the individual patron. Yes, libraries should be proud of, should enable access to, and should promote the living daylights out of the items that are uniquely theirs. These “lost” items can provide researchers at all levels with paths to innovation and discovery (in the traditional sense of the word).

Where I think the value of discovery could be, for academic libraries in particular, is in customizing the results of discovery for the user’s need. Why not offer a “personalized” slice of the discovery pie, perhaps as a facet, that filters results based on the user’s presumed context? So a patron, logged in to the system, might get results focused on those appropriate to each of their enrolled classes (by level or department, for example). Or they might remove their own native discipline from the results and focus on results from an entirely different one. That could be a powerful tool to enhance research at the interdisciplinary boundaries of two subject areas.

The power of discovery, in my way of thinking, is not just in harnessing the local and the global — which is something in and of itself — but in providing tailored, focused access to that breadth. It’s not just the Mississippi as it dumps into the Gulf of Mexico; it’s just the right tributaries out of thousands that feed into the torrent.

So I don’t think discovery is doomed, or misguided. But I do believe that the path forward is in more focused, context-aware services.