ASIS&T 2007: Plenary Session: Clifford Lynch

Will talk about issues Lynch has been thinking about — role of universities and cultural memory institutions in a networked world. How is idea of collection changing in this world?

When confronted with a confusing situation (like today’s information world) in which economics and services have become dysfunctional, it’s useful to go back to first principles. Refers to Ithaka, a Mellon spin-off. It has a research arm; they’ve been looking at university publishing in the digital world. What is future of university presses? (It’s ugly.) Their approach (how can we fix the press) is not quite right.

Correct question is, what were we trying to do when we created university press, and is the press the right structure for that today. Or, are there different opportunities to achieve those goals?

Presses’ purpose was to disseminate scholarship. Not to be house organs, but to publish for a circle of universities, provide some breadth, arm’s-length discipline. If that’s the goal, then transactional, book-based model may not fit. We have lots of kinds of scholarship to work with.

Two notes: 1) History of university presses shows (Lynch thinks) that origins are complicated and less noble than you might think; rationale includes procuring reasonably priced printing services, for example; 2) is communicating scholarship part of fundamental mission of universities? In Netherlands, they have affirmed the latter point firmly; but not clear that’s the case everywhere in U.S. Some institutions feel strongly yes, that role of institution is to disseminate faculty’s work (especially publicly funded state universities); others, not so much. iTunes U and YouTube broadcasts of classes are a follow-on to these state universities’ history of supporting public broadcasting.

Others feel that “publishing” belongs in technology transfer office. (Open Source movement in computer science departments conflicts directly with tech transfer, incidentally.)
Libraries in universities are taking on “press-like” functions — dissemination functions.

A big challenge for universities: do universities have fundamental role in stewardship of intellectual research? This is a fundamental role of research library, but without funding (at federal/cultural level). There’s a squeeze; technology increases. Libraries underwrite cost of data storage and preservation, run repositories, etc. Other entities in university do this work, too: archives, museums.

Another problem in terms of resources for stewardship: Broad move to create digital surrogates of rare, unique/inaccessible material. Mostly non-book materials here. Museums, whose tradition is “preserving authentic stuff,” are in an interesting position: there is tension between preserving the real thing and creating surrogates. Ability to create surrogates is getting very good; Lynch says we can create surrogates that are good enough to satisfy a broad cross-section of scholarly, educational, and recreational interests. Mediated viewing allows, for example, 3D views of sculptures (such as Michelangelo’s David) from viewpoints you can’t have as a museum-goer.

You can, of course, duplicate surrogates endlessly and cheaply. Part of good stewardship should involve making those surrogates available broadly — to protect against natural or man-made disasters. So if original, or original surrogate, is lost — record isn’t gone. This is counter to culture of collecting — but world isn’t the same as it was.
Another thing: for art that is repatriated, it should be thoroughly documented and “surrogated”. After all, these works are “centuries out of copyright”. National patrimony — a way to have national digitized record of cultural elements that remain in the private sector at a level that’s good enough for most purposes.

Heads of major research libraries are in a tough place: increasing expenditures for resources for researchers; budgets have not kept up. At same time, need huge investments in digitization and, in long term, data curation. There are sources of money for this, but they aren’t plentiful. NSF DataNet, private funding, start-up funding. This is all research and capability-building, not long-term. Lynch says funding will come out of traditional stewardship organizations.

To change gears. Now talking about changing nature of scholarly publication and communication environment. There’s an explosion of rethinking of scholarly work — monographs, journal articles, data are all changing, evolving, becoming more complex. Data curation will be a big issue, not just in sciences but in social sciences and humanities, too. These challenges reach down into small science — in fact, this is where the real challenge is. Big projects generally have good data collection and storage mechanisms. Small projects — especially individual researchers, with no grant money — don’t have those resources (money or staff). The right support structures simply do not exist in most universities. Sometimes there’s a bit in campus IT, sometimes in library, sometimes in departmental informatics groups… But scattershot and rare.
Growth of interest in “virtual organizations.” Fundamental idea is that of “collaboratory.” Researchers and students who want to work on a problem using the same data, the same instrument — want ad hoc groups independent of institutional borders to get together, work, and go apart. Short-term or long-term, as needed. How do we support and curate data from this sort of project, when there’s no there there? Proliferation of NGOs is similar — often virtual organizations with similar demands and requirements.

We are crossing threshold where people are authoring not just for people but for machines. Not just for indexing purposes, but for understanding, at some level, of research. Data needs to be available in forms that can be synthesized. What does this mean? Lots of tagging and microformats for specific data types. Roles of publishers and authors in supplying this markup are unclear. How to attach structured data to article (and by whom?).

Overwhelming issues

1) Entire journal delivery system is not designed to allow text mining — in fact, publishers stop this when they notice. Often contractually prohibited or limited. Some open access sites are text-mining friendly — even zipping entire corpus and making it available. License and delivery mechanisms need updating.

2) Intellectual property issues vastly challenging. Definition (legally) of a derivative work is complex. Does an algorithm generate a derivative work? Legally not, probably. Output of a text summary tool may be a derivative work. Are your PubMed summaries derivative works? We’re running up against a set of new challenges with very high stakes in copyright area.

Google is scanning everything, but in-copyright material is only provided as “snippets.” Fundamental argument is that Google is not doing economic damage by providing snippets. Google internally has a comprehensive database of literature which it can compute upon. We cannot know what they’re doing with the results of computing on this database. This is a unique strategic asset. If they can develop text mining tools, what can they do with it? It’s a training set for a range of interesting purposes. Lexical analysis, AI systems… and more. We don’t currently understand how to even talk about these questions.

Summing Up

We see an enormous amount of material produced outside traditional media. And mashups of things in and out of traditional channels. Pools of interesting content in Flickr, YouTube, hosted blogging services. The public don’t really understand these as dissemination mechanisms; they see them as preservation mechanisms. These services are not preservation-oriented. Who fills that role? Who knows.

Problems of doing research are particularly acute in academia: human subjects, institutional review boards, etc. These play important roles, but get in way of rapid research. Corporations (Google, Microsoft, etc.) are very concerned about individual privacy. Corporate researchers say they couldn’t do their research in academe: could not get through IRBs. Models of how we do research in academe need to be reviewed and updated. This is becoming a serious problem.

Interaction: where will it lead us? Interaction is core of Web 2.0. We tend to trivialize this interaction. Where we need to go… Two sets of things around social tagging. One is language and vocabulary: how people want to describe things is in conflict with traditional stewardship organizations’ methods. Users are often after different things. Other side of tagging is about assigning imprimatur: things a person found interesting. Becomes a rating, of sorts. These are still simple interactions. Key point is that we’re opening up our systems to the public in ways that have never been done before. Depth of description is potentially infinite; actual description often scant (“500 pictures of street life in Manhattan, 1951”). Enables a much wider conversation between cultural items and the audience. We don’t know how to manage it. But the stakes are high: it’s about building collective narrative and history. Revising and revisiting history.

We are noticing that, if we do a good job curating what we have, they want to give more. How do we structure these collections across organizations? We can build virtual collections regardless of what makes sense geographically or organizationally. How do we structure resources (biographies, timelines) to be integratable into other tools?
Copyright remains a huge problem; most of the content that people will interact with was developed in living memory — and therefore in copyright. How do we deal with that?
Validation of authority — a library’s opinion is seen as well-measured and accurate. How do you mediate disagreements between taggers or participants in these interactive worlds? It’s very different from the challenges we’re familiar with in annotating records the way libraries always have.


Q: Google’s sharing tools (code, documents, etc.) work well; who owns stuff and what can happen to it while Google has custody of it?
A: General purpose tools to support scholars are important. We need to think more about what those tools should look like. Typically, when you use Google, etc., there’s a license agreement you clicked through. You don’t generally give away your copyright, but are giving limited rights to do things with your content.

Q: What about rights to digital reproductions of cultural works? Current practice gives those rights to the body that owns the physical work.
A: Museums don’t own right to pre-1920 items (disclaimer from Lynch: I’m not a lawyer). They control access — museum sets rules by which an image can be made (tripods, flash, etc.). On a policy basis — we need to start talking about whether museums, as tax-exempt entities holding public cultural items, have right or obligation to distribute these items digitally.

Q: Are there records that should “gray out” after a while? Is there a “statute of limitations” on things like bankruptcies — which vanish after so many years?
A: Sorting this sort of thing out is a huge social problem. Reportage should not be rewritten — there’s a slippery slope. There are public records and public public records (things that exist, but are hard to get; things that are truly public). When legal public records go public on the internet, there’s a conflict. This needs to be sorted out, too — as a social issue. Another question about how much you should be able to revise your own personal history. Facebook, Myspace, and their ilk open up these questions to a tremendous degree. Where’s privacy boundary here?

Q: How can cultural heritage institutions improve training to reflect the issues you’ve brought up?
A: There should be more convergence in education programs — among libraries, archives, museums. Museums, in particular, are often isolated from libraries and archives.

Q: Are there constraints on the horizon for funding these activities? Funding for collection digitization has been relatively good until now.
A: There should be more — it’s OK now, but could be better. Demand is still huge. Challenge is to think about priorities for applying the money. Humanities and social sciences should get together and decide on collective priorities for digitization. Should be discipline-driven, not opportunistic.