
ASIS&T 2007: Who Is Tagging Information?

I'm attending the ASIS&T Conference in Milwaukee and will be blogging selected sessions. Read on for my notes from "Who Is Tagging Information?"


ASIS&T 2007: Information-Related Behaviors

Modeling Task-based Information Behavior on the Web: Application of ISS Schema
Jeonghyun Kim

How do we understand, identify, or predict information seeking behavior (ISB)? Or, how do we analyze, measure, or categorize ISB?

Two classical approaches to describing ISB: A) classifications of behavior and B) models of information behavior.

Different ways to describe ISB on the web

Belkin's Information Seeking Strategies, based on four dimensions:

  • method of searching
  • mode of retrieval
  • goal of retrieval
  • resources considered

People often change from one strategy to another in the course of research.

What motivates ISB? Need, information problem, problem, goal, task: researchers use different terms to describe the driving force behind ISB. Kim focuses on "task" -- the ultimate goal is performing a task. Tasks require information, and satisfaction with a search will depend on how much it helps in completing the task. This is not a novel approach; it has been around in the literature since the 1980s.

How can tasks be conceptualized?

  • As a process vs. as an objective
  • A function
  • Task description as "an abstract construction".

Kim focuses on task as objective.

Tasks vary by type (topical vs. factual), by domain (in some fields, such as shopping and travel, people tend to spend more time exploring the site and using the directory), and by complexity.

Putting all these studies together, Kim designed a conceptual framework.

Task Typology

Factual Task: Need to find a specific fact or piece of information.

Interpretive Task: Find general information on a particular topic, with "knowing more" as the goal.

Exploratory Task: You need to learn a lot about a broad subject, without a specific goal in mind.

Method

The study involved 30 LIS students, who were observed as they carried out sample tasks and interviewed at the conclusion about how difficult they felt the searches were.

Results

Factual Task -- Most users typed a specific keyword, scrolled through the results, tried a page, and went back, until they got the info they wanted. Very target-specific; people tried to find the word or phrase that would answer their question. Some users relied on the description in the search results rather than the title, but they were in the minority.

Interpretive Task -- Searching was goal-focused and selective. Users went through the table of contents and many sections, and scanned for a relevant page on the target site.

Exploratory Task -- Many people just stopped when they found a page that had "lots of things"; they defined "enough" by the amount they had, not necessarily by its quality.

Is there a pattern between strategies? Which ones are users likely to use in sequence? Certain strategies were used frequently by users.

The study shows a path to building better sites and tools to answer users' questions. Information seeking research needs to be extended toward tasks, and information systems should be structured in ways that support them. Understanding probable tasks should lead to better site structure.


Information-Seeking Behaviors of Academic Researchers in the Internet Age: A User Study in the United States, China and Greece

Peiling Wang, Dimitris A. Dervos, Yan Zhang, Lei Wu

Why this study? Scholarly communication and information seeking is of great interest. Academic researchers are serious users of traditional (print) information systems. The internet has changed the research environment.

Two-dimensional framework:

A. Information Seeking Activities

A1. General -- long-term research needs

A2. Task-based -- corresponding to the project life cycle

B. Internet Information Communication Technology/Resources [IICT]

B1. Internet communication tools

B2. Internet-enabled information resources

Questions asked

  1. How do researchers engage in the two types of IS activities in today's digital environment? (A1 and A2)
  2. What IICTs do researchers use or not use for IS? (B1 and B2)
  3. Are there any differences in IS activities and in the use of IICTs across countries?

Results

The study ended up with 82 respondents: 28 from the U.S. (computer science, engineering), 19 from Greece (higher education), and 35 from China.


IICT usage:

  • Communication: Email, Web, FTP, Listserv used by more than 50% in at least one country (listserv by only 14% in China)
  • Blogs, Wiki, IM -- mentioned by participants, not in original questionnaire. Most CS and Engineering researchers do not use these tools for research.
  • Resources: digital library, e-journal, database, OPAC. All respondents use at least one of the four. The most used is the digital library; the least used is the e-journal.

Perceived IICT importance -- web, email, digital library, and e-journal were the most important overall. Chinese respondents rated e-journals as more important than others did, because they value the peer-review process; U.S. and Greek respondents tend to think e-journals are not peer-reviewed, and so trust them less. Greek participants rated the digital library as more important than others did; Greece established a national consortium of journals in 1998, called a "digital library".

Other sources -- conferences are an important informal channel of communication and information exchange. In CS and Engineering, some conferences are rated higher than journals in value and impact.

What percent of information need is satisfied digitally? 85% (Greece), 81% (US), 74% (China).

Why aren't IICTs used? Time and information overload, availability, convenience, the nature of projects, etc.

Challenges to libraries and librarians: "I don't need the library anymore, thanks to the Internet"; "I only go to the library to get coffee." Libraries need to change.

Web makes it easier to monitor what's going on. Managing is getting harder. Archiving is underutilized -- institutional repositories are not frequently reported in Greece; China, on the other hand, has a more developed system based on universities.

Managing digital information -- how to organize files into folders, keep multiple categories, or keep copies on multiple computers.

Implications: Active researchers should maintain up-to-date homepages. Librarians and libraries must find new roles -- especially in institutional repositories. New digital tools and resources must meet the needs of users. Incorporate what users know and how they use info into the tools we provide. Revamp personal bibliographic database tools with new models that incorporate information needs and seeking behaviors.

Toward an Integrated Framework of Information and Communication Behavior: College Students' Information Resources and Media Selection
Soo Young Rieh, Brian Hilligoss, Jiyeon Yang


Given the many choices of information resources, knowing where to start is hard. What are the consequential and multidimensional aspects of information behavior over a longer time period? The study looks at a series of information activities in an everyday-life information seeking context.

How do college students select information resources and media differently depending on their information seeking goals and tasks?

To what extent do they rely on online information for their important information problems?

Each day, for ten days, students picked an information problem that was most important to them and answered an online questionnaire. What they needed, why, and most important, what they did to solve the task.

Followed up with interviews, going over each of the daily reports. Asked about the search process, how they rated sites, how they knew they had answers, etc.

The study collected 245 information seeking episodes from 24 subjects.

For problem-solving tasks, students turned to human resources (communicating directly with a known person) 52% of the time. For other information needs, websites were used most, followed by print materials.

Students preferred to use multiple resources when resolving tasks. But time ended up being the determining factor. Best information in quickest way. If they think going to a person is most efficient, they'll do it. But they'll go to the place that's most efficient, not necessarily most trusted.

Students prefer computer-mediated communication to face-to-face interaction when they engage in intentional information seeking (even when person is physically proximate or available). Perhaps because there's a record of the transaction? Or because they assume the other person is just as busy and doesn't want to be interrupted?

ASIS&T 2007: Leading by Using Social Networks: Facebook and Second Life

Facebook
Pascal Calarco

Facebook started at Harvard and spread through U.S. higher education; through 2006 it was limited to higher education. Then it opened to the world. Its focus is on connecting people. 47 million users; adding about 200,000 per day. It hosts applications -- one of the distinguishing characteristics of Facebook vs. MySpace.

Who uses it in library-land, as of mid-October? There's now an ASIS&T group (created by Pascal Calarco).

Observations: Facebook is good at connecting and finding people. Good for sharing information, but Facebook is not good for discussion. Line between personal and professional can be fuzzy. User base's fastest growing segment is in ages 25-44. Facebook applications are growing.

The ASIS&T group has 172 members, of whom 36 are paid ASIS&T members. 50 were planning to attend the ASIS&T conference; 24 of the 36 paid members planned to attend. This shows potential to a) expand membership and b) network people. But social networking sites wax and wane, so to take advantage, ASIS&T needs to be aware of what the next new thing is going to be. Still, it's an opportunity for low-cost marketing and exposure. Not great for networking/job hunting.

Possible uses... Engage younger people (recent graduates). Create metagroups around topics, for discussion, sharing of knowledge, etc. SIG and chapter outreach tool. Event advertising and promotion (it's free!). Attract new members.

Second Life
Allison Brueckner

Second Life [SL] offers potential for online meeting, collaboration, sharing.

SL is a 3-dimensional virtual world created and designed by its residents. The site is run by Linden Labs. Avatars buy "land" in SL from Linden Labs. The software is free; doing things in SL may not be. You can even buy an avatar. You can buy "Linden dollars".

Who's in SL? Total -- about 10 million. 450,000 have logged in during the last 7 days; 600,000 in the last 14 days. Not a lot of people return; the average life span of a resident is about 3 months. 46% of residents are from W. Europe, 30% from North America, 10% from South America, 8% from Asia.

What do you do in SL? Business, collaboration, education, entertainment, gaming, instruction, learning, networking. There's an SL knitting group; people get together, knit, and sell their knitting -- all in SL.

What makes people stay? Investment -- people put in real dollars, and some have real businesses there. Community and partnership building, sense of self (a place to explore alternate personas), friendships, instruction and education.

Leadership challenges... Interoperability, platforms (SL doesn't do Vista, yet), bandwidth, time zones, languages, culture, trust, currency (only US dollars and Euros are exchangeable into Linden dollars).

Leadership opportunities... business expansion, networking, communication & promotion, distance learning, global outreach, cost effectiveness, recruitment.

Average SL user is on 4-6 hours/day. Not for the faint of heart!

SL is a way for ASIS&T to lead by example -- embrace new technologies and adapt them to our needs.

ASIS&T 2007: Live Usability Labs: Open Access Archives and Digital Repositories

A series of live usability tests, with volunteer repositories and testers from the audience.

1: dLIST (University of Arizona)

Not a single-institutional repository; it's a cross-disciplinary cross-library repository. Typical user of dLIST looks for information. We'll test 1) an author search; 2) browse for neural networks; 3) can the tester find usage stats for a specific article.

  1. Author search -- no problem. Found works by the specific author. The tester had a hard time finding the "search" button -- it was several screens down the advanced search page.
  2. Neural network. Had a hard time finding how to do phrase search. Search fields don't specify a phrase, just keywords in any order or any keywords.
  3. Found the article and abstract/download stats handily.
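The neural-network test turns on the difference between "all keywords, any order," "any keywords," and a true phrase search. Here is a minimal, hypothetical Python sketch of the three modes; it is not dLIST's implementation, and the tokenizer, function names, and sample titles are all made up for illustration.

```python
import re


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase, then split on anything that isn't a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


def matches_all(query: str, doc: str) -> bool:
    # "All keywords, any order": every query term appears somewhere in the doc.
    doc_terms = set(tokenize(doc))
    return all(term in doc_terms for term in tokenize(query))


def matches_any(query: str, doc: str) -> bool:
    # "Any keywords": at least one query term appears in the doc.
    doc_terms = set(tokenize(doc))
    return any(term in doc_terms for term in tokenize(query))


def matches_phrase(query: str, doc: str) -> bool:
    # Phrase search: the query terms appear consecutively and in order.
    q, d = tokenize(query), tokenize(doc)
    if not q:
        return False
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))


# Hypothetical titles, chosen to show where the modes disagree.
docs = {
    "a": "Training neural networks for document classification",
    "b": "Social networks of neural scientists",
    "c": "Library networks in the Midwest",
}

query = "neural networks"
for name, text in docs.items():
    print(name, matches_all(query, text), matches_any(query, text), matches_phrase(query, text))
```

Only title "a" is about neural networks, but an all-keywords search also returns "b", and an any-keywords search returns all three; only the phrase mode isolates "a". That is roughly why the tester went looking for a phrase option.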

Users tend to be "browsers" or "searchers," in Paul Marty's experience. The search box says "search titles, abstract, keywords" -- but it doesn't search authors, since they aren't keywords. (Other searches showed that indexing sometimes includes authors, but not consistently.) Also, on the advanced search page: "If you need a cancel search button," Marty says, "don't make it bigger than the search button."

Home page is very detailed.

Q: How much prompting of the user in a session?

A: It depends; in a purely exploratory session, you give none; in other cases, if you're less concerned with how a task is completed than whether it's completed, you can give more.

2: Illinois Digital Environment for Access to Learning and Scholarship (IDEALS) (University of Illinois at Urbana-Champaign)

IR at UIUC. The concentration is on scholarly research and output at the university: mostly 'gray literature' and content from departments that publish technical reports. Most people find IDEALS content through Google, etc. -- full-text materials are accessed roughly 10 times more often that way than through the IDEALS search interface.

Task: 1) Upload an article to the IR.

Then a page of legalese. Two volunteers played a "faculty member" and a "graduate student." The link is "Submit an Item" -- not "upload." Then it asks the user to "choose collection." What's that? Collections don't match expectations.

With two people, the give and take was very rich. Since people don't naturally think aloud, a situation in which conversation is natural helps elicit their thinking. The technique is called "constructive interaction."

3: Minds at UW (University of Wisconsin)

It's a consortial collection -- all 26 libraries in the Wisconsin system. This implementation is almost purely "out of the box" -- DSpace is moving to a new platform soonish. Most users come from Google directly to an item page.

Tasks:

  1. You Googled your way to a particular work. You want to find other items by this author.
  2. Look for other examples of Urdu poetry.
  3. Look for other contributions from the same school (UW-Whitewater)

1) An author search (across all of DSpace) pulls up many false hits. The tester browsed by author to get to him and found his works.

2) The tester looked at the record for a subject heading -- but there are no clickable links.

3) Went to communities list, then found UW-Whitewater, then searched.

ASIS&T 2007: Keynote -- The Impact of Web 2.0

Anthea Stratigos is CEO/Co-founder of Outsell.

Impact of web 2.0 on publishers, libraries, information providers, etc.

We're in the business of marketing experiences -- not information. Web 2.0 is happening because of a convergence of individual traits, social and technological forces. Cycle of disruptive technologies: online databases, CD-ROM (1982), web (1991), XML, Web services, RSS (2001), AJAX, Ruby on Rails, REST (2007).

Showed the YouTube video "Did you know? Shift Happens." Web 2.0 is about being global, being "flat". Showed the famous "nobody knows you're a dog" cartoon -- that's web 1.0. The web was static, not interactive. Now, with web 2.0, everybody knows you're a dog -- and your likes, your activities, etc. Web 2.0 is interactive -- anything is a consumable.

Web 2.0 manifests itself as social networks, mashups, user-generated content, community/sharing, networking, crowdsourcing. Communities for any and every slice of life.

Web 2.0 enterprise

Google is the classic example of this; other enterprises are catching up. Quick, agile, global. "Open"-minded (e.g., IBM & Linden Labs' new avatar standard to enable avatars to move from world to world). Content without containers, play well with others, service-oriented, conversationalist.

Marketing -- new tools enable new research (Facebook, Second Life, panels using cell phones). Notion of physical focus groups diminishing; observing live interactions is rising. Lego is doing product development with power users by showing designs on web and redesigning in response.

Users

31% of users are still struggling with information retrieval; for them, it doesn't generate the information they want. Users don't want to pay for stuff: 60% of the time they want free content, and 36% will take either free or fee-based content if it serves their needs.

Users want to receive content by email alerts (85%), blogs (47%), intranet postings (41%), podcasts (23%), RSS feeds (21%), videocasting (16%) (Source: Outsell's information markets & users database).

Users are pulling together networks (MySpace, Facebook, LinkedIn, and enterprise networks). Enterprise networks: behind-the-firewall social networks. Visible Path is one such company.


Publishing & Information Providers

The information industry of yesteryear is flat, with little or no growth. Google, Yahoo, Microsoft, and AOL are exploding. This leads to new revenue models, with choices: bundling, licensing, subscription, pay per view, advertising, syndication. Agility is rising -- publishers are reacting faster. Also lots of online ad possibilities (for additional revenue). Publishers are facing great pressures.

Innovation areas:

  • Pay per Answer (Gerson Lehrman, Nature Publishing Group, Sermo, Innocentive, Complinet, Corporate Executive Board).
  • Pay per View: O'Reilly (buy a chapter, buy a page, etc.). ScienceDirect Info. Scitopia.
  • Pay for Software and Tools: McGraw Hill Construction, Soucient, Visual Files. Mixing content, software, and a particular user set to create a workflow solution.
  • "Freemium" -- free basic services for all; premium paid services for those who want to buy them.

Library Environment

Library technology adoption is not keeping pace with the real world. Libraries are slower to react. Denver Public Library has a teen space, "Zwinky". It's an environment to reach teenagers where they are.

PennTags -- users get to interact with content. Putting libraries in a mall -- for example, Camden Public Library in NJ. Libraries with spaces in SecondLife. Libraries of things, not books -- a library for designers of various kinds (Material ConneXion). Anyone can use it; for pay, you get more access.

What Does It Mean

Some think next web will be more like SecondLife - 3D.

Quotes Yogi Berra: "the future ain't what it used to be."

Our industry is going through what other industries have. A new technology appears, and it's disruptive. The established industry must shift. Price pressure, ubiquity, and accountability are the results: prices are pushed down, ubiquity increases, and people expect better results from the old system to match the new. The result is commoditization, a permanent shift in customer habits.

Odd behaviors occur. Like products become available. Two customer types emerge (laggard and leading edge). Customer focus emerges (the industry pays attention to different kinds of users). Partners become competitors. Competitors become partners. Segments and business models fall apart. Old business models fail, new ones arise.

Move is from product-centric to market-centric. Compete on market needs and differentiation. Google and Yahoo are our Wal-Mart and Target.

Information as entertainment, entertainment as information: Richard Saul Wurman.

Essential Actions

Become agile. Stay on top of trends, making sure you differentiate your service from "competitors". If we're in the business of providing information, we need to be digital marketers delivering a digital experience.

Trendwatching: what is happening in the world. Follow the money -- where consumers spend that's where enterprises go. iPhone, green technologies, consumer spending habits. 2-3 year lag time between consumer web and information web. Think globally.

Q&A

Q: How do we reinstill or earn trust in products?

A: Users are somewhat trusting automatically, but are highly aware of potential threats. User sophistication is rising. Increasingly jaded view of authority; but it's going to come full circle.

Q: How have people changed their information seeking?

A: Time spent with information is going up, but time spent finding it is too. Users are starting to recognize the value of their time in finding information and are looking for more efficient ways to find it, turning to portals, expert communities, etc., not the open web. This should move the ratio toward more time spent using information, from where it is now.

Q: What does "semantic web" hold for us?

A: Semantic web is coming. And coming quickly. As are other developments in the web; things will look radically different in a few years.

Q: Talk more about 3D world. Where is this happening?

A: Look at virtual worlds. SecondLife is prime example -- it's a platform for duplicating Earth.

Q: In a world where simple technologies (IM, del.icio.us, facebook) are booming, how do complex technologies fit in?

A: 3D will become simpler -- it's the next thing. But whatever it is, it must be simple, it must be viral.

Q: Our generation is developing the current tools based on our models. They're successful. But what will be designed by the upcoming generation, who think and interact with information and each other so differently?

A: It will be fascinating, whatever it is. But we can't predict it.

Q: Have people really changed that much? Card catalog represents a lot of research and a fit with society.

A: Yes and no. Can't throw out the old or reject the new -- but the old informs our reaction and implementation of the new. History doesn't dictate, but guides and informs.

Q: People want to be paid. What is upcoming for ways to pay each other on the 'net?

A: Copyright is a mess -- payment for use is failing. Technology to monitor content isn't keeping up. We'll see more digital fingerprinting -- where you can track content as it moves around. Google is creating transactional technology -- and they're in a position to provide content and payment mechanisms.

Q: Can you comment on shift taking place in power dynamics between engineers who created technologies and individual users' expression on these technologies.

A: The web empowers people; it's a platform for conversation. It empowers more people; everyone has a voice, and everyone decides which voice(s) to listen to. Users will need to decide how to structure their online life; we've been empowered, but roles, responsibilities, and mores aren't clear yet.

ASIS&T 2007: Social Computing as Co-Created Experience

Social Computing as Co-created Experience
Karine Barzilai-Nahon

Gatekeeping: information control. The concept appears throughout the literature, dating back to the 1940s, and is somewhat fragmented. Ways gatekeeping is defined, i.e., what the rationale for gatekeeping is: protection, preservation of culture/social, linking, facilitator, editorial, disseminator, change agent, access.

Barzilai-Nahon's definition is Network Gatekeeping: information control, not in a negative sense but in the sense of channeling, facilitating, editing, adding, or deleting information. The focus is not on the gatekeeper, but on the "gated" -- those whom gatekeepers act on. Four attributes of the gated:

P Political power
I Information production
R Relationship -- frequency, duration
A Alternatives -- information society creates more autonomy because we have more alternatives.

Dynamism between gated and gatekeeper.

The research asked two questions: What kinds of messages do we delete, and why? They found about 8 reasons for deleting messages in forums. The main reason was that someone hurt the community, followed by spam, then off-topic messages.

Gatekeeping self-regulation mechanisms. For example -- censorship, editorial, channeling, localization mechanisms. There are designated gatekeepers and informal gatekeepers. Designated -- managers in a formal role -- and informal -- community members. All gatekeepers tried to keep homogeneity.

90% of "guest" users were, according to IP addresses, regular registered members who were entering anonymously. People often used guest account to make critical comments they didn't want made under their real persona. These comments often got deleted.

The Four Attributes

Do gatekeepers have political power in any context, online or otherwise?

Two Takes on Virtual Design: The Construction of Expertise and Embodied Design in Second Life Design Teams
Kalpana Shankar

Collaborative Virtual Environments [CVE] (There, Active Worlds, Second Life). These are defined by collaborative design, mixed reality, e-commerce, education, and enterprise. Not strictly games.

What's in a metaverse? Builds -- there's nothing there when the first user goes in. Users build the world around them. Users can occupy space at the same time or at different times. Live chat and leaving a message. Live interaction via avatar.

Research questions

1. How does virtual collaboration affect and influence design activities in Second Life?
2. How does the designer's experience of embodiment shape emergent design practices?
3. How is design executed?
4. How does SL design become integrated in real world design?

Methodology. Learned about SL through interviews and SL's "sandbox" -- a place to learn the space. Then recruited two design teams to observe and interview. Interviewed in SL, via chat.

Then, once understood what they wanted to observe, did ethnographic observations -- watched teams work, gathered chat logs, conducted follow-up interviews.

Embodiment: The bodily aspects of human subjectivity: the human body's physical presence. There's also the "experience of physicality" -- how users see themselves and present themselves to others.

Presence: Each user has a unique graphical representation. There are rules -- you can't walk through walls, you can't be invisible. Artifacts are similar. They can be given and received. Like in physical world.

Awareness: Understanding viewpoints and attention of team members is critical in collaborative design activities. Gestures to point at things, verbal "over there".

Location: Space vs. place. SL users create spaces conducive to the activity they're doing. Even though there's no "need" for it in SL, people create elaborate spaces in SL.

View Manipulation: You can see yourself on the screen, but can also look in other places. The avatar is not your eyes.

Conclusions

Technical infrastructure and the notion of presence. People create a space in which to work, and then build team identity. This requires management and knowledge of whom you are working with. There's lots of uncertainty, because you don't know about the avatars the way you would about real-world people.


Q&A

Q: How does perspective (first-person vs. over-the-shoulder) change interaction?
A: Some research done, but not much. But perspective needs a lot more work.

Q: What are benefits of social capital in online communities?
A: Social capital serves the individual -- you get listened to more. For example, eBay's new ad campaign ("Shop Victoriously") -- the idea that it's better to compete than to cooperate. Virtual social capital is not the same as real-world social capital, but it does aid connectedness.

Q: What studies have there been to compare avatars with real world person, and why they choose the avatar?
A: Not really. Avatars are fairly limited -- out-of-the-box you can't change things a lot (with programming, you can).

ASIS&T 2007: Research Directions in Social Network Websites

Where's My Fieldsite?
Danah Boyd

Looked at high school students outside school hours. Make sense of what teenagers are doing by looking at snips of their lives, answering questions such as: what are the publics in which we live?

Public and private are different for teenagers than for adults. Children have geographically constrained lives. Culture of fear -- you might be hurt outside of home. No social spaces outside of home. Commercial spaces are increasingly constrained.

So what do teenagers do? They go online. Cause and effect are reversed from popular conception: children don't hang out online because they want to, necessarily; they do because it's the only option.

Networked publics -- spaces or collections of people that exist within and through mediating tools that network people. Has 4 properties:

1. Persistence -- things stick around.
2. Searchability -- you can find things -- including your kids. Everyone is searchable. The problem is that you don't want to be searchable by anyone; you don't want to be found by the wrong person.
3. Replicability -- conversations can move from forum to forum. You can edit things and repost. What's original?
4. Invisible audiences. You don't get feedback from those you're addressing. In the real world, a speaker knows to whom she is speaking, and we address our talk to that context. Not so in a networked public.

What are social norms online? They are different, and evolving.

Online concept of friends -- putting an audience into being: defining to whom you are speaking when you post. "Public by default, private when necessary."

Teens' idea of privacy is that they can control the audience, or have a semblance of control. They do this 3 ways:
1) structural walls -- they put up info that hides them.
2) social demand -- create a space that's mine, not yours.
3) playing ostrich -- if I don't see you, you don't exist.

Public life is changing. Mediated and offline are growing together. Conversations have fluidity -- they occur across media. Public life is incorporating all of this -- online and offline -- into something new.


Information diffusion and users' behavior in Fotologs
Raquel Recuero

Based on Fotolog users in Brazil. A two-year study. 20% of Brazil's population has online access; social networking sites are very popular (more profiles than online people).

Fotolog is a simple site. People make fotologs about tons of topics. It's been extended by its users.

Identity appropriation -- create an identity. People select images and text carefully -- lots of thought goes into it. Pictures are carefully photoshopped; perception of self is important.

Social interaction appropriation -- most important thing in Fotolog. Comments are critical -- interaction with Fotolog is important to users. Unique fotolog nickname is important. Groups emerge and conversations take place across groups.

Fotolog is an information tool. Decide what to publish based on perceived gain of doing so. Value is related to social capital. Users think carefully about what info they will put on fotolog -- value based on interaction.

Information that creates social interaction spreads within a group before it spreads across the network. Spreads among people who are closely bound. Perceived value is to make people closer to you.

Perceived value of information is what defines what information will be disseminated.

Activism and Social Network Sites
Alla Zollers

Activism: an intentional action to bring about change. Emphasis on change.

May Day protest 2006; students used MySpace to organize walk-outs.

Social network sites consist mainly of weak ties.

Studied 100 Facebook groups (Politics and Beliefs & Causes) and 100 MySpace groups (government and politics). Content analysis.

Does participation in online groups lead to offline action? Unclear. But there is discussion. Does the architecture of the site affect activist activities? Do people interested in activism go to a site because of the site, or because their friends are already there?

Analysis of Online Social Networks
Fred Stutzman

Research focus on: 1) privacy; 2) dynamics (how systems grow, how friend patterns change); 3) context (how networks answer situationally relevant needs); 4) affordances (what social networks offer to friend-seekers).

Analyzed network characteristics, connections in the network, status in service, privacy, consent, terms of service.

What to think about when doing this sort of large-scale data collection (in Facebook, in particular)? In Facebook, an out-of-network person has different ability to see others' information than an in-network person. Faculty see less than students. Anonymizing profiles to protect student privacy. What about consent? IRBs do not have a good way to deal with getting consent from users. Dealing with terms of service of the site. Facebook granted exceptions until 2006.

Built a Facebook application, "Your True Self". Analyzes your friends' profiles, shows friends who share similar taste. A way to gather information about users via the Facebook Platform.

Question: what does a "friend" represent? In real world, "friend" is on a continuum; in Facebook, it's binary. But hard to know what it means.

Q&A

Q: What governs parental access to MySpace?
A (Boyd): Lots of things. Some kids want parents there, others don't. Each service's privacy rules make a difference. There are also differences in privacy concepts based on race and class: different concepts of privacy and of the utility of the tool.

Q: How does online community affect the decline of "belonging" that we see in the F2F world?
A (Zollers): There is interaction and debate in the online world; this might translate into further, real-world action.
A (Boyd): Lack of agency means lack of political engagement. Teenagers don't have access to meaningful public spaces; so they feel withdrawn and excluded, so don't participate.

Q: Are there any qualitative research methods to use?
A (Boyd): It depends on the question you're asking.

Q: Did Stutzman's analysis take into account kinds of schools?
A (Stutzman): Yes; it covered a wide range of schools.

ASIS&T 2007: Opening Science to All: Implications of Blogs and Wikis for Social and Scholarly Scientific Communication

Bora Zivkovic

What are sci/tech bloggers doing?

Fun stuff... Changing policy. Scientists are not humorless automatons. A way for "fun" to appear within scientific literature. Science and art, history of science. Blogging from the field -- talking about field research.

Serious stuff... Snippets of research too "small" to be published, but valuable. Sometimes hypotheses and data -- open notebook science (in a later talk). Blog carnivals -- ad hoc popular journalism. One editor collects posts sent in by others, posts link list in a single place. Editorship rotates among group.

Popular magazine editors; some have blogs. Serious publishers do, too.

Blogs are starting to be a locus of open access publishing and review. Reviewers don't comment on the quality of a paper, per se; rather, on the value of the information being added: is it worth publishing? Trackbacks can allow one to see who else in the community is commenting on a paper. Scientists who are bloggers write comments in a few lines: short, blunt. Non-blogging scientists write paragraphs with references; very polite and subtle. A clash of cultures.

Impact of open discussion on research will be immense.


UsefulChem: An Open Notebook Science Project
Jean-Claude Bradley

Jean-Claude coined phrase "Open Notebook Science".

Speaker runs a chem lab at Drexel; manages student researchers. Talk is about how they share their research.

Talk

There is a continuum from closed to open in how science is reported:

  1. Closed research: Model is the traditional lab notebook -- unpublished, fundamentally personal. Failed experiments are never seen by anyone.
  2. Traditional journal article: Mostly open; but you need a subscription to journal. Not as convenient.
  3. Open Access Journal: Available to anyone online. Some journals require authors to pay to be published.
  4. Open Notebook Science: full transparency. Everything that's done is recorded and available.

Where is science headed? We are between human-human communication and human-computer communication. Research is moving in a direction where computers start to manage research and plan experiments. It will consist of self-organizing, redundant projects. Critical factor: being able to read and write (publish) at zero cost. Publication of all aspects of the scientific process: open notebook science. Total transparency.

If machines "do" science, how do they know what's important? Ask humans. In other words, search texts for things like "next steps", "what's next" and answer those questions.

Malaria is a good venue for this: big problem, no big money for drug companies.

Started out blogging things... Moved to wiki because wikis are better at organizing things. Wiki enabled broad discussion. The successes, and importantly the failures. Also, blogs don't have record of changes. Wiki enables the history to be preserved. Result is UsefulChem.

Things are indexed in Google, time-stamped, findable. History of editing is available to all.

How do people find experiments? A free tool, Site Meter, shows how people are finding the wiki. Some come via RSS, some via searches (mostly Google). Molecules are tagged in the wiki using InChI. Google handles these pretty well, so it's a good tool for researchers to use. And of course, raw data are available for every experiment.

They are still using a blog, but using it to point to things in the wiki and to define problems. The blog is targeted toward other chemists, not the public.

Open Science lets you connect with people at other institutions and collaborate; you find each other in the course of your individual research. Interestingly, the mailing list is still used more for intra-group collaboration than either the wiki or the blog. They are also using Second Life to hold meetings.

Q&A

Q: How do you achieve institutional buy-in for open science? Many scientists/researchers/academics are not good at sharing.
A: Need to find people who share the vision and lead by example. Growth of open notebook science is going to be slow. Impact will be big, though, over time.

Q: How easily are graphics handled in wiki software?
A: There's a free Java viewer for images -- to do "zooming", etc. -- so there's no burden on user. It's just there, part of the open source movement.

Social and Scientific Implications of Science Blogging
Janet Stemwedel

Interested in philosophy of science and ethics of science. Blogs at Adventures in Ethics and Science.

Talk

Scientific communication is essential to scientific practice: to share results (with public, with each other), to articulate theories, to train new scientists.

Traditional channels of communication are peer-reviewed literature (this is how "score is kept"). Tenure, promotion, existence as a researcher all tied up in peer-reviewed process. Peer-reviewed literature is a back-and-forth between scientists over a long time scale. Research tends to be secretive until [eventually] published. Peer reviewers are necessarily your "competitors" -- experts in your narrow field.

Also conferences -- shorter timescale. Informal conversations and discussions. These tend to be ephemeral; thoughts vanish after being uttered, and those not at the conference don't take part.

Press releases, popular publications, etc. -- these tend to be one way, from scientist to public. Science journalists end up being gatekeepers.

The problem is that knowledge-building requires good communication. The only way to get to objective knowledge is by having many people compare results and interpretations. Interdisciplinary tools and approaches are key. The challenge is to avoid duplication and to avoid already-discovered dead ends.

So what's wrong with traditional channels of communication? Most communication comes at end of project, not in midst. Not much collaboration or input. What's reported reflects author, reviewers, and journal editor. Not broad community. Vast amount of information is not reported, especially things that don't work.

Blogs hold promise to improve this. Offer back-and-forth on short timescale. Less ephemeral. Potential to expand audience broadly across geography, disciplines, backgrounds. Blogs may be free of existing pitfalls of peer-review (inherent conservatism in process). Quality control is interesting; posts are viewed and commented on more broadly. Through discussions on blogs, we get a window into science as process, not result. This is important to scientists, as well as to public.

How does the community of science function? Blogs can open up this community a bit to scientists. Scientists are loath to discuss the process by which they communicate. The community is opaque from the outside. (And from the inside.) Blogs can help expose this to those thinking of entering the field. You can have a virtual community in place of the real one that may not exist where a person is. Opportunity to change the mode of community conversation.

Your audience becomes the audience of the willing. Do you blog as yourself or anonymously? If yourself, there's risk; if anonymously, people don't know who you are.

Can blogs shift the culture of science? Now, scientists see things as competition for scarce resources. Blogs could help make mentoring be taken more seriously and expand the audience to non-scientists. Ongoing discussions will reveal that science is a process, not a result.


Q&A

Q: What are risks to intellectual property in open science?
A: Large -- if you're interested in a patent or IP, open science isn't right for you.

Q: How will wikis change university?
A: When people who have tenure feel the current process does not work anymore. It will be slow and evolutionary.

Q: What is key research question that you think is important to investigate (in terms of how to use blogs/wikis to support science)?
A (Stemwedel): How do scientists learn to be good scientists? How is that changing?
A (Bradley): Study how science gets done through Open Notebooks -- see how people change minds, react to data, etc. Interesting to see how other scientists "do" science.
A (Zivkovic): Blog is software, not way of thinking. What you do with it is what is important. Publication of paper is not end; it has a life after publication, and that life is now public and observable. A second stage of peer review.

Q: How do electronic lab notebooks (aimed at decreasing "cheating" in science) interact with open science?
A (Bradley): Having a wiki enables me to mentor students, via wiki, several times a day. Also opens mentoring to anyone.
A (Stemwedel): Electronic notebooks are scary because disks can get destroyed -- centralized online storage is safer in the long term.

Q: How do we view authority in "science 2.0"?
A (Zivkovic): Nothing new; authority is built over time. Some blogs will be "citable". We will figure this out. Comments on Public Library of Science get DOIs -- the comments are citable. Idea of "citable unit" will change.
A (Bradley): Blog posts can go to Nature Precedings; no peer review, but editorial review. And there's a DOI, too. Be sure to keep copyright if you want to do this.
A (Stemwedel): People are using authority of reviewer as a substitute for quality of reviewer.

Q: How do you know with whom to collaborate?
A (Bradley): I'll work with anyone with something to contribute. Can't rely on traditional authority; rely on actions.
A (Stemwedel): Interactions within scientific community, not narrow research. Blogging can be a powerful support tool for researchers.
A (Zivkovic): Open access science is critical to globalization of science. Helps reduce data privilege, especially outside developed world.

ASIS&T 2007: Social Computing, Folksonomies, and Image Tagging: Reports from the Research Front

User Supplied Image Category Labels
Hemalata Iyer

Study's goals were to identify underlying structure of image tags. Analyzed 105 participants' labeling of 100 images. Images tagged and organized into groups. Identify a prototype image in each group. Identify significant feature of prototype image.

Example of hierarchy: furniture (superordinate), chair (basic level), kitchen chair (subordinate). The basic level has more distinctive properties than superordinate, but isn't too specific.

Out of the 899 category labels applied, ~58% were superordinate, ~38% were basic level, and ~4% were subordinate. Interesting -- it was thought that basic level would be most common.

A group of people displaying emotional behavior was grouped as "emotions"; facial behavior was the prototype. Categories can be built around prototypes; for any category there is likely to be a single prototype. Familiarity, culture, and environment affect the selection of a prototype.

Superordinate terms and significant features of prototype image are important in indexing. Retrieval and browsing: grouping facilitates browsing.

Social tagging: group labels tend to be superordinate. Individual images in that group tend to be tagged with non-hierarchical related terms: associations, not hierarchy. There is not much structure (does this matter? unclear). The first tagger influences subsequent taggers. Perhaps the first tag should be done by an expert, to subtly guide future taggers.


PhotojournalsmAndUADs geotagged:ASSISST2007MilwukeWi topresent
Diane Neal

Yes, title is intentional.

Needs of photojournalists are different from other photographers in terms of tagging.

Photojournalists select what to photograph and store their photos in their publication's photo archives. Photo editors pick photos to go with stories. Also worked with photo librarians.

Where is the locus of control? Internal: it's something you can control. External: blame falls on something outside of, or beyond, you. We like to have control over our pictures (they're something we save in a disaster; we like to have them).

Photojournalists and editors were studied:

People found named objects, specific events, browsing, user-assigned descriptors (UADs), and metadata to be the most important. Descriptors, in general, were the most important kinds of labels. People started with a keyword, then moved to browsing. They like metadata-based searching.

Problems with people doing tagging: inaccuracy, errors, typos, lack of time. Need to formalize rules for tagging (somehow), e.g., tag guidelines (no plurals, no compound words, etc.).
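The talk called for formalized tagging rules without spelling them out. As a minimal sketch, assuming three hypothetical rules (lowercase everything, split compound tags into separate words, strip a trailing plural "s"), a normalizer applied at tagging time might look like this. The function name `normalize_tag` and the rules themselves are illustrative assumptions, not the speaker's guidelines.

```python
import re


def normalize_tag(raw: str) -> list[str]:
    """Apply hypothetical tagging rules to one user-supplied tag.

    Assumed rules (illustrative only): lowercase, no compound words
    (split on camelCase, spaces, hyphens, underscores, colons), no plurals.
    """
    # Insert a space at camelCase boundaries, e.g. "streetLife" -> "street Life".
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", raw)
    # Break compound tags into their component words.
    parts = re.split(r"[\s\-_:]+", spaced.strip().lower())
    tags: list[str] = []
    for word in parts:
        if not word:
            continue
        # Naive de-pluralization: drop a final "s" (but not "ss") on longer words.
        # Deliberately crude: "news" would become "new".
        if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
            word = word[:-1]
        if word not in tags:  # keep first-seen order, drop duplicates
            tags.append(word)
    return tags


if __name__ == "__main__":
    print(normalize_tag("Kitchen-Chairs"))  # ['kitchen', 'chair']
    print(normalize_tag("streetLife"))      # ['street', 'life']
```

Rules like these catch some inconsistency mechanically, but they would not fix typos or inaccurate tags; a production system would likely use a real lemmatizer rather than suffix stripping.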

Presentation
Abebe Rorissa

In classic information retrieval, a document representation (surrogate for the document) is matched with a user query (surrogate for the information need). In the new world, we have huge multimedia digital libraries; not single items, but collections. Many things are not text; they are multimedia. Retrieval systems have a more complex job matching queries and document representations. Now we're looking at slices of information space, not documents.

The user is creator, annotator, indexer, searcher, and consumer of content: all roles formerly filled by authors and professional indexers. Users have their own language, not the controlled vocabulary. Rise of tags and folksonomies, not controlled vocabs.

Challenges

Users' roles change, often in mid-research. They have multiple simultaneous roles. We have to react to individuals and to groups of users. We need a more complex information retrieval model. We have "a million typing monkeys". We have to deal with users' free and uncontrolled language and vocabulary.

Opportunities

The million typing monkeys are also an opportunity. Users are willing to contribute descriptions of content. Rich data to study tagging behavior (great for researchers). Need to find ways to let user tagging inform our retrieval systems.

What Next?

Probably no single model will capture the whole information environment. Browsing is an important feature of IR. Revise Ranganathan's second law ("Every reader his book") to: "Every user his/her overview of the document collection." Still need a way to get to a single document.

Two tools to look at:
Flamenco
PhotoMesa

How do you provide access? People tag at a high level -- broad terms. Best entry level in a browsing interface should be the basic level; where people search. Depth of hierarchy is a problem. Hard to display breadth of terms in a functional way.

Social tagging is an opportunity, not a challenge.


Semantics of User-Supplied Tags
JungWon Yoon

Wide gap between terms used by taggers and terms used by professional indexers. There is not a thesaurus to get from one to the other -- at least, none now.

Generic terms are most frequently used terms. 75% of generic terms are in formal index (LC TGM). Studied occurrence of colors as tags in Flickr and in LC TGM.

What are relationships that are most useful for users?

Tags of specific locations were frequently used in Flickr. TGM doesn't include specific geographic locations. But related tags don't follow regular patterns.

ASIS&T 2007: Next-Generation Catalog: Prototypes and Prospects

OCLC
Chip Nilges

Nilges is VP of Business Development at OCLC. Currently working on WorldCat Local.

People view libraries favorably as source of great information (from Perceptions report). Report identifies a problem: where do you start your search? 84% say search engine; 2% started at a library site. There is a huge gap there.

How do libraries deliver value (collections, services, and community) to the user, on the network, at the point of need? This is what OCLC is trying to solve.

OCLC strategy to weave libraries into the web. Open WorldCat, WorldCat.org, WorldCat local came out of this strategic goal.

Open WorldCat is a syndication project. It puts OCLC catalog records into Google, Yahoo, etc., to get data where it's being searched. Predictable URLs, machine interfaces. Hooked in to Google Scholar, for example.

WorldCat.org -- a way to search the catalog. "Give away" worldcat data. Launched about a year ago; use of WorldCat overall has tripled in 3 years.

Things under development recently:
Personal profiles, citations (in various standard forms);
List creation/management/sharing, expanded metadata coverage to better expose collections of interest to users;
Personalization -- features being developed now.

OCLC wants to get into job of citation management -- moving in that direction.

OCLC is measuring traffic. In 2006/7, there were 129.4 million referrals from partner sites to the Open WorldCat landing page, and 7.6 million clickthroughs from Open WorldCat to library services. This is huge.
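To put the two traffic figures on the same scale: assuming both counts cover the same 2006/7 period (the notes don't say whether every clickthrough came from a referred visit), clickthroughs to library services come to roughly 5.9% of referrals. A quick check:

```python
def clickthrough_rate(clickthroughs: float, referrals: float) -> float:
    """Clickthroughs as a share of referrals (assumes both cover the same period)."""
    return clickthroughs / referrals


# Figures from the talk, in millions.
rate = clickthrough_rate(7.6, 129.4)
print(f"{rate:.1%}")  # 5.9%
```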

WorldCat Local: Not in original plan to release a next-generation catalog. But from library demand, it came about. OCLC "doesn't do portals" -- it's just a search box. Service is centrally-hosted, customized view and search algorithm. A library gets a search box and a custom URL. Standard search algorithm is 'tweaked' to present local items first. Local holdings displayed in record.
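The notes say only that the standard search algorithm is "tweaked" to present local items first; the actual method wasn't described. One simple way to get that behavior, assuming each hit carries a relevance score and a local-holdings flag (both hypothetical field names), is to sort local holdings ahead of everything else while keeping relevance order within each group:

```python
from dataclasses import dataclass


@dataclass
class Hit:
    title: str
    score: float        # relevance score from the shared ranking (hypothetical)
    held_locally: bool  # True if the searching library holds a copy (hypothetical)


def local_first(hits: list[Hit]) -> list[Hit]:
    # False sorts before True, so "not held_locally" puts local holdings first;
    # within each group, higher relevance scores come first.
    return sorted(hits, key=lambda h: (not h.held_locally, -h.score))


results = [
    Hit("Global title A", 0.9, False),
    Hit("Local title B", 0.5, True),
    Hit("Local title C", 0.7, True),
]
print([h.title for h in local_first(results)])
# ['Local title C', 'Local title B', 'Global title A']
```

A strict partition like this can bury a highly relevant non-local item; a softer alternative would add a fixed boost to the scores of local holdings instead.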

OCLC learning it's a different thing to design for librarians than for customers. Learning a lot about customers.

What's searched in WCL? WorldCat, metadata of 33 million articles, local repositories as indexed in WorldCat. Object is to bring in good enough data from OCLC sources that libraries can replace their federated search engine. Also indexing local repositories.

WorldCat Local fulfillment requirements: interoperate with local management systems and with local delivery services. Pilot partners: University of Washington, Peninsula Library System, State of Illinois libraries, Ohio State University (12/2007), University of California System Melvyl pilot (spring 2008).

Upcoming features:
Institution search
Identities integration (http://orlabs.oclc.org/identities)

Big challenge for OCLC -- balancing local needs with global needs; local record vs. master record. User wants continuity, systems don't provide it.

There may be an OpenURL resolver on the way; some clients are asking for it.

Q: Is inclusion of Open Access journals considered?
A: Yes -- open access books, archival materials, ejournals. Lots coming over next two years.

NGC: Next Generation Catalog
Andrew Pace

Our patrons are already "next generation"; it's our systems that aren't. Quick demo of Endeca -- faceted browsing, shelf browsing, etc. Why do Endeca? Unresponsive vendors; early experiments in NGC; casual conversation with Endeca; formal conversation with Endeca (2/2005-6/2005); fast implementation (7/2005-1/2006).

What's the big picture? Improve quality of catalog, exploit data already in the catalog. Build a more flexible catalog tool that can be integrated with future tools not yet invented.

Why do Endeca? Facets were a nice byproduct, but relevance ranking was the target. There's little in the literature about relevance ranking for bibliographic surrogates. Other gains: improved response time, enhanced natural-language searching, and true browsing. Automatic word stemming (for certain words).
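Pace noted how little has been written about relevance ranking for bibliographic surrogates. Purely to illustrate the problem (this is not Endeca's or NCSU's algorithm), a catalog record can be scored by counting query-term matches per field and weighting fields differently; the field names and weights below are arbitrary assumptions.

```python
import re

# Hypothetical weights: a title match counts more than a subject or note match.
FIELD_WEIGHTS = {"title": 3.0, "subject": 2.0, "notes": 1.0}


def tokens(text: str) -> set[str]:
    """Lowercase alphanumeric terms in a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score(record: dict, query: str) -> float:
    """Sum over fields of weight times the number of query terms found in that field."""
    query_terms = tokens(query)
    return sum(
        weight * len(query_terms & tokens(record.get(field, "")))
        for field, weight in FIELD_WEIGHTS.items()
    )


def rank(records: list[dict], query: str) -> list[dict]:
    """Records with at least one match, best score first (ties keep input order)."""
    scored = [(score(r, query), r) for r in records]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [r for s, r in scored if s > 0]


catalog = [
    {"title": "Field notes", "subject": "Ornithology", "notes": "Birds of the region"},
    {"title": "Birds of North America", "subject": "Birds"},
    {"title": "Cooking basics", "subject": "Food"},
]
print([r["title"] for r in rank(catalog, "birds")])
# ['Birds of North America', 'Field notes']
```

Because there is no stemming here, a query for "bird" would match none of these records; that gap is what the automatic word stemming mentioned above addresses.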

Sits on top of library catalog system. Daily data load from catalog. Used to improve the discovery process.

Data and analysis

From July 06 to Jan 07... 67% of users do search. 20% do browse. 8% do pure navigation (through LCSH headings).

26% of navigation is by subject topics -- people are refining their searches by subject.

See Lown & Hemminger (2007) for a detailed transaction log display.

The "revolutionary war" problem. A search in catalog gives you LCSH subject headings. U.S. revolution gets 10 pages of subject records. In Endeca, working on this. Do you get the top n subjects in browse?
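Facet counts are what make both faceted browsing and the "top n subjects" idea work: for the current result set, count how many records carry each value of a field, then show only the most common values. A minimal sketch over hypothetical in-memory records (the field names and data are made up, and Endeca's own API is not shown):

```python
from collections import Counter


def values_of(record: dict, field: str) -> list:
    """Return a field's values as a list (subject headings are multi-valued)."""
    value = record.get(field)
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def facet_counts(records: list[dict], field: str) -> Counter:
    """Number of records carrying each value of the field."""
    counts: Counter = Counter()
    for rec in records:
        counts.update(set(values_of(rec, field)))  # count each record once per value
    return counts


def refine(records: list[dict], field: str, value) -> list[dict]:
    """Narrow the result set to records carrying the selected facet value."""
    return [rec for rec in records if value in values_of(rec, field)]


results = [
    {"title": "T1", "format": "Book", "subject": ["U.S. Revolution", "Politics"]},
    {"title": "T2", "format": "Book", "subject": ["U.S. Revolution", "Politics"]},
    {"title": "T3", "format": "Video", "subject": ["Politics", "Women"]},
    {"title": "T4", "format": "Book", "subject": ["Military history"]},
]

# Show only the top 2 subject values instead of pages of headings.
print(facet_counts(results, "subject").most_common(2))
# [('Politics', 3), ('U.S. Revolution', 2)]

# After the user picks a subject, recount the remaining facets.
politics = refine(results, "subject", "Politics")
print(facet_counts(politics, "format").most_common())
# [('Book', 2), ('Video', 1)]
```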

Expanding scope to 10 million records in the Research Triangle libraries.

Emily Lynema and Tito Sierra -- a web service on Endeca that allows access to the catalog. Yields RSS new book feeds. Enables mobile device searching. New books wall w/jacket images. Resource lists for embedding in other web pages with web services.
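The new-books feed is the most concrete of these services. The notes don't describe how NCSU builds it; as a hedged sketch, assuming a list of newly cataloged titles with catalog links (the function name, titles, and example.org URLs below are placeholders), an RSS 2.0 document can be produced with Python's standard library:

```python
import xml.etree.ElementTree as ET


def new_books_rss(title: str, link: str, books: list[dict]) -> str:
    """Build a minimal RSS 2.0 feed; each book needs "title" and "link" keys."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # title, link, and description are the required channel elements in RSS 2.0.
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = "Recently added titles"
    for book in books:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = book["title"]
        ET.SubElement(item, "link").text = book["link"]
    return ET.tostring(rss, encoding="unicode")


feed = new_books_rss(
    "New books",
    "https://catalog.example.org/new",
    [
        {"title": "Placeholder title one", "link": "https://catalog.example.org/record/1"},
        {"title": "Placeholder title two", "link": "https://catalog.example.org/record/2"},
    ],
)
print(feed)
```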


Q&A

Q: Students when faced with too many options don't learn the best way to do something.
A: It's more important that they get what they want at the destination; entry path not so important.

Q: Endeca is "next generation OPAC"; what about next-generation catalog -- describing information?
A: NCSU hasn't done anything yet to change its cataloging practices; what they've done is exposed all that work so that it is accessible to users.

eXtensible Catalog
Judy Briden

The eXtensible Catalog (XC) is a project to design and build a system that gives libraries an alternative way to reveal library collections. Integrate library content into other systems. It will be open source and collaborative, and customizable locally.

XC will have a UI with faceted browsing. Locally customizable without significant programming skills. Interface customizable. Multiple metadata schemas (MARC, DC, etc.). Informed by user research.

Two phases to project.

1) One-year grant to write a plan. Completed in summer 2007. Proof-of-concept prototype, C4, that displays the basic UI that will be bundled with XC. Uses Lucene as the search engine. Interesting feature: from an article search, clicking a link (generated from MetaLib) sends the user straight to the full text rather than to the OpenURL screen.

2) Just funded -- starting the project.


XC can be used as a new interface to an existing single repository -- or integrate multiple repositories (at the interface level).

XC will address the needs of many libraries and be flexible, extensible -- anyone can contribute.

Q&A

Q: What open source license will XC be released under?
A: GPL.


Next Generation Catalog: the Minnesota Report
Janet Arth

In March 2006, Ex Libris demoed Primo prototype to UMn and others. They were looking for development partners. UMn became one of those partners. Bibliographic data are extracted from catalog and put into Primo.

Usability was in the contract between UMn and Ex Libris. Minnesota did studies. They have access to an amazing usability lab at Minnesota.

Three usability rounds.

  1. First used proof-of-concept version (completely canned search results).
  2. Second used demo site with live, but anonymized, data.
  3. Third used live test site.

Most users actually use drop-down boxes to narrow their search (item type, with/without keywords, location) -- very few typed word and hit search without narrowing it.

In usability debriefing, asked about tags (a part of Primo). Users saw tags as way that future users could see what past users had thought. None thought they would use tags. Few in study actually used tags. Useful as a discovery tool -- way to expand search. But not strong support for tagging. Almost universally viewed as something others would use, not selves.


Q&A

Q: Are you happy with Primo?
A (Arth): Mostly yes; but realistically, we didn't have money to explore other tools the same way.

Q: Has University of Washington looked at how many people are using WorldCat Local vs. the native catalog?
A (Nilges): Not sure what 'take rate' is.

Q: Is there a web service interface to WorldCat local?
A (Nilges): ISBN, yes -- but not extensive yet. Coming soon.

Q: Preference for WorldCat local vs. native catalog?
A: In academic libraries, tendency toward WorldCat Local. In publics, the other way. Perhaps this reflects a difference between what's generally a system of libraries (academic) vs. a single library (public)?

Q: To what extent have we "bridged the gap" with these projects? Are we doing enough to get people to start their search at the library, or is this not even a goal?
A (Briden): Our content needs to be where students are doing their work; we can't change their behaviors. Library fits in their thinking, it's just not the first thing. It should be *one* of the first things, though.
A (Nilges): Ditto. Need to build interfaces that allow your services to be everywhere.
A (Pace): We need to avoid self-fulfilling prophecy. We need to make our catalogs useful, entertaining, helpful -- so when people do get there, they like the experience and find it of benefit. Make catalog "sticky".

Q: Does the underlying catalog data need to change to continue making improvements?
A (Arth): We have good data. Challenges lie in merging it.
A (Nilges): Separating inventory management and finding; pulling other data in with the cataloging. Not clear to what extent the data need to be unified; perhaps only connected.
A (Briden): Opportunity to bring tags into collaboration with subject headings; use tags synergistically. Catalogers have opportunity to work with user-generated data. Pull it together in ways that will make more sense.

ASIS&T 2007: Plenary Session: Clifford Lynch

Will talk about issues Lynch has been thinking about -- role of universities and cultural memory institutions in a networked world. How is idea of collection changing in this world?

When confronted with a confusing situation (like today's information world) in which economics and services have become dysfunctional, it's useful to go back to first principles. Refers to Ithaka, a Mellon spin-off. It has a research arm; they've been looking at university publishing in the digital world. What is the future of university presses? (It's ugly.) Their approach, asking how we can fix the press, is not quite right.

Correct question is, what were we trying to do when we created university press, and is the press the right structure for that today. Or, are there different opportunities to achieve those goals?

Presses' purpose was to disseminate scholarship. Not to be house organs, but to publish for a circle of universities, provide some breadth, arms-length discipline. If that's the goal, then transactional, book-based model may not fit. We have lots of kinds of scholarship to work with.

Two notes: 1) The history of university presses shows (Lynch thinks) that their origins are complicated and less noble than you might think; the rationale included procuring reasonably priced printing services, for example. 2) Is communicating scholarship part of the fundamental mission of universities? In the Netherlands, they have affirmed this point firmly, but it's not clear that's the case everywhere in the U.S. Some institutions feel strongly yes: the role of the institution is to disseminate its faculty's work (especially publicly funded state universities). Others, not so much. iTunes U and YouTube broadcasts of classes are a follow-on to these state universities' history of supporting public broadcasting.

Others feel that "publishing" belongs in technology transfer office. (Open Source movement in computer science departments conflicts directly with tech transfer, incidentally.)

Libraries in universities are taking on "press-like" functions -- dissemination functions.

A big challenge for universities: do universities have a fundamental role in the stewardship of intellectual research? This is a fundamental role of the research library, but without funding (at the federal/cultural level). There's a squeeze; technology increases. Libraries underwrite the cost of data storage and preservation, run repositories, etc. Other entities in the university do this, too: archives and museums also do this work.


Another problem in terms of resources for stewardship: there is a broad move to create digital surrogates of rare, unique, or inaccessible material, mostly non-book materials. Museums, whose tradition is "preserving authentic stuff", are in an interesting position: there is tension between preserving the real thing and creating surrogates. The ability to create surrogates is getting very good; Lynch says we can create surrogates that are good enough to satisfy a broad cross-section of scholarly, educational, and recreational interests. Mediated viewing allows, for example, 3D views of sculptures (such as Michelangelo's David) from viewpoints you can't have as a museum-goer.

You can, of course, duplicate surrogates endlessly and cheaply. Part of good stewardship should involve making those surrogates available broadly -- to protect against natural or man-made disasters. So if original, or original surrogate, is lost -- record isn't gone. This is counter to culture of collecting -- but world isn't the same as it was.

Another thing: for art that is repatriated, it should be thoroughly documented and "surrogated". After all, these works are "centuries out of copyright". National patrimony -- a way to have national digitized record of cultural elements that remain in the private sector at a level that's good enough for most purposes.

Heads of major research libraries are in a tough place: increasing expenditures for resources for researchers; budgets not kept up. At same time, need huge investments in digitization and -- in long term -- data curation. There are sources of money for this, but they aren't plentiful. NSF Datanet, private funding, start-up funding. This is all research and capability-building, not long-term. Lynch says funding will come out of traditional stewardship organizations.

To change gears. Now talking about changing nature of scholarly publication and communication environment. There's an explosion of rethinking of scholarly work -- monographs, journal articles, data are all changing, evolving, becoming more complex. Data curation will be a big issue, not just in sciences but in social sciences and humanities, too. These challenges reach down into small science -- in fact, this is where the real challenge is. Big projects generally have good data collection and storage mechanisms. Small projects -- especially individual researchers, with no grant money -- don't have those resources (money or staff). The right support structures simply do not exist in most universities. Sometimes there's a bit in campus IT, sometimes in library, sometimes in departmental informatics groups... But scattershot and rare.

Growth of interest in "virtual organizations." Fundamental idea is that of "collaboratory." Researchers and students who want to work on a problem using the same data, the same instrument -- want ad hoc groups independent of institutional borders to get together, work, and go apart. Short-term or long-term, as needed. How do we support and curate data from this sort of project, when there's no there there? Proliferation of NGOs is similar -- often virtual organizations with similar demands and requirements.

We are crossing threshold where people are authoring not just for people but for machines. Not just for indexing purposes, but for understanding, at some level, of research. Data needs to be available in forms that can be synthesized. What does this mean? Lots of tagging and microformats for specific data types. Roles of publishers and authors in supplying this markup are unclear. How to attach structured data to article (and by whom?).

Overwhelming issues

1) Entire journal delivery system is not designed to allow text mining -- in fact, publishers stop this when they notice. Often contractually prohibited or limited. Some open access sites are text-mining friendly -- even zipping entire corpus and making it available. License and delivery mechanisms need updating.

2) Intellectual property issues are vastly challenging. The (legal) definition of a derivative work is complex. Does an algorithm generate a derivative work? Legally, probably not. But the output of a text summary tool may be a derivative work. Are your PubMed summaries derivative works? We're running up against a set of new challenges with very high stakes in the copyright area.

Google is scanning everything, but in-copyright material is only provided as "snippets." The fundamental argument is that Google is not doing economic damage by providing snippets. Internally, Google has a comprehensive database of literature on which it can compute. We cannot know what they're doing with the results of computing on this database. This is a unique strategic asset. If they can develop text mining tools, what can they do with it? It's a training set for a range of interesting purposes: lexical analysis, AI systems... and more. We don't currently understand how to even talk about these questions.

Summing Up

We see an enormous amount of material produced outside traditional media, and mashups of things in and out of traditional channels. There are pools of interesting content in Flickr, YouTube, and hosted blogging services. The public doesn't really understand these as dissemination mechanisms; they see them as preservation mechanisms. But these services are not preservation-oriented. Who fills that role? Who knows.

Problems of doing research are particularly acute in academia: human subjects, institutional review boards, etc. -- they play important roles, but they get in the way of rapid research. Corporations (Google, Microsoft, etc.) are very concerned about individual privacy. Even so, corporate researchers say they couldn't do their research in academe -- they could not get it through IRBs. Models of how we do research in academe need to be reviewed and updated. This is becoming a serious problem.

Interaction -- where will it lead us? Interaction is the core of Web 2.0, and we tend to trivialize it. Where we need to go... There are two sets of issues around social tagging. One is language and vocabulary: how people want to describe things conflicts with the methods of traditional stewardship organizations. Users are often after different things. The other side of tagging is about assigning an imprimatur -- things a person found interesting. It becomes a rating, of sorts. These are still simple interactions. The key point is that we're opening up our systems to the public in ways that have never been done before. Depth of description is potentially infinite; actual description is often scant ("500 pictures of street life in Manhattan, 1951"). This enables a much wider conversation between cultural items and the audience. We don't know how to manage it. But the stakes are high: it's about building collective narrative and history, revising and revisiting history.

We are noticing that, if we do a good job curating what we have, people want to give more. How do we structure these collections across organizations? We can build virtual collections regardless of what makes sense geographically or organizationally. How do we structure resources (biographies, timelines) so that they can be integrated into other tools?

Copyright remains a huge problem; most of the content that people will interact with was developed in living memory -- and therefore in copyright. How do we deal with that?

Validation of authority -- a library's opinion is seen as well-measured and accurate. How do you mediate disagreements between taggers or participants in these interactive worlds? It's very different from the challenges we're familiar with in annotating records the way libraries always have.

Q&A

Q: Google's document-sharing tools (for code, documents, etc.) work well, but who owns the material, and what can happen to it while Google has custody of it?
A: General purpose tools to support scholars are important. We need to think more about what those tools should look like. Typically, when you use Google, etc., there's a license agreement you clicked through. You don't generally give away your copyright, but are giving limited rights to do things with your content.

Q: What about rights to digital reproductions of cultural works? Current practice gives those rights to the body that owns the physical work.
A: Museums don't own the rights to pre-1920 items (disclaimer from Lynch: I'm not a lawyer). They control access -- the museum sets the rules by which an image can be made (tripods, flash, etc.). On a policy basis, we need to start talking about whether museums, as tax-exempt entities holding public cultural items, have the right or the obligation to distribute these items digitally.

Q: Are there records that should "gray out" after a while? Is there a "statute of limitations" on things like bankruptcies -- which vanish after so many years?
A: Sorting this sort of thing out is a huge social problem. Reportage should not be rewritten -- there's a slippery slope. There are public records and public public records (things that exist, but are hard to get; things that are truly public). When legal public records go public on the internet, there's a conflict. This needs to be sorted out, too -- as a social issue. Another question about how much you should be able to revise your own personal history. Facebook, Myspace, and their ilk open up these questions to a tremendous degree. Where's privacy boundary here?

Q: How can cultural heritage institutions improve training to reflect the issues you've brought up?
A: There should be more convergence in education programs -- among libraries, archives, museums. Museums, in particular, are often isolated from libraries and archives.

Q: Are there constraints on horizon to funding for these activities -- funding for collection digitization has been relatively good until now.
A: There should be more -- it's OK now, but could be better. Demand is still huge. Challenge is to think about priorities for applying the money. Humanities and social sciences should get together and decide on collective priorities for digitization. Should be discipline-driven, not opportunistic.

ASIS&T 2007: Understanding Information Work in Large Scale Social Content Creation Systems

Wikipedia: Distributed Editorial Processes
Phoebe Ayers

Who is Wikipedia? It's the thousands of people behind the site. Lots of groups joined by shared values of openness: free content; open to all; key editorial policies (Neutral Point of View, no original research, verifiability).

How do tens of thousands of people with no top-down control write the world's largest encyclopedia?

Wikipedia is governed by a non-profit foundation. It has several sister projects -- we're only talking about Wikipedia. No one is in charge of editorial decisions. Wikipedia has a modest goal: giving every person full access to the sum of all human knowledge.

There are lots of self-organized tools -- for cleaning up articles, for defining NPOV, for style. Information works in wikipedia as a sum of distributed social processes and the technical structure of the wiki and culture of openness.


Technology, Theory, Community, and Quality: A Talk in Two Acts
Dan Cosley

Act I:
Matching people with tasks they're likely to do motivates contributions

The problem is that some articles need help in some way (such items are tagged). These articles are listed on a community page, but if you want to fix something, it's hard to find a page you want to fix. So they built a recommender engine that gives people pages to edit based on things they are likely to be interested in fixing. This worked well in MovieLens and was translated to Wikipedia as SuggestBot: it goes through the list of articles tagged as needing help and finds items that are similar to items the person has edited, written, etc., before.
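The talk didn't go into SuggestBot's internals beyond "similar to items that person has edited." Here's a minimal content-based sketch of that idea, assuming each article is represented by a set of features (say, the titles it links to) and that similarity is plain Jaccard overlap. The function names and toy data are hypothetical, not SuggestBot's actual code.

```python
"""Content-based recommendation in the spirit of SuggestBot (a sketch).

Assumption: each article is a set of features (here, hypothetical link
titles); SuggestBot's real features and weighting were not described.
"""
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two feature sets: 0.0 (disjoint) to 1.0 (identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def suggest(
    edited: List[str],
    needs_help: List[str],
    features: Dict[str, Set[str]],
    k: int = 3,
) -> List[str]:
    """Rank articles tagged as needing help by their best similarity
    to any article the user has already edited."""
    scores: Dict[str, float] = {}
    for candidate in needs_help:
        if candidate in edited:
            continue  # don't recommend pages the user already touched
        scores[candidate] = max(
            (jaccard(features[candidate], features[e]) for e in edited),
            default=0.0,
        )
    # Highest similarity first; ties broken alphabetically for stable output.
    ranked = sorted(scores, key=lambda title: (-scores[title], title))
    return ranked[:k]


# Hypothetical toy data, not real Wikipedia link sets.
features = {
    "Milwaukee": {"Wisconsin", "Lake Michigan", "Brewing"},
    "Madison": {"Wisconsin", "Brewing", "University"},
    "Lake Superior": {"Great Lakes", "Lake Michigan", "Shipping"},
    "Baroque music": {"Bach", "Handel", "Opera"},
}
print(suggest(["Milwaukee"], ["Madison", "Lake Superior", "Baroque music"], features))
```

Someone who edited "Milwaukee" would be pointed at "Madison" first (two shared features), then "Lake Superior" (one), then "Baroque music" (none). A real system would want richer features than link overlap, but the shape is the same: score the to-do list against each person's history.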

Through Wikipedia's edit history, you can see whether someone edited an article. Four times as many articles get edited through the recommendation engine -- it works. Other communities should take this approach to editing/moderating, or match a new user in a community with an older member who talks about similar things.

Theoretical basis for this: the collective effort model says lower effort = greater reward. Therefore we should build interfaces and algorithms that help people find work to do.


Act II:
Understanding community is huge for improving information quality

Knowing the system (Wikipedia content in this case), knowing users, and knowing habits all help inform the recommender engine. A failure: an automated welcome message for new users (people with their first edit in December 2005, about 28,000 people). Looked for people who had "welcome" on their home page. People with "welcome" messages edited more entries. However, Wikipedia culture was that only good members got welcomes (bad members got warnings). There still seems to be an effect -- people with a welcome message went on to be a bit more active in Wikipedia -- but it is not strong.

Information Quality Work Organization in Wikipedia
Besiki Stvilia

Why do work organization models matter? To design effective, sound, robust models for different contexts/domains inexpensively through knowledge reuse. To establish benchmarks for analyzing and evaluating existing models.

Questions studied: How does the community understand quality? What processes exist? What are motivations of editors? What are dynamics of information objects? Why do people contribute? What IQ intervention strategies are used?

The percentage of pages in Wikipedia devoted to articles has decreased from 53% to 28% since 2005 -- more effort is going into talk, discussion, and similar pages, and less into the articles themselves. Users are placing more emphasis on community building.

IQ processes: content evaluation, editor evaluation, building and maintaining work infrastructure.

Differences between Wikipedia and other systems: First, user feedback and information creation are the same process in Wikipedia, unlike in other systems; in a library catalog, for example, quality control and authoring of the data are separate. End user and editor roles are merged. Product creation and delivery environments are the same. Work coordination is informal and ad hoc.

Wikipedia controls quality through content and editor evaluation. Some parts of process are formal, others are informal. Because there's little built-in mediation, disagreeing parties must come to their own agreement (or else endlessly erase the other's contribution). Community experiments with different intervention processes when there are conflicts -- trying to find the best approach at any moment.


Wikipedia Reference Desk: Processes and Outcomes
Pnina Shachaf

A study to evaluate the quality of processes and outcomes at the Wikipedia reference desk. The reference desk uses a wiki to process reference transactions: users leave questions, and Wikipedia volunteers help them find the info they need. It is organized under seven categories: computing, entertainment, humanities, language, mathematics, miscellaneous, science.

There hasn't been a lot of work on social aspects of the Wikipedia community. In particular, there's an opportunity to learn from the Wikipedia reference desk as a way of improving service at traditional reference desks.

What is the quality of answers at the reference desk? Looked at 210 transactions and 434 messages (in April 2007). In this month, there were 2000+ transactions and 11,000+ messages. Most were in the science and miscellaneous categories. Most responses per question in mathematics. (Entertainment and Miscellaneous had the fewest.)

170 users (122 expert, 48 novice); 34 participated in multiple reference desks. Experts are more active at the reference desk. Novices submit more questions (44 vs. 33). Novices are more likely to ask questions (70% of novices, 29% of experts); experts answer more questions. By profession, computer/IT professionals are the plurality.

Most questions (96%) got an answer; 92% got a complete or partial answer; average time to first response is 4 hours and to last response 72 hours. Accuracy level is about 55%. Response completeness 63%.

There is question negotiation: 28% of the time there's a follow-up post from the requester. There are elaborations (improved answers) in 67% of the transactions: additional resources, a different point of view, different solutions, etc.

Wikipedia reference desk quality is "not too bad; can be improved probably". Collaborative effort yields interesting results. Future study will try to compare with small groups of librarians who use a collaborative process.


Q&A

Q: How did you determine accuracy of response?
A (Shachaf): Involved qualitative analysis of answers (reading them); results presented are preliminary, one-reader reviews. Final research will involve multiple reviewers.

Q: What are views on copyright of materials in Wikipedia? Is there analysis of plagiarized material in Wikipedia?
A (Ayers): There should not be anything in Wikipedia that's under copyright. In practice, hard to deal with this.
A (Cosley): There's a tag in Wikipedia for identifying possible copyright violations.

ASIS&T 2007: Wrap-Up and Thoughts

I had a great time at the ASIS&T 2007 conference, Joining Research and Practice: Social Computing and Information Science, in Milwaukee. I blogged most of the sessions I attended -- see the list at ASIS&T Sessions. A few thoughts about particular sessions or things I picked up.

I experienced one of those so-simple-it's-genius moments during the session on "Live Usability Labs" by Paul Marty. The technique Paul employed -- running a usability test with two people, each in different roles for the event -- worked stunningly well. It completely avoided the awkwardness of one person thinking aloud -- hardly a natural state for most of us -- while explaining the actions being taken on screen. By play-acting, two people in the roles of graduate student and faculty member, or two colleagues, elicited great feedback from each other about what was on the screen, how it worked, and how each person expected it to work. It seemed a particularly effective technique for teaching usability to others, but I'd bet it's very effective in a more traditional usability testing situation, too.

The session on "Opening Science to All: Implications of Blogs and Wikis for Social and Scholarly Scientific Communication" was one of my favorites because it showed both empirical research and real-world effects. Jean-Claude Bradley's presentation of UsefulChem as a place where scientists can record their experiments, successes, and -- this is the key point -- dead ends gave me another "Aha!" moment about the impact of blogs and wikis on science, education, and society. Janet Stemwedel's talk on the societal implications of blogging, particularly within the scientific community, was also very interesting. The divide in the sciences between those who embrace the openness that two-point-oh technologies engender and those who don't is even starker than in the social sciences and humanities, domains in which I'm more comfortable. At the same time, the potential short-term benefits to the general population are even greater in the sciences than in the humanities.

Clifford Lynch's keynote address on open access was also informative and engaging. He asked the audience to consider what academia wanted to achieve when it created the institution of the academic press and whether that role is currently being met. He said that one of the biggest challenges for universities is to decide whether they still have a fundamental role in the stewardship of intellectual research. While this is a fundamental role of research libraries, their parent organizations expect them to accomplish it without the necessary depth of funding or support. If libraries, or universities, are the stewards of intellectual research, they must make great strides in technologies to ensure that today's research is fully usable in the future. Lynch raised far more questions than he answered -- it was truly a thought-provoking and stimulating talk.

On a very much related note, I was struck by the fact that numerous academic researchers remarked in the course of their presentations that information -- reports, documents, data, etc. -- is all available on Google, so not much attention needs to be paid to stewardship. I fear that too many people, in and beyond the academy, view Google as the universal library. This is far from the truth. Perhaps Google is the universal card catalog, but even that is a stretch. Google's business model is very different from that of a library. Google is all about access (local copies for indexing aside); libraries are all about preservation and stewardship of information. (Saturday's Unshelved comic strip makes this point more humorously and succinctly.) As a librarian, I grow concerned when academics -- the primary user population I support -- so blatantly misunderstand the role of the library.

CIL2008: Keynote on "Libraries Solve Problems"

I'm attending Computers in Libraries 2008 and will be blogging many of the sessions I attend... I'll post my (mostly unedited) notes. If you're at CIL, look me up!

Presented by Lee Rainie, Director of Pew Internet & American Life Project.

Blogging is about information and communication. This is what makes the Internet so wonderful. That's what the era of user-generated content is all about.

Information was scarce, expensive, and institutionally oriented. Now, it's abundant, cheap, and personally oriented.

In 2000: 46% of adults used the internet; 73% of teenagers did. 5% had broadband at home. 50% owned a cell phone. Nobody connected wirelessly. The phone line ruled.

2008: 75% of adults and 93% of teens use the internet. 54% have broadband at home. 78% own a cell phone. 62% connect wirelessly (42% by wireless, 59% use cell phones over data networks; combined, allowing for people who do both, the total is 62%). Cell phone users tend to be minorities and less well educated -- this reverses digital divide fears. Wireless connectivity is a determinant of Internet behavior. It results in a resurgence of email -- on a cell phone, email matters a lot. News becomes more important, too, broadly defined. Fast and mobile connections rule.

The home media ecology is immensely complex. Data moves from this to that (TiVo to computer, cell phone to cable box, etc.). Internet becomes "cloud" -- it's where important stuff is stored. The Internet is the computer and storage device. This has huge, not yet understood, implications.

Content creation: 62% of young adult users have uploaded photos to the internet; 34% of all users have done this. Photo-documenting their lives is an obligation of sorts. Pictures are the currency of community building and communication.

58% have created a profile on social networks (33% of adults) on MySpace, Facebook, etc. 39% of online teens (13% of online adults) share and create content online.

A quarter of online teens help others get their stuff online.

33% of online college students keep blogs. 54% of online college students read blogs. 12% of online adults have blogs; 35% read them. This gets hard to measure because blogging is baked into all sorts of tools. Reading blogs even more so; what's a blog? What do people recognize as a blog?

19% of online young adults have created an avatar that interacts with others. 6% of online adults do this.

New research on libraries in the information ecosystem. Original question was from GPO -- how do people want government documents (online, print, mail, etc.)? Survey grew to be much broader: How do people get information to help them solve problems that could have a government connection or be aided by government resources?

Asked about 10 broad areas: health, schooling, taxes, jobs, Medicare, Social Security, voter registration, local government, legal actions, immigration. About 80% of respondents had been through at least one of these problem classes and needed information. This amounts to about 169 million adults. The survey asked where they found information; libraries were included among the possible responses. 53% of adults had been to a local library in the past year. Gen Y (age 18-30): 62%. Gen X (31-42): 59%. Trailing Boomers (43-52): 57%. Leading Boomers (53-61): 46%. Matures (62-71): 42%. After Work (72+): 32%. The youngest cohort had the highest use of libraries. Teen use of libraries: 60% of online teens use the internet at libraries, up from 36% in 2000. Youth use libraries, contrary to expectations.

Those who use libraries are more likely to come from higher-income households, more likely to be Internet users, and more likely to have broadband at home. Parents with minor children at home are more likely to use libraries, too. Libraries matter more in the Internet age, not less (as was previously expected). Internet users are more active in information gathering and usage than non-users. No real difference in patronage based on race or ethnicity.

How do people solve problems? What sources did you use when you confronted the most recent problem you faced? 58% used the Internet overall; 53% turned to professionals; then other sources. However, library use was higher in some groups: young adults (18-29), 21%; Blacks, 26%; Latinos, 22%. Younger people relied on libraries, as did minorities and lower-income users.

Most popular problem-solving searches at libraries: schooling/education, finding ways to pay. Then jobs, serious illness, taxes, medicare/medicaid.

Once people are at the library... 69% got help from staff. 68% used computers (38% got technical assistance). 58% sought reference materials. People and resources matter. Libraries are a social learning experience.

Future intentions: Would you go back to the library for a future problem? Overall, about 29% were somewhat likely or more. But -- less well off (40%); Gen Y (41%), less educated (41%), Latinos (42%), Blacks (48%).

Why are youth so library-centric? Lee's hypothesis: they have the most recent experience with libraries (through school assignments). Based on that recent experience, they are more aware than other age groups of how libraries have changed. They know libraries can help.

Takeaways and Implications

Public education efforts about what libraries do and how we have changed are likely to pay off. Focus on success stories and competence. The people who know us best are the ones who keep coming back.

Patrons are happy and zealous advocates. Encourage your patrons to evangelize on your behalf. Give them Web 2.0 tools and, if needed, training to use them. They are eager to give you feedback.

Your "un-patrons" are primed to think of libraries. Need to let them know what you offer: tools available, training, mentoring skills, comfortable environment.

This is the era of social networks. People rely more now on social networks than ever before. They are for learning, news/navigation, support, and problem solving. This last point is very important. Libraries can have a huge role in this. How can the library be a node in a social network?

Virtual communities are becoming more person-centric. Not created by a "publisher", but ad hoc built around your friends and people you trust.

CIL2008: Going Local in the Library

Charles Lyon (SUNY Buffalo)

What is the local web?

The web viewed through a lens of where you are. Not just spatial, but lots of other information you need. Which stores are open now? Which are in good neighborhoods? Which can handle my particular needs? Doing local information is hard; it's very individualized.

Google does this better than anyone. Search results are customized to where you are, but they don't include the really useful information a true local could give you.

Google spends a lot of effort on this -- so libraries should, too. Google is the bellwether.

So what is the local web? Some pieces:


  1. local search engine

  2. maps

  3. local media

  4. local photos/data/video/blogs

  5. local social networks

  6. local people -- this is the most important part.

The local web is social. It's user-generated, participatory, amateur, civic, grassroots, citizen's journalism. It's by and from the community it serves.

It's localized -- about neighborhoods, communities, blocks, streets, buildings. Not just geographical areas, but "imagined communities" -- people who see themselves as part of a small unit.

Local web is joining the real world and the virtual world. Interconnection between the two. It brings the placeless infosphere -- the cloud -- down to wherever you are. It reverses the "antisocialization" that was feared in the early days of the Internet.

Local web brings a sense of place to the Internet. It's becoming big business -- lots of companies competing in this space.

What do libraries bring to the local web?

Information, local information (events, community directories, guides to local events and communities).

What can libraries do that extends this?

Everyday life is still local. The internet is getting more local. Web 2.0 has many local applications. Libraries are community-focused institutions. Libraries have experience with local information... There is an opportunity for libraries to become even more locally focused in the web environment.

Strategies: become expert users of local resources. Raise awareness and assist the community in using online local resources. Broaden the scope of local data collection. Become active participants in community-focused resources. And create locally-focused content.

Examples of local 2.0

Local search: Enhance their own listings in local search engines; advertise (no cost!) in the local search engine. Create your own search engine -- that only searches the sites you specify. Libraries can build a search tool that only includes the stuff that you feel is relevant to your clientele.
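A library-built search tool that "only searches the sites you specify" boils down to an allowlist of domains. Here's a rough sketch of that filtering step, assuming you already have a list of result URLs; the function name and example URLs are hypothetical, and a hosted custom search service would normally do this filtering for you.

```python
from typing import Iterable, List
from urllib.parse import urlparse


def restrict_to_sites(urls: Iterable[str], allowed_domains: Iterable[str]) -> List[str]:
    """Keep only URLs whose host is an allowed domain or one of its subdomains."""
    allowed = [d.lower().strip(".") for d in allowed_domains]
    kept = []
    for url in urls:
        host = (urlparse(url).hostname or "").lower()
        # Exact match, or a subdomain like "www." + domain. Requiring the
        # leading dot stops "notbuffalo.edu" from matching "buffalo.edu".
        if any(host == d or host.endswith("." + d) for d in allowed):
            kept.append(url)
    return kept


# Hypothetical results from a general search, filtered to a local allowlist.
results = [
    "https://www.buffalo.edu/news",
    "https://notbuffalo.edu/page",
    "http://example.org/buffalo",
]
print(restrict_to_sites(results, ["buffalo.edu"]))
```

Matching on the parsed hostname rather than searching the whole URL string keeps pages that merely mention a local site in their path from slipping through.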

Local blogs: placeblogs, metroblogs, neighblogs. Create a local blog directory. And once you've found them, add them to your local search engine. And libraries can blog themselves -- not about the library, per se, but about the community it serves. Whether broadly or narrowly focused, you can take advantage of library's knowledge (or librarian's knowledge).

Local News: News refocused on local geography -- the news that happens close to you. They're blog-like: people can comment on news articles, set up profiles, learn about neighbors.

Locally-focused online communities (Skokie Talk, MyHamilton.ca). Wikis focused on local area, open to contribution by community.

Local data: HelloMetro.com and EveryBlock.com (San Francisco, Chicago, NYC only). News for your neighborhood at the block level: building permits, restaurant inspections, graffiti, all sorts of things that are important to the neighborhood. Much of this is already available -- but not aggregated by address. EveryBlock is grant-funded and will open-source its code at the conclusion of the project.
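To make "aggregated by address" concrete, here's a minimal sketch that merges records from several civic feeds into one list per address. The feed names, records, and the crude address normalization are all hypothetical; EveryBlock's actual code and data model weren't described in the talk.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def normalize(address: str) -> str:
    """Crude normalization so '123 Main St.' and '123 MAIN ST' match."""
    return " ".join(address.upper().replace(".", "").split())


def aggregate(feeds: Dict[str, List[Tuple[str, str]]]) -> Dict[str, List[str]]:
    """Merge (address, description) records from several feeds into one
    list per normalized address, tagging each item with its feed name."""
    by_address: Dict[str, List[str]] = defaultdict(list)
    for feed_name, records in feeds.items():
        for address, description in records:
            by_address[normalize(address)].append(f"{feed_name}: {description}")
    return dict(by_address)


# Hypothetical feeds that each publish data, but not by address together.
feeds = {
    "permits": [("123 Main St.", "Porch addition approved")],
    "inspections": [("123 MAIN ST", "Restaurant passed inspection")],
    "graffiti": [("45 Oak Ave", "Graffiti reported")],
}
for address, items in aggregate(feeds).items():
    print(address, items)
```

Real address matching is much messier (unit numbers, "Street" vs. "St", geocoding), which is part of why this kind of aggregation still needs people.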

Local Photos: Geotagging adds geographic metadata to online information. It can be as simple as a zip code or as complex as latitude-longitude. Geotagging makes it easy to find things. Flickr is leading the drive for this in photos. Libraries can aggregate local photos.
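When photos carry latitude-longitude geotags, "find local photos" becomes a distance query. Here's a minimal sketch using the haversine great-circle formula on a spherical Earth; the photo titles and coordinates are made up for illustration, and zip-code-only geotags would need a lookup table instead.

```python
import math
from typing import List, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; treating Earth as a sphere is an approximation


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # min() guards against rounding pushing the argument just past 1.0.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def photos_near(
    photos: List[Tuple[str, float, float]], lat: float, lon: float, radius_km: float
) -> List[str]:
    """Titles of geotagged photos within radius_km of (lat, lon)."""
    return [
        title
        for title, p_lat, p_lon in photos
        if haversine_km(lat, lon, p_lat, p_lon) <= radius_km
    ]


# Hypothetical geotagged photos: (title, latitude, longitude).
photos = [
    ("Main Street, 1951", 42.886, -78.878),
    ("Niagara Falls overlook", 43.083, -79.074),
]
print(photos_near(photos, 42.886, -78.878, 5.0))
```

With a 5 km radius around the first point only "Main Street, 1951" comes back; the second photo is roughly 27 km away.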

Maps: It's easy to create a custom map.


Why are libraries primed for local?

Local is cheap, using free services (Guidespot, ineighbors). Local sites generally don't generate revenue -- they're labors of love. Perfect for libraries. Also, it's not too late -- there's no winner in the local web. There are lots of kinds of local data that aren't web accessible yet. Much local data is not easily automated; it still requires people to determine relevance to the locale. It helps build good will.

This can be applicable to academic libraries, too -- local as the campus, not just the community.

CIL 2008: Mobile Search

Megan Fox and Gary Price

Slides and more will be available at web.simmons.edu/~fox/mobile/


Mobile Market

3.3 billion mobile phones. 46 million wireless subscribers used mobile search (mostly through text, not web browsers, on the phone).

iPhone users are responsible for 50 times the traffic in mobile search. 85% of iPhone users accessed news and information on their phone (compared to 58% of other wireless users). Most searches are simple, single words (it's hard to enter text on a mobile device). Gary thinks that next year voice search will be the new thing -- you say your query, you get results by text or email.

Some search tools are carrier-specific; some are phone-specific.

People who search from mobile devices are generally looking for "ready reference" information (facts, figures, stock prices, weather, etc.), rarely in-depth research. Search engines have mobile search interfaces aimed at handheld devices. They assume that the mobile user wants facts and information, and that users don't want to type much. Searches are aggregated across silos otherwise provided to web users (so news, images, sites, etc., are listed on one page, not on several). This trend -- "one search" for Yahoo, "universal search" for Google -- is on the rise in web searching, too.

How to deliver high-bandwidth content to mobile devices with different capabilities, and through providers that allow different traffic, is a challenge.

Yahoo's mobile search has 'snippets' -- stripped down 'widgets' -- that give you a preview of web content you frequently access.

Google indicates pages tailored for mobile devices with a tiny green icon. There are sites that transcode -- convert for mobile use -- regular web pages to mobile pages. They work differently, though; some handle different kinds of content better than others.

Live Search -- Live Mobile. Makes assumptions about your future searches based on past use. Also uses personal search histories; things you've searched before are remembered and influence future searches.

4info -- lets you search by text.

Alerts -- services will watch news (sports, etc.) for certain thresholds, and will send you a text or email alert when something happens (a score is close in the 7th inning, etc.)
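An alert like this is essentially a rule checked against each incoming update. A minimal sketch follows; the `GameUpdate` fields and the "close game" thresholds (7th inning or later, a margin of one run or less) are assumptions for illustration, not any particular service's rules.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GameUpdate:
    """Hypothetical score update pushed by a sports feed."""
    home: str
    away: str
    home_score: int
    away_score: int
    inning: int


def close_game_alert(
    update: GameUpdate, min_inning: int = 7, max_margin: int = 1
) -> Optional[str]:
    """Return alert text when the game is late and close, else None."""
    margin = abs(update.home_score - update.away_score)
    if update.inning >= min_inning and margin <= max_margin:
        return (
            f"Close game: {update.away} {update.away_score}, "
            f"{update.home} {update.home_score} (inning {update.inning})"
        )
    return None


print(close_game_alert(GameUpdate("Brewers", "Cubs", 3, 2, 7)))
```

The returned text would then go out by SMS or email; the thresholds are the part a user would customize.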

Medio -- Working on a "predictionary" -- predicts the words you are going to finish typing, based on words you've typed in the past. Does on the mobile device what your browser does in remembering past search queries.
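The basic idea can be sketched as frequency-ranked prefix completion over words the user has typed before. This is a hypothetical illustration; Medio's actual prediction method wasn't described in the session.

```python
from collections import Counter
from typing import List


class Predictionary:
    """Hypothetical word predictor: suggests previously typed words that
    start with the current prefix, most frequently used first."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def learn(self, text: str) -> None:
        """Record every word the user has typed (case-insensitive)."""
        self.counts.update(text.lower().split())

    def complete(self, prefix: str, k: int = 3) -> List[str]:
        """Up to k past words starting with prefix; ties broken alphabetically."""
        prefix = prefix.lower()
        matches = [w for w in self.counts if w.startswith(prefix)]
        matches.sort(key=lambda w: (-self.counts[w], w))
        return matches[:k]


p = Predictionary()
p.learn("library hours library catalog liberty")
print(p.complete("lib"))
```

Here "library" ranks ahead of "liberty" because it was typed twice. On a phone keypad, saving even a few keystrokes per word is the whole point.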

Lots of mobile meta-search/federated search tools. MCN, obovo.com, upsnap are up-and-coming players in this market.

Use your phone's camera to take a picture of something and send it to omoby, mobot, or snapnow; the service sends back a search response based on the photo. Also new: 2D barcodes -- take a picture, and it prompts your phone to pull down a URL, send a text message, etc. These are much more common in Europe/Asia.

chacha -- call 1-800-2chacha or text "chacha", say your question, get an answer by text. Humans do the answering. They provide an answer and a source URL. Not clear who is doing research (probably not librarians!)

Location based search -- based on where your phone says it is, gives you localized search results.

Location-based search -- actually, more like a directory. You say where you are, and it offers you categories that you can look through. The return of Gopher!

Clusty, a search clustering engine, works well in the mobile environment. It brings back results by kind (a search for "apple" offers company, fruit, etc., categories as a filter).

Behavioral targeting on mobile devices is coming. Screen real estate is small, so what gets sent there is critical. Making sure that the right content gets to the mobile device is important.

Spinvox -- Listens to your phone calls, sends you information on topics you discuss. Can also update your blog from dictation.

Searchme uses a presentation of search results like "cover-view" in iTunes or iPhone. Results pages are presented in thumbnail view that you can flip through one at a time.

A directory of hundreds of search tools is available for the next two weeks: go to mlvb.net and log in with rubble888 and cil2008.

CIL2008: Library Web Presence

Widgets at Penn State

Ellysa Stern Cahoy and Emily Rimland. They built a Facebook application for the library, which led them to think about simple pages. They built "Research JumpStart," aimed at beginning users. It uses widgets -- little bits of content taken from their source and dumped into another page.

Widgets provide easy access to popular, most valuable resources. Once you have widgets, you can place them in other environments (iGoogle, for example). Widgets help you compartmentalize your information and provide just what's needed, when it's needed.

Widgets on JumpStart page: 1) Catalog search. 2) ProQuest 3) Research guides for specific courses/subjects -- just the guides that are most used by undergraduates. 4) Chat widget (they use AIM).

iGoogle widgets have proved very popular. Faculty and students have liked taking the search tools and RSS feeds and creating a personalized page.

Binky Lush explained how these were developed. They use WidgetBox, which provides widgets for all sorts of services (iGoogle, PageFlakes, social networking sites, etc.) and provides code for your own site, to include in a blog, etc. All of Penn State's widgets are hosted by WidgetBox. That's the "get widget" chicklet that appears in each gadget on the JumpStart page: it opens a window with options for all sorts of places the widget can be embedded (with code customized for each site) or raw HTML.

I wonder whether it makes sense to host this sort of content on an external site. What are the advantages? There's the obvious ease of creating the widget, but shouldn't core services be hosted locally?

WidgetBox lets you create a Facebook widget, but it doesn't fully take advantage of Facebook's social graph, so Penn State is developing its own.


LibraryGuides at Temple

Derik Badman and Kristina DeVoe

The original subject guides were static pages: long lists of annotated links. They were built with Contribute, which was not easy to use, according to Derik. There was no functionality other than what was on the page.

They brought in LibGuides in spring 2007 and had a semester to migrate all 90+ guides into it, which was fairly easy to do. Creating and maintaining guides is easy, and the tool is very flexible. The content of a guide can be organized by resource type (as always), but also by any other categories the library wants: time period, topic, units of a class, paper topics, anything that's needed (and that the librarian has time for).

Content is modular. Easy to take a content block from one guide to another. Easy to share.

Users can find guides by subject, by tags, by "featured resources", by recently updated, by ratings. Users can comment on guides -- either on guide as a whole or on a section. Allows community building to start. LibGuides also has a polls feature -- about the guide, or anything else.

They've added widgets (chat, calendar, etc.) as well as direct search boxes so that users can search directly in featured resources without having to first go to a page and then search. Similarly, tailored federated search. Pull in RSS feeds from various sources -- for example, table of contents for specific journals or news.

Have used for course guides -- a guide not just for a subject, but for a particular course. Resources are targeted to specific classes and contain resources that are relevant at that point in the semester.

Usage has gone up significantly with the move from static to dynamic guides.

Marketing is important. Students need to know the new guides exist, that they are better than the old.

What else can LibGuides be used for? Ideas... 1) Information literacy. For example, adding descriptions of "primary sources" to the Temple history guide. 2) Co-opt faculty; invite them to get involved and become partners in creating the resources, tailored for their needs.

Question: What are privacy implications of using a service like widgetbox or libguides?

Answer: LibGuides doesn't save any data; no user accounts are created. It is hosted at LibGuides. WidgetBox is similar, but it's not clear how much data it stores.


Question: How easy is it to integrate guides into the local web site?

Answer: We don't know yet. We redirected old URLs to the new guides. But since LibGuides is hosted, it's not on the same server.


Question: Are other sites embedding PSU's widgets in their sites?

Answer: We don't know -- don't have that level of detail as to where it gets embedded.

Update 4:30 PM 7 April Corrected name of first PSU speaker and corrected link. Update 11:20 PM 7 April Corrected Derik's name. Not my day for getting names right.

CIL2008: Text Mining and Visualization of Open Sources

Text Mining and Visualization of Open Sources

Patrice Slert

We're talking about structured data from open sources (Web of Science, Dialog, Silobreaker, the Internet), not necessarily free sources. This is in contrast to intelligence data, where a lot of the technologies have applications, as well.

Visualization can mislead you in terms of cause and effect. It can also lead to false similarities (such as New England and England being presented as the same place).

Open Source Information (OSI) is growing. Intelligence community is recognizing the value of librarians in searching the open source information space.

ISI Web of Knowledge includes visualization and text mining capabilities. However, limited to databases provided through ISI. To mix and match with data available through other vendors, need to use other products, such as VantagePoint. VantagePoint allows you to create filters for importing data from various sources.

Silobreaker -- a commercially available news analysis tool. It lets you mine for information via word searches, visual searches, people, organizations, and industries, pulling together the relationships among these facets. It extracts networks of people as reported in the news, providing a way to see who is appearing in news articles about a subject. You can expand your search, or refocus it, by diving deeper into related people, organizations, companies, etc.

CIL2008: User-Generated Content

Roy Tennant

Not an overview of ways users are creating content. If you want that, go buy Social Software in Libraries by Meredith Farkas. Focus will be on user-generated content on library managed sites.

Roy's tenets for user-generated content: More content is better. More access is better. Can provide more personalized service. Can foster interaction and community. We don't know everything -- we don't know all we can know about our own collections. Our users can help remedy this. More data trumps better algorithms. (Google learned that the more data you have, the better your algorithms are. Code can't make up for lack of data.)

Contributions of content. Institutional Repositories are a collecting point for user-created content. (This is often not thought of as a user-generated source.) Even if faculty aren't doing it themselves, faculty are still getting their content into the library.

Kete.net is an open repository for whatever anyone wants to contribute. (Kete was developed by the folks who did Koha.) They're digitizing the Cyclopedia of New Zealand and are transcribing text. They also enabled the software to handle genealogical information well, so a community can start to get a handle on its genealogical past.

Descriptive contributions. Example of the Great Lakes Images, where they post photos and get community members to fill in details (names of subjects in photo, places, etc.) Library of Congress's Flickr project is similar. 5.4 million views of content in first month. Immensely successful.

What has LC accomplished? Higher profile for collections. Enabled community engagement. And corrected metadata. But more importantly, it sparked comments and conversation around the images being tagged. People became very involved in the images. And higher visibility for the LC blog. Boston Public has done this, too, but has had less traffic than LC.

Exploits knowledge of the masses. Library staff may not be closely connected to the collections they manage. They may not know much about the specific collections being featured. Web offers a feedback loop.

Bookspace at Hennepin County Library -- offers community space around books. Has readers' lists -- on wide range of subjects, created by library users. Also guides by librarians; these are likely less specific and focused (not to mention less numerous).

Tags. Uses user terminology. Even if it's "stupid," it's the user's. There's a very low barrier to use for users -- type and click. It's useful to the tagger (or else they wouldn't be doing it anyway). But it is also useful to others. However, tags can be redundant (for example, "blogs," "blogging," and "blog" are all, probably, the same). Phrases are often complicated and inconsistent. Steve is a tagging project by several museums. A few tags often get applied by many users.

LibraryThing's Tagmash brings together tags that are really synonymous. It works "pretty darn well" for bringing together works on a similar topic. The more data you have (the more users), the better the results.
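To make the redundancy problem concrete, here is a minimal Python sketch that collapses trivial tag variants ("blogs," "blogging," "blog") into one key. It is an illustration under stated assumptions, not how LibraryThing's Tagmash actually works: the suffix list and the `normalize_tag`/`group_tags` names are hypothetical, and naive suffix stripping will over-merge (it turns "news" into "new").

```python
from collections import defaultdict

# Hypothetical, deliberately naive suffix list. Longer suffixes come first so
# "blogging" loses "ging" (leaving "blog") rather than just "ing".
SUFFIXES = ("ging", "ing", "s")


def normalize_tag(tag: str) -> str:
    """Collapse trivial variants of a tag (case, whitespace, plurals) to one key."""
    key = tag.strip().lower()
    for suffix in SUFFIXES:
        # Only strip when at least three characters would remain.
        if key.endswith(suffix) and len(key) - len(suffix) >= 3:
            return key[: -len(suffix)]
    return key


def group_tags(tags: list[str]) -> dict[str, set[str]]:
    """Map each normalized key to the set of raw tags users actually typed."""
    groups: dict[str, set[str]] = defaultdict(set)
    for tag in tags:
        groups[normalize_tag(tag)].add(tag)
    return dict(groups)


print(group_tags(["blogs", "blogging", "blog", "Blog", "libraries"]))
```

String rules like these only catch surface variants; bringing together tags that are synonymous but spelled differently needs more evidence, which is where having lots of users and lots of tagging data pays off.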

Third-party providers in this general space: Springshare (LibGuides and LibMarks), LibraryThing for Libraries, ChiliFresh (book reviews by readers).

Things to keep in mind...

Our idea of content might not be our users' idea. People are going to do weird things. It's going to be messy, and that's OK.

Need to know what your goals are. How do you distinguish between user content and library content? Will you need to moderate in some way?

We (libraries) need to do better at inviting our users in. We need to figure out how to get better at using these technologies.

CIL2008: Keynote on "Libraries: Innovative & Inspiring"

Erik Boekesteijn, Delft Public Library

Jaap van de Geer, Delft Public Library

Geert van den Boogaard, Delft Public Library

This session was a presentation and discussion of their Shanachietour 2007, in which they crossed the United States in an RV interviewing and filming librarians and patrons. They played a segment with the head of the NYPL, Paul _____, in which he talked about his efforts to "reoxygenate" the library.

They then went to the Public Library of Charlotte & Mecklenburg County (North Carolina) to talk with Matt Gullett (of ImaginOn) about gaming. "Containers" of information will change -- but books aren't going away. Technology will allow genres of information to find their appropriate digital (or analog) form. "The book is one of the best technologies ever invented, but it is a technology." We forget that.

Next stop: Michael Stephens' library school class at Dominican University. Brought up a library student from UIUC; she concluded her conversation with the filmmakers by saying that the best skill a librarian can have is being open to change.

Ended back in Delft, at the Delft Library Concept Center (DOK) -- a future-looking library. It still has books, of course, but also has all sorts of digital media and tools to use it with. Gaming, too, of course. The DOK is all about people, according to its director -- people are the most important collection. From the video, the DOK has the feel more of a bookstore (à la Borders) than of a library: open, airy, inviting, and filled with people using the print and digital collections. Brings the digital into the library, rather than having the library be the access point to it.

CIL2008: The New Generation of Library Interfaces

Presented by Marshall Breeding, Director for Innovative Technologies and Research, Vanderbilt University

Marshall Breeding maintains Library Technology Guides site. Today's topic is next-generation catalogs.

Patrons are steering away from the library. Scarily low percentages of users think to start their research at the library. Libraries live in an ever-more crowded landscape -- there are so many places information seekers could go. Our catalogs and sites do not meet the expectations of our patrons. Commercial sites are engaging and intuitive. "Nobody had to take a bibliographic instructions class to use a book on Amazon.com."

A demand for compelling library information interfaces. Need a "less underwhelming experience" at a minimum.

Scope

Current public interfaces have a wealth of defects: poor search, poor presentation, confusing interfaces, etc. Users need to go here, or there, or elsewhere, to find the kind of information they're looking for. We make them make choices. The entire audience agreed (by show of hands) that the current state of OPACs is dismal.

We need to decouple front end from the back end. Back end systems are purpose-built and useful (to us). Front end systems should be useful for users.

Features Breeding expects to see in next generation.

Redefinition of "library catalog" -- needs a new name. Library interface? Isn't just an item inventory. Must deliver information better. Needs more powerful search. Needs, importantly, a more elegant presentation. Keep up with the dot com world.

It must be more comprehensive -- all books, articles, DVDs, etc. Print and digital materials must be treated equally in the interface. Users must not be forced to start in a particular place to find the material they want. They want information, not format. More consolidated user interface environment is on the horizon.

Search -- not federated, but something more like OAI -- searching metadata harvested from databases, not just the first results returned by each database. Coordinated search based on harvested/collected metadata. Reduces problems of scale. Still great problems of cooperation. Also -- questions of licensing.
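The difference between federated and harvested search is easiest to see in code. Below is a minimal Python sketch, assuming the metadata has already been harvested (for example via OAI-PMH, which is not shown here): every record from every source goes into one local inverted index, so a query is matched against all of it rather than against the first few results each database returns. The class name, source names, and records are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase and split on anything that isn't a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


class HarvestedIndex:
    """Toy inverted index over metadata harvested in advance from many sources."""

    def __init__(self) -> None:
        self.titles: dict[tuple[str, str], str] = {}
        self.postings: dict[str, set[tuple[str, str]]] = defaultdict(set)

    def add(self, source: str, record_id: str, title: str) -> None:
        """Store one harvested record and index every word of its title."""
        key = (source, record_id)
        self.titles[key] = title
        for token in tokenize(title):
            self.postings[token].add(key)

    def search(self, query: str) -> list[tuple[str, str]]:
        """Return (source, id) keys of records containing every query term."""
        terms = tokenize(query)
        if not terms:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(hits)


index = HarvestedIndex()
index.add("catalog", "b101", "Great Lakes shipwrecks")
index.add("journals", "a7", "Shipwrecks of the Great Lakes region")
index.add("digital", "d3", "Lighthouses of Lake Michigan")
print(index.search("Great Lakes shipwrecks"))  # [('catalog', 'b101'), ('journals', 'a7')]
```

Because all sources share one index, ranking and deduplication can happen in one place; the cost, as noted above, is that someone has to negotiate and maintain the harvesting (and the licensing behind it).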

Web 2.0 influences. Whatever the next system is, it needs to have a social and collaborative approach. Tools and technologies that foster collaboration. That means integrating blogs, wikis, tagging, bookmarking, user rating, user reviews, etc. Bring people into the catalog. At the same time, it's important not to create web 2.0 information silos. Don't put the interactive features off on the side -- integrate them. Make it all mutually searchable.

Supporting technologies: Web services, XML APIs, AJAX, Widgets. The usual suspects.

New interface needs to have a unified interface. One front end, one starting point. Link resolver, federated search, catalog, web -- all in the same place, same interface. Combines print and electronic. Local and remote. Locally created content, and even -- gasp -- user contributed content.

Features and Functions

Even if there is a single point of entry, there should be an advanced search that lets advanced users get to specific interfaces. Relevancy-ranked results. Facets are big and growing. Query enhancement (spell check, did you mean, etc.) -- to get people to the right resources. Related results, breadcrumbs, single sign-on, etc.

Relevancy ranking -- Endeca and Lucene are built for relevancy. Many catalogs have default results lists by date acquired. However it's done, the "good stuff" should be listed first. Objective matching criteria need to be supplemented by popularity and relatedness factors.
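Here is a minimal Python sketch of what "supplemented by popularity" could look like, assuming circulation counts serve as the popularity signal. The field weights, the `popularity_weight` value, and the log damping are illustrative choices, not what Endeca, Lucene, or any particular catalog uses. Records that don't match the query at all score zero, so popularity only reorders genuine matches.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Record:
    title: str
    subjects: list[str] = field(default_factory=list)
    circulation: int = 0  # times borrowed: a hypothetical popularity signal


def text_match(terms: list[str], record: Record) -> int:
    """Objective match: a title hit counts 2, a subject hit counts 1."""
    title_words = record.title.lower().split()
    subject_words = " ".join(record.subjects).lower().split()
    return sum(2 * title_words.count(t) + subject_words.count(t) for t in terms)


def relevance(query: str, record: Record, popularity_weight: float = 0.5) -> float:
    """Blend the objective match with log-damped circulation."""
    match = text_match(query.lower().split(), record)
    if match == 0:
        return 0.0  # popularity never lifts a record that doesn't match
    return match + popularity_weight * math.log1p(record.circulation)


records = [
    Record("Whales", ["Marine mammals"], circulation=2),
    Record("Whales of the world", ["Whales"], circulation=40),
    Record("Ocean life", ["Marine biology"], circulation=100),
]
ranked = sorted(records, key=lambda r: relevance("whales", r), reverse=True)
print([r.title for r in ranked])  # ['Whales of the world', 'Whales', 'Ocean life']
```

The log damping keeps a heavily circulated item from burying a much better textual match; how much weight popularity deserves is a local policy decision.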

Faceted browsing -- users won't use Boolean logic, need a point-and-click interface to add and remove facets. Users will do an overly broad search; you can't stop them. Let them, but give tools that allow them to correct their "mistake" easily. Don't force them to know what you have before they search.
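A minimal Python sketch of the point-and-click model: compute counts for each facet value over the current results, narrow by the facets the user has selected, and drop a facet simply by removing it from the selection. The record fields and function names are hypothetical; a production system would compute facets in the search engine rather than in a loop like this.

```python
from collections import Counter


def facet_counts(results: list[dict], field: str) -> dict[str, int]:
    """Count how many results carry each value of one facet field."""
    return dict(Counter(r[field] for r in results if field in r))


def apply_facets(results: list[dict], selected: dict[str, str]) -> list[dict]:
    """Keep only results that match every selected facet (field -> value)."""
    return [r for r in results if all(r.get(f) == v for f, v in selected.items())]


# A deliberately broad search: the user typed "climate" and got everything.
results = [
    {"title": "Climate change basics", "format": "Book", "language": "English"},
    {"title": "Climate change (film)", "format": "DVD", "language": "English"},
    {"title": "Changement climatique", "format": "Book", "language": "French"},
]

print(facet_counts(results, "format"))     # {'Book': 2, 'DVD': 1}
narrowed = apply_facets(results, {"format": "Book"})
print(facet_counts(narrowed, "language"))  # {'English': 1, 'French': 1}
print(len(apply_facets(results, {})))      # 3: removing facets restores everything
```

No Boolean syntax is involved: each click adds or removes one entry in `selected`, and the overly broad first search becomes the starting point rather than a mistake.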

Need spell check, automatic inclusion of authorized and related terms (so search tool includes synonyms without user having to know them). Don't give them a link from "Did you mean..." to "no results found." That's rude. Improve the query and the results without making the user think about it.

Don't get hung up on LCSH -- think about FAST. Describe collections with appropriate metadata standards. Good search tools can index them all, anyway. Use discipline-specific ontologies -- even if not invented by librarians! -- as they are the language of the users.

More visually enriched displays. Make them look nice. Book jackets, ratings, rankings.

Need a personalized approach. Single sign-on: users log in once, the system knows who they are, and that's it. No repeated signing on. Ability to save, tag, comment, and share content -- all based on the user's credentials. Allows them to take the library into the broader campus environment.

Deep Search. We're entering a "post-metadata search era". We're not just searching the headings of a cataloger, but we're searching the full text of books and across many books. And we can soon search across video, sound, etc. Need "search inside this book" within the catalog.

Libraries aren't selling things; we're interested in an objective presentation of the breadth of resources available. Appropriate relevancy for us might include keyword rankings, library-specific weightings on those keywords, circulation frequency, OCLC holdings. Group results (i.e., FRBR). Focus results on collections, not sales.
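"Group results" can be sketched by giving each record a work-level key and collecting editions and formats under it. This minimal Python sketch assumes a crude key (author surname plus title with any subtitle dropped); it only gestures at FRBR-style grouping, and real work-level matching is considerably richer. The record fields and function names are hypothetical.

```python
import re
from collections import defaultdict


def work_key(record: dict) -> tuple[str, str]:
    """Crude work identifier: author surname plus title without subtitle."""
    title = record["title"].split(":")[0]
    title = re.sub(r"[^a-z0-9 ]", "", title.lower()).strip()
    surname = record["author"].split(",")[0].strip().lower()
    return surname, title


def group_by_work(records: list[dict]) -> dict[tuple[str, str], list[dict]]:
    """Group editions/formats (manifestations) under one work-level entry."""
    works: dict[tuple[str, str], list[dict]] = defaultdict(list)
    for r in records:
        works[work_key(r)].append(r)
    return dict(works)


records = [
    {"author": "Austen, Jane", "title": "Pride and prejudice", "format": "Book"},
    {"author": "Austen, Jane", "title": "Pride and Prejudice: a novel", "format": "Audiobook"},
    {"author": "Austen, Jane", "title": "Emma", "format": "Book"},
]
for (surname, title), items in group_by_work(records).items():
    print(surname, "|", title, "|", [r["format"] for r in items])
```

A results list built this way shows one entry per work with its available formats, rather than a wall of near-duplicate records.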

What we do must integrate into our "enterprise" -- university, government body, city government, etc. We need to put our tools out where the users are since (as we know) we're losing the battle to make them come to us. Systems must be interoperable -- get data out of ILS and into next generation systems. And hooks back into ILS from front end.

This won't be cheap, in terms of money and effort both. But we can't afford not to make this transition. We don't have years to study and work to catch up with where we should have been years ago.

Is there an open source opportunity? Yes, but implemented systems are not taking the open source approach, for the most part.


I had hoped for a product review in this session, but the overview of features and desiderata was very helpful. There was a whirlwind tour at the end, but I would have liked an overview of what's there.

CIL2008: Drupal and Libraries

CIL2008: Drupal and Libraries, presented by Ellyssa Kroski

Uses a course page she set up for her library school course as an example. Students each had a blog; could tag their blogs and posts; favorite things within the community; share things via email; upload videos and photos; create and take user polls; buddy lists; guest book (i.e., Facebook Wall). A class chat room and tag cloud for site's tagged content. What's new on site -- recently added/updated content.

Drupal runs on Apache, MySQL, and PHP. It has 3 components. 1) The core CMS that lets you organize and publish content to the web. This core functionality is well maintained, with a release schedule and bug fixing. 2) Contributed modules -- things added by the user community. A bit of the "wild west" with these; not much oversight or control. Some are very well done; others not. 3) Themes. The skin on the site. Created with a combination of HTML, PHP, CSS.

A very active/engaged user community, including many libraries. Most recognized, probably, is Ann Arbor District Library, which wrote a custom module to place its OPAC into the Drupal framework. L-Net staff intranet manages 65,000 virtual reference transcripts. Franklin Park Public Library uses Drupal; the site was done by one person, not an IT guy. St. Lawrence University Library uses a staff intranet as a communication tool for student workers on evenings and weekends, and is using Drupal to plan its redesign. Its public web site, launching in fall 2008, will combine all library web sites. It includes a course resources module that will allow faculty to build course resource lists; students will be able to vote on them and upload images, etc. IUPUI Library pulls databases from Metalib, via X-Server, and organizes them into appropriate subject guides by category. Librarians have subject guides, more frequently updated than before (ease of updating).

Simon Fraser University library uses Drupal for its workshops page. Users can register, be wait-listed, etc. Staff can manage registration lists. Uses the Drupal events module. Florida State University Libraries: content is currently managed through pages, but they are moving into more of a true CMS implementation. Red Deer Public Library. And many other examples.

Slides and links are available at

http://oedb.org/blogs/ilibrarian/2008/drupal-and-libraries-at-cil2008/

CIL2008: The Open Source Landscape

This is the presentation I had hoped for in yesterday's keynote.

Marshall maintains a list of who has what catalogs on his Library Technology Guides site.

Federated search systems: LibraryFind; dbWiz (Simon Fraser); Masterkey (developed by Index Data). masterkey.indexdata.com for a demo.

OCLC offers some open source software -- but not cutting edge stuff. Fedora is a major digital repository engine. VTLS Vital is based on Fedora. Fedora Commons is a support service around it. Keystone -- also by Index Data.

Open Source Discovery Products (i.e., Next Generation Catalogs)

- VuFind. Built on Apache Solr/Lucene.

- eXtensible Catalog (Mellon funded). Not a product now, but will be one day. XC is currently seeking institutional participation. This will "probably become a player" in the coming years.

- Others, such as Fac-Back-OPAC, Scriblio (formerly WPopac).

Open Source in the ILS Arena

Shifting from open source being risky to open source being mainstream. Medium-sized public libraries are going with open source solutions for the catalog; it no longer requires a massive technological effort or as much risk as it did.

In 2002, the open source ILS was a distant possibility -- 3 of 4 tools Marshall reviewed then (Avanti, Pytheas, OpenBook, and Koha) are now defunct. In 2002, open source ILS wasn't a trend.

In 2007, world starting to change. Slowly. A few hundred libraries had purchased an open source ILS; 40,000 had purchased a commercial product. In March 2008 -- early adopters are now catalysts for others. There's a small installed base, which makes others see the possibilities as being real. It seems now that we have a bona fide trend.

The ILS industry is "in turmoil". Companies are merging, and libraries are faced with fewer choices from commercial vendors; this gives open source more credibility in the ILS arena as a competitive alternative.

The decision to go open source is still primarily a business decision -- as a library, you need to demonstrate that the open source ILS best supports the mission of the library.

Current Product Options

Koha was the first open source ILS. It's based on Perl, Apache, MySQL, and the Zebra search engine (from Index Data). It has 300+ libraries using it, including Santa Cruz Public Library (10 sites and 2 million volumes). Has relevance-ranked search, book jackets, facets, all that jazz.

Evergreen. Developed by the Georgia Public Library Service for the PINES consortium, over a two-year development cycle (6/2004 - 9/2006). A single environment shared by all libraries, with one library card. PINES switched from SIRSI Unicorn. It succeeded in part because of standardized policies across libraries (lending policies, etc.). Used in Georgia, British Columbia, Kent County (Maryland), and under consideration by a group of academic libraries in Canada. So far, only public libraries have adopted it.

OPALS Open Source Automated Library System. Developed by Media Flex. Both installed ($250) and hosted ($170) services. Used by a consortium of K-12 schools in NY.

NextGenLib ILS designed for the developing world. 122 installations (India, Syria, Sudan, Cambodia). Originally closed, converted to open source in early 2008. More information from Library Technology.

Learning Access ILS. Designed for underserved rural public and tribal libraries -- a turnkey solution. But may be defunct, according to Marshall. Built on an early version of Koha, but customized.

Open Source Business Front

Lots of companies have built business models around supporting open source library software: Index Data, LibLime (Koha), Equinox (Evergreen/PINES), Care Affiliates, Media Flex.

Duke is working on an open source ILS for higher education (looking for funding from Mellon; Marshall is involved).

Open Source Issues

The rise in interest is driven by disillusionment with traditional vendors. But total cost of ownership is probably about the same between open source and traditional tools. Libraries hope that they are less vulnerable to mergers and acquisitions. There's no lump sum payment (though you still need hardware, support -- internal or external -- and development costs). It's not always clear who is funding the next generation of the current system.

Risk factors: dependency on community organizations and commercial companies. Decisions are often based on philosophical reasons, but they shouldn't be -- you need to consider the merits of the system itself. Make sure features and functionality are what you need.

Open Source vendors/providers need to develop and present their total cost of ownership -- with documentation.

"Urgent need for a new generation of library automation designed for current and future-looking library missions and workflows." That is, systems built for our digital and print collections. Open source tools do OK for systems of yesterday; will they meet the needs of the new library?

Q: How close are we to a system that does not utilize MARC records?

A: Not very. We need systems that do MARC, and Dublin Core, and ONIX, and RDF, etc., etc. The value in existing MARC records is too large to ditch. (Of course, it needs to be MARC XML.)

CIL2008: Open Source Applications

Open Source Applications

Glen Horton is with the SouthWest Ohio and Neighboring Libraries

Libraries and Open Source both:
- believe information should be freely accessible to everyone
- give stuff away
- benefit from the generosity of others
- are about communities
- make the world a better place

Libraries create open source applications (LibraryFind, Evergreen, Koha, VuFind, Zotero, LibX, etc.)

Miami University of Ohio has a SOLR/Drupal OPAC in beta (beta.lib.muohio.edu). Not even a product -- just a test environment.

How can you do this without a developer? You can contribute to the community in other ways. Teach how to use the open source tools your library has installed -- even if not developed there. Hold classes for your patrons on how to use the tools that are available. Help build a user community around the open source tools that you think are of value.

You can document open source software -- improve the documentation for other libraries. When you figure it out, help others down the same path. Documentation is often hit or miss; developers are not necessarily good documentation writers, or don't have time to write it. You can help debug open source tools. Report bugs! Influence the development path for the software. Bigger projects often have active support forums -- lots of people reporting and fixing bugs. Smaller projects may not have that infrastructure.

Even if you don't create or use open source software, you can promote it by linking to it from your web site, distributing it on CDs or thumb drives, etc.

"Open Source or Die." Libraries benefit from open source -- make sure that you are giving back to equal the benefit. Teach it, use it, document it, evangelize it.

Slides are at http://www.glengage.com/.

Open Source Desktop Applications

Julian Clark is at Georgetown University Law Library.

Why open source? It's free! As in kittens. Which means - acquisition is no cost, but you've got a lifetime of maintenance and upkeep. But even more so... you have control and customization. You can change it to make it look and act the way you want. Security -- active communities keep applications safe and updated against whatever the latest attack might be.

Why now? FUD about Open Source is declining. (FUD = Fear, Uncertainty, and Doubt). As open source becomes more mainstream, gut reaction against it is on the decline.

When is best time to adopt? When you're ready; there's no easy way to gauge this. Depends on your IT support, library management, colleagues... But it can fit into your major upgrade cycle. If you're planning a major upgrade anyway, why not consider a switch rather than an upgrade? These upgrades often have long lead times; why not take advantage of that planning process to migrate? Also could be triggered by reduced capital funding -- where you have staff, but not money, to spend on your systems.

Can you do this? Do you have the right hardware to run the tool? (This applies to both back-end or web-based systems as well as to the operating system for public use computers -- a replacement for Windows, for example.) Does your organization's IT group support open source -- how much can you do, with whom do you have to collaborate?

Support options -- purchased 3rd-party support; often available, varying degrees of quality and availability depending on the software being supported. Can often hire for a project, for long-term, etc. Flexibility. Of course, there's always in-house -- someone on your staff who knows (or can learn) the software and who knows and understands your organization.

Q: Glen -- what are the risks of providing open source software to patrons who then want support from you for it?

A: Well, you can provide it explicitly as-is.

CIL2008: Information not Location

My colleague Mike Creech and I presented on "Findability: Information not Location" (3.3 MB, PPT) this afternoon. The talk abstract:

Learn how to foster user-friendly digital information flows by eliminating silos, highlighting context and improving findability to create a unified web presence. Hear how the University of Michigan Libraries (MLibrary) are reinventing the libraries' web sites to emphasize information over the path users previously took to access it. By elevating information over its location, users are not forced to know which library is the "right" starting place. The talk includes tips for your library web redesign process and user-centric design process.

Our talk was blogged by Librarian In Black.

I had a great time at Computers in Libraries -- there were more interesting talks than I could attend, let alone blog. I have some catching up to do through the CIL2008 tag cloud, clearly.

Open++: Dispatches from the OSS Frontlines -- Access 2008

Keynote: Karen Schneider
Community Librarian
Equinox Software (Evergreen)

Karen Schneider on "Open++: Dispatches from the OSS Frontlines". Karen's job is to travel around Georgia talking with libraries around the state helping them with the Evergreen installation.

Evergreen

We've seen lots of open source software in libraries in recent years. Tons of experimentation has taken place, lots of it by and for libraries.

PINES needed a consortial catalog for 270+ libraries across a large state. Some vendors said they couldn't do it; others offered far-too-expensive options. In 2004, a development team started building their own ILS. In 2006, Evergreen was launched in 200 libraries. Version 1.4 is imminent (first couple weeks of October). They have kept a tight development cycle.

Key point: With Evergreen, librarians are once again writing their own ILS. This is analogous to what happened about 30 years ago (for example, with the Melvyl catalog, built at home in pre-vendor times). In recent decades, libraries strayed from the path of doing their own stuff and went down the vendor path. Now, we've come full circle and libraries are once again starting to take control of their own destiny.

The network effect has been huge (combined with the general state of the economy and the price of fuel): holds and interlibrary loans within the Evergreen system are growing exponentially.

There are now 275 libraries in PINES. Other consortia include those in British Columbia (Sitka) and Michigan. There's also an academic installation -- Indiana University -- live now. In development are other consortia, including an academic one. But these are just the "known" sites -- it's open source, so many others (K-12 schools, for example) likely use it.

Observations about open source and libraries

  1. Documentation is critical -- must be a formal requirement. Documentation doesn't come easy. Evergreen got a Mellon grant to write it -- but that's not the normal path.
  2. Trickle-up Engagement: Originally, it was thought that libraries would automatically know how to "do open source." However, that turns out not to be the case. Libraries need some help getting started -- getting re-engaged in the software development process.
  3. Gift economy: Community around Evergreen is small, skilled, and dedicated -- a smaller community of developers than they initially expected. People contribute actively, though not as broadly.
  4. A surprising revelation: end-users are all alike - but library workflows are unique. Users are much more similar than libraries. Evergreen has a very flexible back end. This turns out to have been a very good idea. Flexibility in the workflows is critical.

Features of Openness

Open has several positive features. Communication becomes distributed -- no longer funneled through a vendor contact, it's many people looking, many people fixing. Many eyes make a better product, with many hands to fix the problems. The network effect is significant on the library side: the more libraries participate, the better. Local issues and requests lead to global improvements. Customization is the user side of back-end flexibility. It fosters partnerships -- there's no need for secrecy, keeping vendors in the dark about local implementation, and libraries in the dark about vendor plans.

Cost -- it's not necessarily cheaper to go open source, but it moves the costs around from licensing to updating/maintenance.

We Love Open Source Software. No, You Can't Have Our Code -- Access 2008

Dale Askey
Kansas State University

Libraries are not particularly good at making their own code open source and sharing. This is especially true of the small, lightweight applications that we build to make ILS systems "work right" or to solve small problems. These are frequently small (a few hundred lines of code). Why not? Several reasons...

Perfectionism -- the code's not ready yet, there are bugs, it's not commented well, it's inefficient. Even though it gets the job done (which is really what counts), many developers are hesitant to share their code with others.

Dependency -- we don't want to be supporting you; we can barely support ourselves. Putting it in a repository, with documentation -- a good idea. Puts a bit of distance between developer (library) and user. Rutgers is planning to launch a library open source platform -- but it hasn't happened yet (announced in April 2008).

Quirkiness -- What we do is so unique there's no point to releasing it; our problems are ours alone. This is false: while the exact problem may differ, the general problem is very often shared. But if you don't share the code, you end up with the full support and updating burden. There's nobody else who can help you find and fix bugs or add new features.

Redundancy -- Perfectly good software already exists that works for most people, so why should we offer our own? Good enough -- the available solution -- is often seen as better than doing it oneself.

Competitiveness -- Our code is better than someone else's, so we want pride of ownership and don't want to share it. We build our own to be the best, not to share the technology. Institutional Repositories are a case in point -- institutions develop/implement their own but all too rarely share their successes to save others time.

Misunderstanding -- Administrators do not understand the nature of OSS tools; they understand vendors and know how to deal with them. Functionality can be built on a good foundation -- the open source tool -- and customized. This is the antithesis of what vendors offer. Open source puts responsibility for getting it right in the institution's hands, not in a vendor relationship.

What Can We Do?

Figure out a way to share software among libraries. There are methods for the "big stuff" (Koha, Evergreen, etc.). But what about the small stuff? There are several initiatives, but none is global. Google Code is one, but it doesn't meet everyone's needs and isn't accessible to non-technical librarians. A library-specific repository might be useful.

Put a license on our code and let it go when asked to share it. Even for the small snippets.

Commit to the necessary human investment to build and maintain open source software for our own good.

Reward staff for contributing to open source communities. This should be viewed as a form of professional development/contribution.

Re-prioritize internally to make open source contributions happen.


Favorite soundbite from Dale's talk: "Minesweeper is like digital heroin."

MyLibrary: A Digital Library Framework and Toolbox -- Access 2008

Eric Lease Morgan
University of Notre Dame

MyLibrary is about creating relationships. It's a way to catalog resources -- very broadly defined (people, databases, books, you name it). MyLibrary was invented about 10 years ago and had a lot of success and popularity then. The concept of "my library" was picked up by others, such as MyILibrary. It was a turnkey application: download, install, and run. It was simple, and it worked, but it wasn't as complex.

MyLibrary is made up of four kinds of resources:


  1. Resources
  2. Patrons
  3. Librarians
  4. Facets and terms

All of these resources are stored in "Dublin Core-esque" data structures. Patrons in system have name, major, etc. Librarians have name, subject areas, contact info, etc. Resources have material types, subjects, academic level of primary audience. All of these descriptive terms are "facets and terms". Facets are classes of terms. For example: format: book; subject: forestry; and so on. You can have as many facets as you like, and as many subjects under each facet. It's all 2 levels deep.
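The two-level facet/term scheme can be sketched as a small data structure. This is a minimal sketch, not MyLibrary's actual Perl code or schema: the names `Resource`, `add_term`, `find`, and `browse` are hypothetical, and each resource simply maps a facet name to a set of terms, so the structure is exactly two levels deep.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """A cataloged item: a book, database, person, FAQ entry, and so on."""
    name: str
    # facet name -> set of terms. Exactly two levels: facet, then term.
    facets: dict = field(default_factory=dict)

    def add_term(self, facet: str, term: str) -> None:
        """Classify this resource under facet/term, creating the facet if needed."""
        self.facets.setdefault(facet, set()).add(term)


def find(resources, facet, term):
    """Names of resources classed under the given facet/term pair."""
    return [r.name for r in resources if term in r.facets.get(facet, set())]


def browse(resources, facet):
    """Map each term of a facet to the resources that use it, sorted by term."""
    index = {}
    for r in resources:
        for term in r.facets.get(facet, set()):
            index.setdefault(term, []).append(r.name)
    return dict(sorted(index.items()))
```

A resource tagged `format: book` and `subject: forestry` then shows up both in a `find(..., "subject", "forestry")` list and under "book" when browsing the `format` facet, which is the kind of browsable page MyLibrary builds.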

Examples

Examples from the Notre Dame site:

  • Research tools -- lists of research tools
  • Subject
  • Reading list -- combination of things classed "format: journals" organized by subjects. This was created via OAI from the Directory of Open Access Journals. Specific subjects or specific journals can be added to "my library."
  • Facebook MyLibrary widget. It's not Facebook that's important. The fact that the MyLibrary toolbox allows it to happen is important.
  • FAQs -- each frequently asked question/answer pair is a resource. They're cataloged. Then they are browsable and can be displayed on relevant subject or topic pages.

MyLibrary is not meant to do everything -- just to manage "piles of stuff." It does not support search. It does not support OAI. But data can be pulled out of MyLibrary and fed to a search engine. For example, the Alex Catalogue of Electronic Texts offers browsable and searchable lists of 14,000 full-text public-domain books.

MyLibrary is not a particularly strong open source project -- there isn't a community around it, for which Eric takes the blame. It's written in Perl, which is a passé language now. Coming up is a web services interface on top of it, probably Atom-based; in any case, some sort of RESTful web service is coming.

Question: It's been in operation about 5 years; how are students using it?

Answer: Students don't know they're using it. They don't customize it -- it's just the way the web site works.

Question: What are privacy issues with patron data?

Answer: Librarians take privacy more seriously than patrons. Patrons expect an easy-to-use interface that gets them what they want. Libraries are behind the curve on this. MyLibrary makes some broad guesses about what patrons are likely to want. Any future personalization effort will be opt-in. Individuals won't be assigned resources; aggregates will (freshmen, math majors, etc., not John Smith).

User-Generated Content and Social Discovery in the Academic Library Catalogue: Findings from User Research -- Access 2008

Martha Whitehead, Queens University
Steve Toub, BiblioCommons

The problem is "discovery" -- getting answers to questions that you don't know how to ask. In other words, finding things you don't know about. Not just updating the catalog. They were dissatisfied with the federated search tools.

Catalogs are solitary experiences, but learning and research are social activities. User-generated content is what this project is about. Narrowly, tags, ratings and reviews. In the broader sense, curating that information.

The research project with BiblioCommons aimed to figure out how tagging works in the academic environment. Reading lists are an obvious, and old, form of user-generated content. Research paths in libraries -- how to do subject research -- are another (librarian-generated, but we're users, too). Faculty members are the "ultimate research advisor."

In an Ontario Council of University Libraries study of the ideal research process, users wanted to see recommendations from "authorities," wanted to find the classics in the field, and also wanted to find surprises -- serendipity.

Draws a distinction between social discovery and social networking. The former is serious. What features should be built into an academic research site? Fear that information would be misleading, that faculty (who know subjects best) wouldn't have time to contribute, that students (for any number of reasons) won't contribute.

But students are inherently social and even when in the library want friends to know where they are. Study participants wanted to know what their trusted colleagues (professors, fellow researchers) think.

User Research in Academic Environment

BiblioCommons is a next-generation discovery tool, a social network, and an OPAC. In March 2008, Steve Toub recruited Queens University faculty, students, and librarians to talk about how they do their research.

Non-librarians do not limit (i.e., use facets) very much. Students don't reformulate queries; they go back to the original search and re-do it. Users would avoid LCSH at all costs in the catalog (but would use it as a browsing tool). Students don't "experience pain" when manually formatting citations -- it's just part of the process. Librarians think direct export to RefWorks is a must. Librarians want to help; users want to be independent.

A second round of research in June was about user-generated content (UGC). It went through a variety of tests, from paper prototypes to full mock-ups. The focus of this study was on a very narrow sense of UGC. There was not much understanding of why people should tag, and no clear understanding of the motivations for tagging in the academic library catalog.

Started by asking students, if you want to buy a camera or see a movie, where do you go? Asked if students looked at comments by others. Most in this 18-22 age range said they sought out sites with UGC. They preferred comments from "people like them" over recognized critics/reviewers/professionals. Most had used 'Rate My Professors.' They mostly looked at comments, not paying much attention to ratings.

One student said, "I don't necessarily want the opinion of a professor -- I'm looking for people who are as incompetent as I am."

Two of 10 students knew what tagging was by name. But they didn't have any idea what it was when they saw the MTagger tag cloud. Tagging in Facebook pictures is utterly different from tagging text. Ideas to change it: change labeling -- use "themes", "keywords", "what terms you use to help others find this", explain tagging in the cloud, not via a link.

In the catalog, showed a mockup of a review system. For recently-returned books, user can say how useful it was and for which course (from list). Provide brief survey of what user used in the book (whole thing, just a chapter and which one), etc.

Another version -- provide sliders for "relevance to course", "level of difficulty", "personal interest", etc.

The most important data element requests were things like "is it going to be on the test"? "How is this related to other texts"? "How is this related to the lecture"? Users requested clear signals about how important the item is to the class.

Most students wouldn't fill in more than 1-2 data elements -- so the opportunity to collect data is limited. Most wanted anonymity, or at least an identity that's not personally identifiable. Most students wanted to share their comments -- that was the point.

Asked, what if -- when you logged in to the library -- you saw the syllabi for your courses? Very popular.

When are students most likely to contribute? Only if the syllabus is online -- and probably not for the current week's syllabus, but for the immediately previous week's. Think of a Netflix-style "you just returned this item, would you rate it?" sort of interaction. Putting the collection point for UGC at the right point in the workflow is tricky.

What kinds of rewards are of interest? From list of choices, top response was to help others get to resources faster. Idea of "paying it forward" -- if I do it now, it will help others later, which will help me when I need it. Sense of "empty restaurant syndrome" -- if no tags are there, why would I join in? "Buying" student participation seems pretty easy.

Barriers to contributing: nobody wants to support freeloaders (helping those who don't contribute), but everyone wants content to be there when they need it. Fears of plagiarism outweigh willingness to share with others -- even at the level of sharing a reading list for a paper through an online system.

Three strategies for ensuring quality:

1) Authentication -- people log in to library and library knows who you are (even if it's not your university ID).

2) Aggregation -- pooling content from multiple systems provides more content and helps "smooth out" details. Ability to identify individual users while seeing the mass.

3) Marketplace of ideas -- create a self-managed system (no editorial review) to make sure reviews themselves are vetted by the masses.

BiblioCommons roadmap

Near term -- provide an outstanding user experience -- make interface simpler, cleaner, and more intuitive.

Mid term -- organize catalog experience around courses and assignments -- not LCSH or broad subject guides. You see a course- or assignment-specific view when you log in to the catalog.

Long term -- breaking down barriers between silos. Federated search isn't the answer. Everything is integrated.

BiblioCommons Status

User research led to current priorities. This year and next -- an iterative beta release process.

Drupal: Content Management and Community for your Library -- Access 2008

Ilana Kingsley
Dave Mitchell
Harish Nayak
Debra Riley-Huff
Nick Ruest

University of Alaska, Fairbanks (Ilana Kingsley)

Movie collection: Movie covers (built with Drupal 4). Pulls in movie records from the catalog (it's a Sirsi catalog, so they need to screen-scrape) and matches them with images and ratings from the Internet Movie Database and/or Rotten Tomatoes.

Library web site: Ilana got tired of making small changes to site and wanted to get staff more involved in content editing. Using Drupal's modules, can customize what appears where and when.

Looked at lots of CMS tools (leaving out Plone, since Ilana didn't know Python). Installation was easy, didn't need to know lots of PHP. There's a huge Drupal community -- lots of support.

Had a two-year implementation process. Part of problem was political; campus IT department was not in favor of PHP/MySQL. Content analysis was a key element -- making sure she understood the content types so that, ultimately, they could all be defined in the database and then assigned to individuals for maintenance and upkeep.

Keeps updating/adding modules -- after testing on a development server.

Has a number of content types: Advertisement, annual report, collection guides, exhibits and collections, news & events, article indexes and collections, etc. Roles form basis for content types. Roles started with departments.

University of Mississippi (Debra Riley-Huff)

Subject guides: Used the Content Construction Kit to create a content type for subject guides, with customized navigation and presentation. The Presidential Debate guide (set up for the first U.S. presidential debate, held on campus) got heavy use. The Drupal install held up well under heavy traffic.

Themes are what makes a Drupal site look like you want it to. You can make Drupal content look the way you want it to. Best to start with "Zen" theme, which is bare-bones and easier to customize than out-of-the-box themes that come with Drupal. Matching existing site is difficult. Relied heavily on Content Construction Kit.

Government documents: A government documents repository site -- government documents librarian can maintain the content through Drupal.

University of Rochester, River Campus (Harish Nayak)

Revamped library web site into Drupal. Also, Drupal is being used in the eXtensible Catalog (XC) project at Rochester, so there's a large internal drive to make it happen there.

Their redesign process involves numerous activities, several of which center around the content: User research -- the library's staff anthropologist did an ethnographic survey of how students use the library (broadly, not just online). Technology -- showing new technologies to library staff. Usability -- this is the checkpoint to make sure that the technology is being applied in good ways. Design -- where the programming requirements come from. These are all interconnected in various ways.

Customization of user content (through MySite and/or Panels themes) gives a more personalized user experience. Rochester used MySite to allow users to rearrange their pages. Relies on JavaScript in the page. More interaction with server is necessary (pages aren't all the same for all users) so can increase load.


London Public Library (Dave Mitchell)

Picked Drupal because of cost. But got very easy customization as a result.

Modified the comment tool so that comments could exist across sets of pages, not just on a single page -- so that, for example, election information comments and questions could appear on all government-related pages as a single thread.


Nick Ruest (McMaster Library)

Library's Digital Collections. Drupal isn't an out-of-the-box digital collections tool, but Drupal's CCK allows for the creation of Dublin Core metadata set.

OAI-PMH & CCK: The site has been harvested by several OAI-compliant harvesters, putting digital content into broader access.

Thunder Talks -- Access 2008

Thunder Talks are brief (4 minutes 30 seconds!) talks on any subject the speaker wants to talk about. Without further ado:

BiblioCommons

Rolled out at Oakville (Ontario) Public Library. This is a live implementation of the tool (the research leading to BiblioCommons was described in one of yesterday's posts). Lesson learned: Don't roll out a JavaScript-heavy site on IE 6 browsers. In a few weeks, they have received thousands of ratings, reviews, and lists.

OLE -- Open Library Environment

OLE is a framework for libraries that support research, teaching, and higher education. Led by Duke University. By July 2009 there will be a completed design document. Focuses on design, not software development, but expects that a follow-on implementation phase will happen.

How To Adopt Open Source

Things they've done recently in the University of Prince Edward Island library catalog: Since there was no reserves module in Evergreen, they used book bags to gather resources and used RSS to populate subject guides. Added linkable subject terms. Switched thumbnail images from Amazon to the Google Books API. Also added a tab in the catalog record for "Excerpt", which pulls an excerpt of the book from Google Books.

What's Happening in Saskatchewan

One system for all public libraries to share resources. Single library card, no ILL -- just place a hold. This project was going well, and then it wasn't when the budget went away. But progress was still being made. Libraries created consortia, did an RFP, and are moving forward. A consortial integrated library system (CILS). Start of a three-year process.

RSS Feeds for New Books

Based on Doran's New Books List for the Voyager catalog. A three-element table defines each feed: a single LC class, a multi-character LC class (A, B, and J), or a range of call numbers (AB 123 to AB 155). The script outputs both RSS and HTML. These can be mixed and matched to pull together a new books list that matches specific patron needs.
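The class-or-range filtering behind such a feed can be sketched in a few lines. This is a minimal sketch under stated assumptions, not a reproduction of Doran's script: call numbers are simplified to "class letters plus class number" (cutters are ignored), the book records and function names (`in_class`, `in_range`, `new_books_rss`) are hypothetical, and only the RSS output is shown, not the HTML.

```python
import re
import xml.etree.ElementTree as ET

# Simplified LC call number: 1-3 class letters, then a class number ("QA 76.9", "AB123").
CALL_RE = re.compile(r"^([A-Z]{1,3})\s*(\d+(?:\.\d+)?)")


def parse_call_number(call):
    """Return (class_letters, class_number), or None if the string doesn't parse."""
    m = CALL_RE.match(call.strip().upper())
    if m is None:
        return None
    return m.group(1), float(m.group(2))


def in_class(call, lc_class):
    """True if the call number falls under an LC class such as "Q" or "QA"."""
    parsed = parse_call_number(call)
    return parsed is not None and parsed[0].startswith(lc_class.upper())


def in_range(call, letters, low, high):
    """True if the call number lies in a range like AB 123 to AB 155 (same letters)."""
    parsed = parse_call_number(call)
    return (parsed is not None and parsed[0] == letters.upper()
            and low <= parsed[1] <= high)


def new_books_rss(books, feed_title, matches):
    """Build a minimal RSS 2.0 document from the books whose call number matches."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    for book in books:
        if matches(book["call_number"]):
            item = ET.SubElement(channel, "item")
            ET.SubElement(item, "title").text = book["title"]
            ET.SubElement(item, "description").text = book["call_number"]
    return ET.tostring(rss, encoding="unicode")
```

Because each feed is just a predicate, combining a class with a range (for example `lambda c: in_class(c, "J") or in_range(c, "AB", 123, 155)`) is how feeds could be mixed and matched for a particular audience. A production feed would also need the channel `link` and `description` elements that RSS 2.0 requires.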

See KSU's New Books Feed. Custom feeds are possible (for KSU staff only), but they require the patron to enter the request in words, which library staff then translate into LCSH speak. Custom feeds are underutilized: only 10 so far, mostly for librarians.

Koha in a Small Public Library

Hanover, Ontario, public library wanted to go in with a small number of libraries (14) nearby. Couldn't find a Koha support vendor to do the implementation the way they wanted. So they're doing it themselves. Building on virtual machines to allow flexibility -- in implementation, to add new libraries.

Fedora Drupal Module

University of PEI built a Fedora module for Drupal. It will be open-sourced "very soon." They have several content models. It makes ingest very easy, solving the basic problem of many repositories: nobody contributes because it's hard. One example is a RefWorks collection: ingest the citations and, where local policy and RoMEO allow it, the full text. Metadata are editable in Drupal but stored in Fedora.

Zotero Connection to Evergreen

If you have the Zotero Firefox plugin, you see Zotero cues in the Firefox URL bar. It then lets you save selected items from the results list into Zotero. This is accomplished through a LINK element in the document head that points to the unAPI service that Zotero uses.
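The unAPI pattern boils down to two pieces of markup plus a simple request URL. This is a minimal sketch based on the general unAPI convention (a `<link rel="unapi-server">` in the page head and an `<abbr class="unapi-id">` next to each record); the server URL, record ID, and format name are hypothetical, and Evergreen's actual markup and identifiers may differ.

```python
from html import escape
from urllib.parse import urlencode


def unapi_link(server_url):
    """<link> element for the page <head> that advertises the unAPI server."""
    return ('<link rel="unapi-server" type="application/xml" title="unAPI" '
            f'href="{escape(server_url, quote=True)}" />')


def unapi_id(record_id):
    """Marker placed next to each result so a client can find its identifier."""
    return f'<abbr class="unapi-id" title="{escape(record_id, quote=True)}"></abbr>'


def unapi_request(server_url, record_id=None, fmt=None):
    """URL a client requests.

    No parameters lists the server's formats; id alone lists formats for that
    record; id plus format fetches the record in that format.
    """
    params = {}
    if record_id is not None:
        params["id"] = record_id
    if fmt is not None:
        params["format"] = fmt
    return server_url + ("?" + urlencode(params) if params else "")
```

A client such as Zotero reads the head link, collects the IDs on the results page, and then requests each record in a format it understands.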

Dashboard for Library Information

Summary information for administrators. It shows trends -- not individual items, but gross numbers over time that can help administrators understand what is going on. There's a back end that allows users to easily create widgets, pulling together numbers from various systems. The tool allows easy creation of these widgets, but the data are entered by hand. A storehouse for library data with graphical presentation.

Drupal Module for ContentDM

A Drupal module built at Simon Fraser University to search ContentDM through Drupal for the Multicultural Canada site. It uses the ContentDM API and the Drupal API, or "one big 'appy family."

Mashing Up and Remixing the Library Website -- Access 2008

Karen Coombs

University of Houston

A couple definitions: Mashing up -- taking data from different places and shmooshing it together. Remixing -- rethinking the way we do the library web site.

University of Houston library web site, 3 or so years ago, had 1500 pages of content all managed centrally. Needed re-architecting for the same reasons we have all experienced: Staff have a wide range of HTML and computer savvy -- running the whole gamut from technology illiterate to programmers. Library web sites incorporate information from lots of different sources. There's lots of redundancy of data -- the same information appears on many different pages. Library users don't come (or wish to come) to library site -- there's a need to get library information where patrons are. And finally, library information is not well integrated into the curriculum.

The ACRL information literacy report reinforced that library instruction needs to take place as part of the curriculum, not as an add on. So -- in the classroom, in the Course Management System.

The traditional solution has been to implement database-driven sources, skinning pages to look like each other. What Karen wanted was an easy-to-use system needing little intervention from Web Services (her department), with content that can be easily mixed, reused, and shared within and between systems.

Took inspiration from iGoogle, Drupal, netvibes, and WordPress. Liked content types, widgets (easy drag-and-drop customization).

In the system built there, content owners are responsible for content organization and metadata. Librarians have responsibility; supervisors have review role. A librarian owns pages but also items (building blocks of pages).

Use many tools: LibraryFind, WorldCat, Archon, Serials Solutions, flickr, WordPress multiuser.

The site is completely modular -- librarians can add modules to pages, organize them how they want. Fonts and colors are controlled centrally, but layout is up to the content owner.

Content is remixable. This means that content can be used elsewhere inside the site, but also outside the site. The external uses aren't quite ready -- the API is still in development. But they do use microformats, so if you view an event, the event is stored as an hCalendar microformat. Contacts are hCards, which can be added directly to a visitor's address book. Flash objects can also be embedded into a web page.
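Here is a rough idea of what that microformat markup looks like. This is a minimal sketch using the standard hCard (`vcard`, `fn`, `org`, `email`, `tel`) and hCalendar (`vevent`, `summary`, `dtstart`, `location`) class names. The function names and sample data are hypothetical, and the University of Houston site's actual templates are not known from the talk.

```python
from datetime import datetime
from html import escape


def hcard(name, email, phone, org=None):
    """Contact markup using hCard class names (vcard, fn, org, email, tel)."""
    lines = ['<div class="vcard">', f'  <span class="fn">{escape(name)}</span>']
    if org:
        lines.append(f'  <span class="org">{escape(org)}</span>')
    lines.append(f'  <a class="email" href="mailto:{escape(email, quote=True)}">'
                 f'{escape(email)}</a>')
    lines.append(f'  <span class="tel">{escape(phone)}</span>')
    lines.append('</div>')
    return "\n".join(lines)


def hcalendar_event(summary, start, location):
    """Event markup using hCalendar class names; dtstart carries an ISO 8601 value."""
    return "\n".join([
        '<div class="vevent">',
        f'  <span class="summary">{escape(summary)}</span>',
        f'  <abbr class="dtstart" title="{start.isoformat()}">'
        f'{start.strftime("%Y-%m-%d %H:%M")}</abbr>',
        f'  <span class="location">{escape(location)}</span>',
        '</div>',
    ])
```

The human-readable page stays the same, while a browser extension or other parser can pick up the class names and turn the contact into an address-book entry or the event into a calendar entry.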

Virtual Subject Library

This is essentially a subject guide. It brings together federated search, new books, relevant databases, subject liaisons, text blocks, etc. Sample (may not be permanent -- this link is on a test server).

Karen walked us through a quick demonstration of building a page. It's efficient and elegant -- assuming that all the resources have already been added. An advantage to this is that there are no broken links. Or if there are, they are found and corrected very quickly since there is only one database entry. Most staff built their libraries in the course of a week -- but it took more time for them to conceptualize what they wanted to have on the page and where to put it.


Content Creation and Management

Tab editor works similarly. Through the admin interface, a librarian provides the names of the tabs, gives each tab a type, sets the order, and then (for each tab) makes it do the right thing. A search tab is configured to search something. A text tab is configured to display text. And so forth.
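The tab editor's model (a name, a type, an order, and type-specific configuration) can be sketched as a dispatch on tab type. This is a hypothetical illustration, not the University of Houston code (which is ColdFusion). The type names `search` and `text`, the config keys, and the generated HTML are all assumptions.

```python
from dataclasses import dataclass, field
from html import escape


@dataclass
class Tab:
    name: str
    kind: str            # hypothetical type names: "search" or "text"
    order: int
    config: dict = field(default_factory=dict)


def render_search(tab):
    """A search tab is configured with the URL it should send queries to."""
    action = escape(tab.config["target_url"], quote=True)
    return (f'<form action="{action}" method="get">'
            '<input type="text" name="q" /><button type="submit">Search</button></form>')


def render_text(tab):
    """A text tab simply displays the text it was configured with."""
    return f"<p>{escape(tab.config['text'])}</p>"


RENDERERS = {"search": render_search, "text": render_text}


def render_tabs(tabs):
    """Render tabs in the order the librarian set, dispatching on each tab's type."""
    parts = []
    for tab in sorted(tabs, key=lambda t: t.order):
        renderer = RENDERERS.get(tab.kind)
        if renderer is None:
            raise ValueError(f"unknown tab type: {tab.kind}")
        parts.append(f'<div class="tab" title="{escape(tab.name, quote=True)}">'
                     f"{renderer(tab)}</div>")
    return "\n".join(parts)
```

Adding a new kind of tab then means registering one more renderer, which fits the modular design described in the talk.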

An interesting add-on is that the link tool in the basic text editor takes the librarian to a search interface to all the links in the database. So an existing link can be added (and is therefore recorded once, used multiple times). New links can be added if the link is not already in the database.
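The "recorded once, used multiple times" idea amounts to a link registry that pages reference by ID. The sketch below is hypothetical (class and method names are illustrative, not from the UH system). It shows why a broken link only has to be fixed in one database entry: adding an existing URL returns its existing ID, and updating a URL changes every page that renders that ID.

```python
from html import escape


class LinkRegistry:
    """Every URL is stored once; pages refer to links by ID (hypothetical design)."""

    def __init__(self):
        self._links = {}    # link_id -> (title, url)
        self._by_url = {}   # url -> link_id, used to detect duplicates
        self._next_id = 1

    def search(self, text):
        """IDs of links whose title or URL contains the text (case-insensitive)."""
        text = text.lower()
        return [lid for lid, (title, url) in self._links.items()
                if text in title.lower() or text in url.lower()]

    def add(self, title, url):
        """Return the existing ID if the URL is already recorded, else a new ID."""
        if url in self._by_url:
            return self._by_url[url]
        lid = self._next_id
        self._next_id += 1
        self._links[lid] = (title, url)
        self._by_url[url] = lid
        return lid

    def update_url(self, lid, new_url):
        """Fix a link once; every page that uses this ID picks up the change."""
        if new_url in self._by_url and self._by_url[new_url] != lid:
            raise ValueError("URL already recorded under another link ID")
        title, old_url = self._links[lid]
        del self._by_url[old_url]
        self._links[lid] = (title, new_url)
        self._by_url[new_url] = lid

    def render(self, lid):
        """HTML anchor for a link ID, as a page would embed it."""
        title, url = self._links[lid]
        return f'<a href="{escape(url, quote=True)}">{escape(title)}</a>'
```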

Since the site is modular, it is easy to replace functionality. Google Calendar could replace the current events tool. One staff directory could supplant another.

Next Steps

It needs an API -- badly. Both to integrate content into external sites and to improve internal AJAX content which will run much faster via API than via direct database query. While images and video can be uploaded easily, it's not easy to get them into a text block, for example -- so this part is only half done. Integration with WorldCat -- put bibliographies from WorldCat into a page.

The staff interface is very personalized; the user interface is not, yet. A main impediment is that university information systems do not offer easy ways to get data about students (role, major, etc.). Another next step: bring federated search results into the context of the web site.

Questions

Q: How does this work with accessibility?

A: The site, as seen by the public, is very accessible. The staff interface is not accessibility compliant yet -- nor is it cross-browser compliant (Firefox only).

Q: Can you talk about use cases for the API under development, and demand?

A: The first use case is internal users: actions that currently rely on direct database connections will be faster via an API. They also want to be able to move things between the intranet and the public site -- some content belongs in both places. There's a desire to put library content into the University web site. Courseware is difficult because UH is a WebCT site, and WebCT is not particularly API-friendly. An API would also make building applications for mobile devices easier.

Q: Could you talk about development effort?

A: It's been in the works for about 3 years. It's written in ColdFusion because the old site was written in it, and they didn't want the site frozen during development: they could replace parts as they went along. One developer FTE built it in one year; they now have two developers working on it.

Using WorldCat Grid Services in Library Applications -- Access 2008

Roy Tennant

OCLC Research

Grid Services -- a set of APIs to access WorldCat data. It's not for human consumption, but for machine-to-machine communication. We'll talk about a few services, with a few demos.

Many APIs -- OCLC's and others' -- are collected at TechEssence.

These APIs are available for free to all OCLC member institutions.

Identifier Services (xISBN, xISSN)

xISBN: If you have a book's ISBN, OCLC will return a list of all the related works (in the FRBR sense) for that book. It's a way to find a different edition, a different version, of the same book that might be in your catalog. It's particularly helpful if someone comes into your catalog via Amazon or another bookseller -- such as via the LibX Toolbar.

xISSN: If you give this tool an ISSN, it sends back a chart showing the history of that ISSN through all its splits, merges, renamings, and everything else that has happened.


Registry Services (Institution Registry)

Any institution can enter information for itself in this registry. It includes all sorts of things -- hours, administrative contacts, root URLs for link resolvers, OPAC, and so forth. The WorldCat Registry also has an API.


Experimental Services (Terminologies, Metadata Crosswalk)

Terminologies: Search for terms (broader, narrower, related) in various taxonomies (FAST, LCSH, MeSH, etc.).


WorldCat Search API

This is the flagship OCLC API, released in August 2008 after an 8-month beta/test period. More than 80 institutions are signed up to use it. There are 110 million records, representing 1.3 billion holdings. This API supports OpenSearch and SRU. Responses come back in flavors of XML: RSS, Atom, MARC21 XML, and Dublin Core. JSON may be coming soon. It's RESTful. Many indexes. Sorts by relevance, author, title, date, libraries that hold it. It can return standard citations (APA, Chicago, Turabian, MLA, Harvard).
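An SRU request is an ordinary HTTP GET with a CQL query in the URL. This is a minimal sketch, not OCLC's client code. `operation`, `version`, `query`, `startRecord`, `maximumRecords`, and `recordSchema` are standard SRU parameter names. The base URL is a placeholder, the API key parameter name `wskey` is an assumption, and the CQL index `srw.ti` in the example is illustrative; check OCLC's documentation for the real endpoint and indexes.

```python
from urllib.parse import urlencode

# Placeholder endpoint; substitute the base URL from OCLC's documentation.
SRU_BASE = "https://worldcat.example.org/webservices/catalog/search/sru"


def sru_search_url(base_url, cql_query, api_key, max_records=10,
                   start_record=1, record_schema=None):
    """Build an SRU searchRetrieve URL for a CQL query.

    operation, version, query, startRecord, maximumRecords and recordSchema are
    standard SRU parameters; "wskey" for the API key is an assumption here.
    """
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,
        "startRecord": str(start_record),
        "maximumRecords": str(max_records),
        "wskey": api_key,
    }
    if record_schema is not None:
        params["recordSchema"] = record_schema
    return f"{base_url}?{urlencode(params)}"
```

For example, `sru_search_url(SRU_BASE, 'srw.ti = "digital libraries"', "MY-KEY", max_records=5)` yields a URL whose response, per the talk, would come back as XML (MARC21 XML or Dublin Core, for instance) for a script to parse.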

VuFind: The Library OPAC Meets Web 2.0 -- Access 2008

Andrew Nagy

VuFind Developer

Villanova

Introduction

What is a "next generation catalog"? The term is not Andrew's favorite -- he wants to get away from the word "catalog" and start talking about "resource discovery." Librarians and users approach searching differently: we view things as known-item searches, while users, generally speaking, have no idea what they're looking for. We need a tool to facilitate browsing, sharing, and organizing resources.

Users go to Amazon, Google, and Delicious to find and discover, and then look in the catalog. Libraries should be in that discovery role. Definition of "catalog" should include things that your users have access to (as through consortial borrowing or online access).

Villanova decided to make the product open source because they wanted a broader development base and broader help making it better. It took about 2 months to get university approval to open source the software. Over the next two years, development continued at VU and elsewhere.

Many institutions are in process of adopting VuFind -- alpha, beta, and live. VU's catalog is tightly integrated with the web site.

Browsing is important, along with functions that exist in Amazon, Barnes & Noble, elsewhere -- both online and in physical stores. Bring in demographics -- show sophomores, for example, what other sophomores have looked at in a given topic search. Ability to save and share searches is also important.

Villanova found that, in the physical library, students were confused by multiple service points (reference, information, circulation, etc.). They've combined the physical desks into one -- and feel very strongly that students would do better with a single point of service via the web. Catalog, web, digital collections, search -- should all be integrated.

Tool must integrate directly with the ILS -- LDAP & SIP2 authentication, bring in live circulation status, display holdings data, and -- most importantly -- interoperate with major ILS systems.

Data migration -- the VuFind community built SolrMARC to import MARC data into Solr. You specify how MARC fields map into Solr fields, the way you want. They are investigating OAI import and (possibly) Z39.50 import.
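A field mapping of this kind can be sketched as a table from Solr field to MARC tag and subfield. This is a minimal sketch loosely modeled on the idea, not SolrMARC's actual configuration syntax or code. The record is a simplified dict rather than a real MARC parser's output, and the field names, mapping, and punctuation handling are assumptions.

```python
# Hypothetical mapping: Solr field -> (MARC tag, subfield code, or None for control fields).
MAPPING = {
    "id": ("001", None),
    "title": ("245", "a"),
    "author": ("100", "a"),
    "topic": ("650", "a"),
}
SINGLE_VALUED = {"id", "title", "author"}
TRAILING_PUNCT = " /:;,."


def to_solr_doc(record):
    """Turn a simplified MARC record into a dict ready to post to Solr.

    record maps a tag to a list of occurrences: strings for control fields,
    {subfield_code: value} dicts for data fields.
    """
    doc = {}
    for solr_field, (tag, code) in MAPPING.items():
        values = []
        for occurrence in record.get(tag, []):
            value = occurrence if code is None else occurrence.get(code)
            if value:
                # Strip trailing ISBD punctuation (crude: would also trim "Inc.").
                values.append(value.rstrip(TRAILING_PUNCT))
        if not values:
            continue  # field absent in this record; leave it out of the document
        doc[solr_field] = values[0] if solr_field in SINGLE_VALUED else values
    return doc
```

Changing what goes into the index is then a matter of editing the mapping table rather than the import code, which is the flexibility described above.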

Search and Browse

A VuFind search (see the VuFind demo) for a phrase like world war two (no quotes) does an "and" search first and then an "or," so it searches for all the words, then any of the words. For phrase searching, use quotes. You can narrow results using the facets VuFind returns. The author facet for a general search is not particularly helpful -- for a broad search, some authors always show up (Shakespeare shows up in the world war two example, and he clearly was not writing about World War II). Nagy is thinking of ways to have facets display in a different order depending on the kind of search being done. He suggests removing the author facet from "very large" systems (those with millions of records).
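The "all the words, then any of the words" behavior, with quotes keeping a phrase together, can be illustrated over an in-memory list. This is a minimal sketch of the idea only. VuFind does this through Solr queries, not Python, and the function names and sample records here are hypothetical.

```python
import re

# A quoted phrase, or a single bare word.
TOKEN_RE = re.compile(r'"([^"]+)"|(\S+)')


def parse_query(query):
    """Split a query into lowercase terms; quoted phrases stay together."""
    return [(phrase or word).lower() for phrase, word in TOKEN_RE.findall(query)]


def term_matches(term, text):
    """Whole-word (or whole-phrase) match of term inside text."""
    return re.search(r"\b" + re.escape(term) + r"\b", text) is not None


def and_then_or(query, records):
    """Records containing every term first, then records containing any term."""
    terms = parse_query(query)
    all_hits, any_hits = [], []
    for record in records:
        text = record.lower()
        hits = [term_matches(t, text) for t in terms]
        if hits and all(hits):
            all_hits.append(record)
        elif any(hits):
            any_hits.append(record)
    return all_hits + any_hits
```

With the records "World War Two: a history", "The war of the worlds", and "Two gentlemen of Verona", the query `world war two` returns all three, with the first at the top. The quoted query `"world war two"` returns only the first, which is why quotes matter for phrase searching.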

An author search in the demo brings up a list of matching authors before the search results -- so users can disambiguate. Clicking on an author's name from this list pulls up the Wikipedia entry for that author, so that users can verify which of the similarly-named authors they meant.

There's an experimental browse tool that lets you navigate through the catalog, iTunes-style, without typing a keystroke, to get to a collection of books on a topic. This tool only shows the top 50 results in the last panel -- so it leaves out a lot of things, even in a smallish catalog like the test catalog (with 850,000 records).

Questions

Q: What are weighting systems in the search process?

A: In general, the all fields search gives more weight to title, exact matches, author, call number, subject headings. It's not currently configurable, but it's in the code and you can play with it there.
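Field weighting of this kind can be sketched as a per-field boost applied to term matches, plus a bonus for an exact title match. This is a hypothetical illustration of the idea, not VuFind's code or its real weights: VuFind delegates scoring to Solr, and every number and field name below is an assumption.

```python
# Hypothetical boosts: higher numbers make a field count for more.
FIELD_BOOSTS = {"title": 5.0, "author": 3.0, "callnumber": 2.0, "subject": 2.0, "notes": 1.0}
EXACT_TITLE_BONUS = 10.0


def score(record, terms, phrase):
    """Sum boost x (number of query terms found in the field), plus an exact-title bonus."""
    total = 0.0
    for field_name, boost in FIELD_BOOSTS.items():
        words = record.get(field_name, "").lower().split()
        total += boost * sum(1 for t in terms if t in words)
    if record.get("title", "").lower() == phrase:
        total += EXACT_TITLE_BONUS
    return total


def rank(records, query):
    """Return matching records, best score first (ties keep input order)."""
    phrase = query.strip().lower()
    terms = phrase.split()
    scored = [(score(r, terms, phrase), r) for r in records]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [r for _, r in scored]
```

Under these made-up weights, a record titled exactly "Digital libraries" outranks one that only has the words in a subject heading, which in turn outranks one with a single word in its notes. Tuning the boosts is the kind of tinkering Nagy says is possible in the code.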

Q: Where is VuFind in development process for bringing in non-catalog materials?

A: Villanova has a lot of digital library material, but hasn't yet brought in non-MARC records. They want search results to show a thumbnail of the digitized object, and a record display would include different types of data. VuFind 1.0 comes first; after that, bringing in other content through OAI and/or federated search is likely the next step.

Q: How frequently is bibliographic content updated?

A: Villanova updates/adds/edits/deletes about 150 records a day, on average, and updates nightly. It can be done more often, if wanted or needed. Their Voyager ILS does a nightly export of the previous day's changed records. Deleted and suppressed records are output separately and removed from VuFind nightly.
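The nightly cycle described above (apply the day's changed records, then remove deleted and suppressed ones) can be sketched in Python against a plain dictionary standing in for the SOLR index. The function name and record shape are hypothetical; applying removals after updates means a record that was both edited and suppressed ends up out of the index, which seems the safer choice.

```python
def nightly_sync(index, changed_records, deleted_ids, suppressed_ids):
    """Apply one night's ILS export to a search index (a plain dict here).

    changed_records: records added or edited since the last run, each with an "id".
    deleted_ids / suppressed_ids: exported separately from the changed records.
    Returns (number of records upserted, number of records removed).
    """
    for rec in changed_records:
        index[rec["id"]] = rec  # adds new records and overwrites edited ones
    removed = 0
    # Removals run after updates, so a record both edited and suppressed stays out.
    for rec_id in set(deleted_ids) | set(suppressed_ids):
        if index.pop(rec_id, None) is not None:
            removed += 1
    return len(changed_records), removed


index = {"b1": {"id": "b1", "title": "Old title"}, "b2": {"id": "b2"}, "b3": {"id": "b3"}}
counts = nightly_sync(
    index,
    changed_records=[{"id": "b1", "title": "New title"}, {"id": "b4"}],
    deleted_ids=["b2"],
    suppressed_ids=["b3", "b9"],  # b9 was never indexed, so nothing to remove
)
print(counts)         # (2, 2)
print(sorted(index))  # ['b1', 'b4']
```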

Q: Are any III (Innovative Interfaces) libraries using VuFind?

A: None publicly, but quite a few are in development. Holdings are currently available only through screen scraping. Support for SirsiDynix is also under development.

Q: Internationalization?

A: Yes -- the whole interface has been translated into about 10 languages.

IL2009 Keynote: Vint Cerf

I'm recording my mostly unedited notes from Internet Librarian 2009 here.

Vint Cerf was interviewed by Paul Holdengraber -- the conversational format made for an entertaining and informative session. I'll comment on one particular point below (where Vint talks about cloud computing); the rest is presented in raw form.

Business interests don't like the idea of the web from a copyright perspective: every time you look at a page, you *copy* it.

How should we rethink copyright in the internet age? Copying and distribution are very inexpensive, yet we still want people to be able to make money from and protect their work. Creative Commons does this very well. We need the flexibility to have both free content and paid content.

Email started out as a tool of convenience for programmers. It moved quickly into the commercial world through the '80s. Proprietary, closed commercial email systems died out once email was connected to the open Internet.

What is the impact of email on attention span and interruption? Kids multitask a lot, and they may be training their brains to do so more effectively than we can, because they are learning to do it earlier. But we tend to spend less time reading and thinking before acting, and we may not be paying as much attention to problems and issues as we did in the past. (Cites a conversation with Henry Kissinger, who posited this.)

What does it mean to be attentive in an age of distraction? "Power corrupts; Powerpoint corrupts absolutely." Brevity is the enemy, in a sense, of careful thinking and analysis.

Tell us about 'bit rot'. Vint is concerned that, as we build more digital archives, the archives will not be meaningful if the applications are not available to render those bits. Old files may be perfectly valid, but not interpretable. It's not just a question of preserving the bits. You can't just preserve the software; it's often proprietary. And it's not just the software, but the operating system. And the OS won't work unless it's on the right hardware. How to address this set of problems?

Cloud computing may turn out to be helpful: older programs may be available in the cloud, rather than each of us having to keep them running individually.

This is where I think Mr. Cerf hinted at where Google is going. With its massive computing capabilities and its experience in the cloud, my guess is that Google aims to make old computer files available via the web. This would be a step or two in the direction of making the world's information available to everyone -- including those musty, bit-rotting files that are already hard to read. With virtualization, Google could well provide access to those old files you created in 1993 on your first PC using a now-defunct Windows 3 application.

Ebooks have very different rights structures than printed books: all you can do is read them; you can't sell, loan, or destroy them.

Books will evolve, if for no other reason than that they're static and our world is dynamic. We need some form of electronic book. Much of what we will want to use is not usable in static printed form (spreadsheets, models, videos, etc.).

We should stop using the word 'teach' -- we need to focus on 'learn', how people learn.

Does Internet inspire passivity? No -- Not at all. Idea of the Internet is open -- it allows people to try things out. Openness inspires activity. It unlocks creativity.

Internet was designed, not invented.

What are your thoughts on net neutrality? Broadband access is increasingly important, but we have limited sources of broadband access. People often have zero or one choice, sometimes two. Broadband providers have a potential motivation to favor their own applications over their competitors'. We need to make sure the pipes carry everything equally and don't play favorites.

What about privacy in the Internet age? Privacy is expected by most people on the 'net. And yet... there is monitoring going on. The tension is between anonymity and privacy on one side, and law enforcement and the protection of society on the other. Anonymity is important. Strong authentication is important where it is needed.

Challenges for the next archivist? Bit rot is a huge challenge.

The Internet looks like postcards. Explain. A postcard is like an Internet packet: you put it in the system, and it may or may not get to its destination.

What is Google Wave? What would the world look like if email, IM, chat, tweets, etc., were all in one system? It puts all the comments on a 'wave' (a conversation), and participants see it where they want (blog, email, Wave, etc.). It highlights the problem of ephemeral versus permanent records: things you might think were ephemeral might end up in Wave.

We now live in an environment like the global village. The aberration of the Industrial Revolution is ending; we're back to the time when we all lived in small villages and knew everything about everyone. (See The Transparent Society, by David Brin.)

IL2009 D101: Digital Library Network -- Roy Tennant

Unfiltered notes from Roy Tennant's talk on Digital Library Network:

Perceptions report -- libraries = books

Libraries were once the center of the information universe. Many online catalogs are simply card catalogs on screen. Libraries were built around the idea of scarcity, and the world today is not that way. Even in the developing world, the common form of Internet access is the cell phone (more ubiquitous than the computer).

Tablet devices are on the way, and soon. E-paper is coming soon, too.

Users built their workflows around libraries. Now we need to build ourselves around users.

Massively centralized services were once not possible. Now they may be our only salvation.

Quotes General Shinseki: "If you dislike change, you're going to dislike irrelevance even more."

We need to put libraries at the network level, at web scale.

We need to be an essential part of the new ecology. The whole publishing industry uses the ONIX standard for bibliographic data; libraries use MARC. Does anyone find this funny? Libraries don't just process metadata. We add value, putting in more data based on what we know about books.

What is going to save our bacon? It's not what we've been doing for last century. Research process is broken -- messy desktops (virtual and real). Libraries have metadata to help researchers organize and find the information they need. We have to help people with their problems. Libraries need to be the solution.

Take a look at CDL's escholarship.org. CDL changed its institutional repository (IR) into something more dynamic: a publishing tool for faculty. It's not an IR anymore; it's a publishing service. The IR is the back end, and nobody cares about that. It's all about publication and citation.

CDL is also archiving state government sites, so you can see what the world looked like around pivotal events. It crawls state government sites twice a year, some more frequently, and also makes its tools available to others.

Make sure your site is indexed by Google, etc. Your site must make its content available. Syndicate your content in places where it can be found.

Libraries need to create conversations, be the locus of ideas and discussion.

Question about the future of federated search: is it basically dead?
Answer -- yes, its time has basically come and gone, if we can build some of these new services like Summon, etc.

IL2009: When Students Go Mobile

When Students go Mobile: The Effects of Smartphones on Information Literacy and Academic Library Services
Kristen Yarmey-Tylutki

Students with smartphones think of them as leisure devices, not academic devices; only 11% in a recent survey saw them as academic tools. But this is likely to grow as libraries build more smartphone-ready content and services.

The 2000 ACRL Information Literacy Competency Standards: are these standards still relevant? There are five standards; where are we on each one? How do smartphones affect information literacy?

Standard One -- determines the nature and extent of the information needed
Should we provide access to high-value applications just as we provide access to high-value databases? For example, Netter's Neuroscience is $40 for the iPhone; should we be licensing it for our populations?

When others create applications using public information, who validates them? For example, there are two apps for the World Factbook 2009, and neither was produced by the government. What are the risks?

Standard Two -- access information effectively and efficiently
Extracting information from sources and managing that information is harder to do on mobile devices. But the iPhone is beginning to handle this pretty well through applications such as Margins.

Standard Three -- reads text and extracts main ideas
Mobile devices allow students to read on the go, between other activities. This enables on-the-fly reading, but what about deep reading? Will students be able to process what they read? Should smartphone makers add a 'quiet setting' so that the user can't be interrupted?

Students may see information that renders well on a mobile device as more accurate than information that renders poorly. What you can use is better than what's hard to use (i.e., the Google Scholar effect).

Librarians should help ensure that course management systems are easily accessible via mobile devices.

Standard Four -- students can use information to solve real-world problems
Students need to be able to share information. Apps like Dropbox exist, but aren't well known.

Standard Five -- students understand the rules around information use
Libraries need to help educate students on the potential impact of posting information on the web. We could help teach how to be safe online -- or help others on campus do this.

Plagiarism is still a tricky subject. But students are now in such constant contact, all the time, about their academic pursuits that collaboration is a way of life, not something one does in a course context. Figuring out what's your own, and what's a group effort, is harder.

Summary -- standards have held up pretty well in the face of mobile computing.

The speaker is interested in doing research on literacy and mobile devices.

Kristine Ferry, Lisa Serbert & Holly Tomren, UC-Irvine


Institutions that are building mobile apps often don't include the library in them.

If we're going to collect applications for mobile devices, we need to think about a few things. We need to know more about their behavior. As vendors provide mobile-ready content, will libraries be charged more? So far, no, and they shouldn't be: it's the same users and the same content, just a different destination. Will mobile devices need special activation or authentication?

MIT has a proxy application for the iPhone that proxies specific resources.

Add the mobile version to the catalog record as another 856 field. It's simply another location for the resource, for mobile users.
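As a sketch of what that extra 856 might look like, the Python function below emits the field as MARCBreaker-style mnemonic text. The URL is a placeholder and the function is hypothetical; the indicator choices follow the MARC 21 definitions for 856 (first indicator 4 = HTTP, second indicator 1 = version of resource), with $u holding the URI and $z a public note.

```python
def mobile_856(url, note="Mobile version"):
    """Return a MARC 856 field for a mobile-friendly link, as mnemonic text.

    Indicators: 4 = access method HTTP; 1 = version of resource.
    Subfields: $u = URI, $z = public note.
    """
    # '$' delimits subfields in mnemonic text, so refuse values that contain it.
    if "$" in url or "$" in note:
        raise ValueError("'$' must be escaped before use in mnemonic MARC")
    return f"=856  41$u{url}$z{note}"


# Hypothetical placeholder URL for a mobile rendering of a catalog record.
print(mobile_856("http://m.example.edu/record/12345"))
# =856  41$uhttp://m.example.edu/record/12345$zMobile version
```

Using second indicator 1 rather than 0 signals that the link points to another version of the same resource, which matches the "same content, different destination" framing above.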

How do we support information for mobile devices? A multitude of devices and formats. Who gets this task? Library? IT? Vendor?

Why not use mobile devices for library cards? You can check into a flight with one.

I really like this last idea -- if American Airlines and Delta can send you a barcode to display on your mobile device, can't the library do the same thing?

Internet Librarian Thoughts

I attended my first Internet Librarian conference this past week in beautiful Monterey, California. While my blog posts were infrequent, I soaked up a lot of good information from the presenters.

Wednesday morning's session with a panel of three 'born digital' students was fascinating. Why I found it easier to grok this generation's approach to technology from hearing it from their own mouths rather than from reading Pew Internet reports or the work of danah boyd, I'm not sure. I was pleased to hear the panelists explain how they understood the differences between 'any old online resource' [my phrasing] and the 'good stuff' [again, my words] libraries provide. I was entertained and a bit amazed at what I understood to be their attitude about technological innovation: that the speed of evolution in how we communicate and interact with the world around us is normal and unending. It makes me feel older than my years to hear eloquent and thoughtful high school students hold forth on the normalcy of technologies that I find, frankly, amazingly innovative and cool. I was struck by one participant's statement that "twitter is dead," which the youth librarian who helped convene the panel amended: "twitter was never alive." The fact that this quote was repeated many times over the following minutes and hours on Twitter struck me as entertainingly ironic.

Mobile devices and mobile computing were a focus of the conference as well. It seems that there is tremendous energy in libraries toward making services and functions available on handheld devices. In not too many years, mobile devices will be the de facto standard of internet access, the one everyone has -- not a computer. (This will be especially true in the developing world, which will bypass landline networking much as large swaths of the world have bypassed landline telephony.) An important point to remember is that the world does not use the iPhone, as much of a phenomenon as it has been in some places. Even if the world is using mobile devices, people may not be using full graphic interfaces on them. Does this presage the reappearance of more gopher-like interfaces, ones that are much simpler to navigate on small-screen devices?

The Web Presence & Experience track was filled with excellent examples and advice on web design. Presentations covered a range of topics. Refreshingly, it seemed that everyone assumed usability testing would take place: it is simply part of the process, not a special extra tool that only some people use. Innovation centered on services and enabling functionality, and much less on user testing and validating designs. This is an excellent step.

Another interesting thing that I noticed is that almost nobody mentioned "RSS" as a tool or technology that needed to be explained. As I noted in my post about the lack of RSS support in Google Chrome last month, RSS is becoming invisible plumbing, something that just happens and is assumed. It appears less an active tool, and more a passive way to exchange information. Twitter, on the other hand, is alive and well. At least, among the Internet Librarians in attendance. As I publish this, there have been about 500 tweets with the #il2009 hashtag, from (in my estimate) 1,000 conference attendees.

And finally, a sign of the times.... In the lobby of the Monterey Marriott was the following sign, which made me wonder: Is there any other kind?

Internet Librarians


This page lists (oldest to newest) items posted to the Conferences category.


RSS4Lib is written by Ken Varnum.


This weblog is licensed under a Creative Commons Attribution Non-Commercial License.