Wikipedia: Distributed Editorial Processes
Who is Wikipedia? It’s the thousands of people behind the site: many groups joined by shared values (free content, openness to all) and key editorial policies (Neutral Point of View, no original research, verifiability).
How do tens of thousands of people with no top-down control write the world’s largest encyclopedia?
Wikipedia is governed by a non-profit foundation, which has several sister projects; we’re only talking about Wikipedia here. No one is in charge of editorial decisions. Wikipedia has a modest goal: giving every person full access to the sum of all human knowledge.
There are lots of self-organized tools: for cleaning up articles, for defining NPOV, for style. Information in Wikipedia emerges from the sum of distributed social processes, the technical structure of the wiki, and a culture of openness.
Technology, Theory, Community, and Quality: A Talk in Two Acts
Matching people with tasks they’re likely to do motivates contributions
The problem: some articles need help in some way and are tagged accordingly. These articles are listed on a community page, but if you want to fix something, it is hard to find a page you would actually want to fix. The team built a recommender engine so that people are given pages to edit based on what they are likely to be interested in fixing. This approach had worked well in MovieLens and was translated to Wikipedia. They wrote SuggestBot: it goes through the list of articles tagged as needing help and finds items similar to ones the person has edited, written, etc., before.
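The matching step described above can be sketched roughly as follows. This is a minimal illustration, not SuggestBot's actual algorithm: the talk only says it finds tagged articles "similar" to ones a person has edited, so the keyword sets, the Jaccard similarity measure, and the names `jaccard` and `suggest` are all assumptions made for the example.

```python
from __future__ import annotations


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two keyword sets: 0.0 (disjoint) to 1.0 (identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def suggest(edited: dict[str, set[str]],
            needs_help: dict[str, set[str]],
            top_n: int = 3) -> list[str]:
    """Rank tagged articles by their best similarity to anything the user edited."""
    scores: dict[str, float] = {}
    for title, keywords in needs_help.items():
        if title in edited:
            continue  # skip articles the user has already worked on
        scores[title] = max(
            (jaccard(keywords, seen) for seen in edited.values()),
            default=0.0,
        )
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return [t for t in ranked if scores[t] > 0][:top_n]


# Hypothetical data: article title -> keywords (assumed, not from the talk).
edited = {
    "Jazz": {"music", "genre", "improvisation"},
    "Miles Davis": {"music", "jazz", "trumpet", "biography"},
}
needs_help = {
    "Bebop": {"music", "jazz", "genre"},
    "Trumpet": {"music", "instrument", "trumpet"},
    "Plate tectonics": {"geology", "earth"},
}

print(suggest(edited, needs_help))  # ['Bebop', 'Trumpet']
```

Articles with no overlap at all are dropped rather than padded into the list, on the assumption that an irrelevant suggestion is worse than a shorter list.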
Wikipedia’s edit history shows whether someone went on to edit an article. Four times as many articles get edited when they are suggested by the recommendation engine, so it works. Other communities could take this approach to editing and moderating, or match a new user in a community to an older member who talks about similar things.
Theoretical basis for this: the collective effort model says lower effort = greater reward. Therefore we should build interfaces and algorithms that help people find work to do.
Understanding community is huge for improving information quality
Knowing the system (Wikipedia content in this case), knowing the users, and knowing their habits all help inform the recommender engine. A failure: an automated welcome to the community for new users (people whose first edit was in December 2005, about 28,000 people). The study looked for people who had “welcome” on their home page. People with “welcome” messages edited more entries. However, Wikipedia culture was that only good members got welcomes (bad members got warnings), so the comparison is confounded. There still seems to be an effect (people with a welcome message went on to be a bit more active in Wikipedia), but it is not strong.
Information Quality Work Organization in Wikipedia
Why do work organization models matter? To design effective, sound, robust models for different contexts/domains inexpensively through knowledge reuse. To establish benchmarks for analyzing and evaluating existing models.
Questions studied: How does the community understand quality? What processes exist? What are the motivations of editors? What are the dynamics of information objects? Why do people contribute? What information quality (IQ) intervention strategies are used?
The percentage of Wikipedia pages devoted to articles has decreased from 53% to 28% since 2005: more effort is going into talk, discussion, and similar pages, and less into the articles themselves. Users are putting more emphasis on community building.
IQ processes: content evaluation, editor evaluation, building and maintaining work infrastructure.
Differences between Wikipedia and other systems: first, user feedback and information creation are the same process in Wikipedia. In a library catalog, for example, quality control and authorship of data are separate. End user and editor roles are merged. Product creation and delivery environments are the same. Work coordination is informal and ad hoc.
Wikipedia controls quality through content and editor evaluation. Some parts of the process are formal, others are informal. Because there’s little built-in mediation, disagreeing parties must come to their own agreement (or else endlessly erase each other’s contributions). The community experiments with different intervention processes when there are conflicts, trying to find the best approach at any moment.
Wikipedia Reference Desk: Processes and Outcomes
A study to evaluate the quality of processes and outcomes at the Wikipedia reference desk, which uses a wiki to process reference transactions. Users leave questions; Wikipedia volunteers help them find the information they need. The desk is organized under seven categories: computing, entertainment, humanities, language, mathematics, miscellaneous, science.
There has not been much work on the social aspects of the Wikipedia community. In particular, there is an opportunity to learn from the Wikipedia reference desk as a way of improving service at traditional reference desks.
What is the quality of answers at the reference desk? The study looked at 210 transactions and 434 messages from April 2007. In that month, there were 2000+ transactions and 11,000+ messages. Most were in the science and miscellaneous categories. Mathematics had the most responses per question; entertainment and miscellaneous had the fewest.
170 users (122 expert, 48 novice); 34 participated in multiple reference desks. Experts are more active at the reference desk. Novices submit more questions (44 vs. 33). Novices are more likely to ask questions (70% of novices, 29% of experts); experts answer more questions. By profession, computer/IT professionals are the plurality.
Most questions (96%) got an answer; 92% got a complete or partial answer. Average time to first response is 4 hours, and to last response 72 hours. The accuracy level is about 55%; response completeness is 63%.
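For readers who want to compute similar figures for another help desk, the answer rate and response times could be derived from timestamped transactions along these lines. This is only a sketch under assumed inputs: the `Transaction` record, the `summarize` function, and the sample timestamps are hypothetical and are not the study's data or method.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime
from statistics import mean


@dataclass
class Transaction:
    asked: datetime            # when the question was posted
    responses: list[datetime]  # timestamps of replies; empty if unanswered


def hours_between(start: datetime, end: datetime) -> float:
    return (end - start).total_seconds() / 3600


def summarize(transactions: list[Transaction]) -> dict[str, float]:
    """Answer rate plus mean hours to first and last response (answered only)."""
    answered = [t for t in transactions if t.responses]
    if not answered:
        raise ValueError("no answered transactions to summarize")
    return {
        "answer_rate": len(answered) / len(transactions),
        "hours_to_first": mean(hours_between(t.asked, min(t.responses)) for t in answered),
        "hours_to_last": mean(hours_between(t.asked, max(t.responses)) for t in answered),
    }


# Hypothetical sample: four questions, one of them never answered.
sample = [
    Transaction(datetime(2007, 4, 1, 9), [datetime(2007, 4, 1, 11), datetime(2007, 4, 2, 9)]),
    Transaction(datetime(2007, 4, 1, 12), [datetime(2007, 4, 1, 15)]),
    Transaction(datetime(2007, 4, 2, 8), [datetime(2007, 4, 2, 12), datetime(2007, 4, 4, 8)]),
    Transaction(datetime(2007, 4, 3, 10), []),
]

stats = summarize(sample)
print(stats)
```

Accuracy and completeness cannot be computed this way; as the Q&A notes, they came from a reader judging each answer.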
There is question negotiation: 28% of the time there’s a follow-up post from the requester. There are elaborations (improved answers) in 67% of the transactions: additional resources, a different point of view, different solutions, etc.
Wikipedia reference desk quality is “not too bad; can be improved probably”. Collaborative effort yields interesting results. Future study will try to compare with small groups of librarians who use a collaborative process.
Q: How did you determine accuracy of response?
A (Shachaf): Involved qualitative analysis of answers (reading them); results presented are preliminary, one-reader reviews. Final research will involve multiple reviewers.
Q: What are your views on copyright of materials in Wikipedia? Is there any analysis of plagiarized content in Wikipedia?
A (Ayers): There should not be anything in Wikipedia that’s under copyright. In practice, this is hard to deal with.
A (Cosley): There’s a tag in Wikipedia for identifying possible copyright violations.