User Supplied Image Category Labels
Study’s goals were to identify underlying structure of image tags. Analyzed 105 participants’ labeling of 100 images. Images tagged and organized into groups. Identify a prototype image in each group. Identify significant feature of prototype image.
Example of hierarchy: furniture (superordinate), chair (basic level), kitchen chair (subordinate). The basic level has more distinctive properties than superordinate, but isn’t too specific.
Out of the 899 category labels applied, ~58% were superordinate, ~38% were basic level, and ~4% were subordinate. Interesting — it was thought that basic level would be most common.
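A rough sketch of what those shares mean in absolute terms. The percentages above are approximate, so the counts below are estimates, not figures reported by the study.

```python
# Approximate counts per category level, derived from the rounded
# percentages in the notes (estimates only, not reported figures).
TOTAL_LABELS = 899
shares = {"superordinate": 0.58, "basic": 0.38, "subordinate": 0.04}

counts = {level: round(TOTAL_LABELS * share) for level, share in shares.items()}

for level, count in counts.items():
    print(f"{level}: ~{count} labels")
```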
A group of people displaying emotional behavior was grouped as “emotions”; facial behavior was prototype. Categories can be built around prototypes; for any category there is likely to be a single prototype. Familiarity, culture, environment affect selection of prototype.
Superordinate terms and significant features of prototype image are important in indexing. Retrieval and browsing: grouping facilitates browsing.
Social tagging: group labels tend to be superordinate. Individual images in that group tend to be tagged with non-hierarchically related terms. Associations, not hierarchy. There is not much structure (does this matter? unclear). First tagger influences subsequent taggers. Perhaps first tag should be done by an expert, to subtly guide future taggers.
PhotojournalsmAndUADs geotagged:ASSISST2007MilwukeWi topresent
Yes, title is intentional.
Needs of photojournalists are different from other photographers in terms of tagging.
Photojournalists select what to photograph and to store their photos in their publication’s photo archives. Photo editors pick photos to go with stories. Also worked with photo librarians.
Where is the locus of control? Internal: it’s something you can control. External: the blame falls on something outside, beyond, you. We like to have control over our pictures (they’re something we save in a disaster, we like to have them).
Photojournalists and editors were studied:
People rated named objects, specific events, browsing, user-assigned descriptors (UAD), and metadata as the most important. Descriptors, in general, were the most important kinds of labels. They started with a keyword, then moved to browsing. They like metadata-based searching.
Problems with people doing tagging: inaccuracy, errors, typos, lack of time. Need to formalize rules for tagging (somehow). Tag guidelines (e.g., no plurals, no compound words).
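A minimal sketch of how the two example guidelines (no plurals, no compound words) might be checked automatically. The function name `check_tag` and both heuristics are assumptions made for illustration; the talk did not specify any rules beyond those examples, and the plural check is deliberately naive (it will flag words like "bus").

```python
import re

def check_tag(tag: str) -> list[str]:
    """Return guideline violations for one tag (empty list means it passes).

    Hypothetical helper: the rules are illustrative heuristics only.
    """
    cleaned = tag.strip().lower()
    if not cleaned:
        return ["empty tag"]
    problems = []
    # Compound check: whitespace, hyphen, or underscore suggests several words.
    if re.search(r"[\s_-]", cleaned):
        problems.append("compound word")
    # Naive plural check: trailing "s", unless the word ends in "ss".
    if cleaned.endswith("s") and not cleaned.endswith("ss"):
        problems.append("possible plural")
    return problems

for tag in ["chair", "chairs", "kitchen chair", "glass"]:
    print(tag, "->", check_tag(tag) or "ok")
```

A checker like this could warn at tagging time instead of rejecting the tag, which fits the lack-of-time problem noted above.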
In classic info retrieval, a document representation (surrogate for document) is matched with a user query (surrogate for information need). In new world… We have huge multimedia digital libraries; not single items, but collections. Many things are not text, they are multimedia. Retrieval systems are more complex, to match queries and document representations. Now we’re looking at slices of information space, not documents.
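The classic model can be sketched in a few lines, assuming each document surrogate is just a set of index terms and the query is a bag of words. The `rank` function, the scoring by term overlap, and the sample data are all made up for illustration; real systems use far richer matching.

```python
def rank(query: str, surrogates: dict[str, set[str]]) -> list[tuple[str, int]]:
    """Score each document surrogate by how many query terms it shares.

    Hypothetical helper: plain term overlap stands in for real matching.
    """
    query_terms = set(query.lower().split())
    scores = {doc_id: len(query_terms & terms) for doc_id, terms in surrogates.items()}
    hits = [(doc_id, score) for doc_id, score in scores.items() if score > 0]
    # Highest overlap first; ties broken by document id for stable output.
    return sorted(hits, key=lambda pair: (-pair[1], pair[0]))

# Illustrative surrogates (invented index terms, not real data).
surrogates = {
    "photo-1": {"kitchen", "chair"},
    "photo-2": {"furniture", "chair", "wood"},
    "photo-3": {"dog"},
}
results = rank("kitchen chair", surrogates)
print(results)
```

Note that this returns single documents in ranked order; it has no notion of the collections or slices of information space that the notes say retrieval now has to handle.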
User is creator, annotator, indexer, searcher, and consumer of content – all roles formerly done by authors and professional indexers. Users have their own language, not the controlled vocabulary. Rise of tags and folksonomies, not controlled vocabs.
Users’ roles change, often in mid-research. They have simultaneous multiple roles. We have to react to individuals and groups of users. Need a more complex information retrieval model. We have “a million typing monkeys”. We have to deal with users’ free and uncontrolled language and vocabulary.
The million typing monkeys are also an opportunity. Users are willing to contribute descriptions of content. Rich data to study tagging behavior (great for researchers). Need to find ways to let user tagging inform our retrieval systems.
Probably no single model will capture whole information environment. Browsing is important feature of IR. Revise Ranganathan’s second law: “Every user his/her overview of the document collection”. Still need way to get to single document.
Two tools to look at:
How do you provide access? People tag at a high level, with broad terms. Best entry level in a browsing interface should be the basic level, where people search. Depth of hierarchy is a problem. Hard to display breadth of terms in a functional way.
Social tagging is an opportunity, not a challenge.
Semantics of User-Supplied Tags
Wide gap between terms used by taggers and terms used by professional indexers. There is not a thesaurus to get from one to the other — at least, none now.
Generic terms are most frequently used terms. 75% of generic terms are in formal index (LC TGM). Studied occurrence of colors as tags in Flickr and in LC TGM.
What are relationships that are most useful for users?
Tags of specific location were frequently used in Flickr. TGM doesn’t include specific geographic locations. But related tags don’t follow regular patterns.