Matrix News

Crowdsourcing Social Research

The capability to democratically distribute tasks within a directed project presents novel possibilities for researchers—and social scientists at UC Berkeley are taking advantage.

When the first modern research universities arose in nineteenth-century Germany, they received international acclaim for their innovative method of instruction, the seminar. The departments of philology, often at the cutting edge of humanities research at the time, pioneered the use of these focused gatherings of elite scholars and students. Their model persists as a hallmark of humanistic and social science research to the present day.

The Perseids Project, an initiative of the Perseus Digital Library affiliated with Tufts University, inverts the centuries-old seminar model. Harnessing collective knowledge and interest, Perseids and Perseus develop translations of classical texts and social networks of mythological characters with the help of “citizen scholars.” Volunteers from across the web contribute philological insight and suggested translations to produce interactive digital editions. Through this and an increasing number of similar projects, the philological seminar meets the novel method of scholarly “crowdsourcing.”

Representatives from the Perseids Project joined scholars from UC Berkeley and other institutions at the “Crowdsourcing and the Academy” symposium, which UC Berkeley’s Humanities and Social Science Association (HSSA) hosted in November. This conference provided a space for scholars working across disciplines to present their work and discuss the incorporation of crowdsourcing methods in social science and humanities research.

Since Jeff Howe first coined the term in “The Rise of Crowdsourcing,” a 2006 article in Wired magazine, the concept of crowdsourcing has become primarily synonymous with using digital platforms to enable many people to contribute to a single task. For example, Amazon’s Mechanical Turk is a service that lets users break down complex projects into basic tasks to be completed by anonymous freelancers for payment. Another commercial crowdsourcing site is Threadless, which manufactures t-shirts that are designed by volunteers and vetted by an online community.

The creative use of crowdsourcing for scientific research long predates the Internet, of course. Great Britain’s Longitude Act of 1714, which provided substantial prize money to any enterprising individual who could successfully devise a method of reliably determining longitude, represents perhaps the most famous example of pre-digital scientific crowdsourcing. Thus the digital crowdsourcing platform InnoCentive, through which firms distribute rewards for solutions to technological queries, enjoys an established historical precedent.

Presenters at the “Crowdsourcing and the Academy” conference provided a glimpse into the promise of this method for science and scholarship. For example, Dr. Nick Adams, a sociologist and research fellow at the Berkeley Institute for Data Science (BIDS), described how the crowdsourcing of “content analysis” tasks is central to Deciding Force, an initiative dedicated to analyzing and categorizing more than 8,000 news articles on police and protester interactions associated with the Occupy Movement. Traditionally, this phase of the research might have required an unwieldy amount of time to train research assistants to read, extract, and categorize relevant excerpts from the corpus. Adams is currently developing a method to reliably streamline the process by incorporating crowdworkers.

By chunking bits of text into meaningful units and paying crowdworkers to categorize them within an established schema (with a variety of quality-control measurements built into the process), Adams is confident that he can reliably analyze a massive corpus of texts in comparatively little time. He is in the process of developing a platform called “Text Thresher” that will provide crowdworkers with a user-friendly interface to analyze texts, and will allow his team to test the crowdworkers’ output against the team’s own standards. He plans to release Text Thresher in 2016 as an open-source tool available to any interested researcher.
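As a rough illustration of what one such quality-control measure could look like, the Python sketch below combines several crowdworkers' labels for each text chunk by majority vote and sets aside chunks with low agreement for expert review. It is not a description of how Text Thresher works: the function name, the category labels, the chunk identifiers, and the 60 percent agreement threshold are all assumptions made for this example.

```python
from collections import Counter


def aggregate_labels(labels_by_chunk, min_agreement=0.6):
    """Majority-vote each chunk's crowd labels.

    Chunks whose most common label falls below `min_agreement`
    (a hypothetical threshold) are flagged for expert review.
    """
    accepted = {}
    needs_review = []
    for chunk_id, labels in labels_by_chunk.items():
        if not labels:
            # No crowd labels at all: nothing to vote on.
            needs_review.append(chunk_id)
            continue
        top_label, top_count = Counter(labels).most_common(1)[0]
        if top_count / len(labels) >= min_agreement:
            accepted[chunk_id] = top_label
        else:
            needs_review.append(chunk_id)
    return accepted, needs_review


# Hypothetical crowd responses for two text chunks.
votes = {
    "chunk-1": ["police_action", "police_action", "protester_action"],
    "chunk-2": ["police_action", "protester_action", "other"],
}
accepted, needs_review = aggregate_labels(votes)
print(accepted)
print(needs_review)
```

In this toy data, two of three workers agree on the first chunk, so its label is accepted, while the second chunk draws three different labels and is routed to a trained reviewer. A real pipeline would likely add further checks, such as scoring workers against passages that experts have already coded.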

While crowds can provide dispersed and efficient labor for social scientists, they can also be tapped to test hypotheses. For example, Dr. Marti Hearst, a professor at the UC Berkeley School of Information, together with Professors Bjoern Hartmann and Armando Fox, uses Amazon’s Mechanical Turk to study the potential pedagogical benefits of structured problem-solving in small groups, particularly for students in online courses. Through this project, Hearst and her team are engaging crowdworkers to answer reading-comprehension questions and discuss their answers with other “workers” online. The researchers are providing incentives for these workers to help one another, to determine whether doing so helps them give more correct answers. Her conclusions illustrate how providing outlets and incentives for crowdworkers to collaborate can help them achieve a greater degree of accuracy.

The results of this project will not only inform Hearst’s work but will also contribute to ongoing debate about the dependability of crowdworkers and whether this method meets the rigorous standards of scientific research. Indeed, while both Adams and Hearst outlined comprehensive methods for dismissing responses from uninterested and unprepared crowdworkers, questions remain about the extent to which crowdworkers could, and should, replace trained student research assistants.

Researchers who advocate for crowdsourcing are still wrestling with potential ethical dilemmas about the compensation and acknowledgment of anonymous online workers. The scholarly community is still in the early stages of working toward practicable solutions to these methodological and ethical questions, but as more scholars join the discussion, they are developing best practices to guide future work, all in keeping with the spirit of crowdsourcing.
