Poverty and underdevelopment often go hand in hand with a lack of data, as regions lacking infrastructure or mired in violent conflict are often unable to conduct surveys and compile statistics about their populations. In Afghanistan and the Democratic Republic of Congo, for example, a census has not been conducted in decades. And while in the U.S., fitness trackers and Google Maps track people’s most incremental movements, in Africa, many births go unrecorded, and online maps show only major cities and streets. As a result, vital information about developing nations (such as poverty, health, and unemployment levels) often can only be estimated. Even initiatives such as the UN’s Millennium Development Goals, which focus on child poverty, health, and education, have struggled to accurately measure whether targets have been met.
In recent years, however, researchers have begun using digital data to gather information about inflation, poverty, migration, disease, and other factors in developing nations, particularly as mobile phones and other technology have become widely adopted. In West Africa, for example, researchers drew on call detail records (or CDRs, which show a call’s time and duration, as well as the location of the cell tower that routed it) to estimate population movement during the Ebola outbreak. In another case, the Humanitarian OpenStreetMap Team (HOT) hosts “mapathons” to identify streets and buildings from satellite images of the Central African Republic and other countries, with the aim of improving development projects and disaster response.
Of course, as useful as these resources are, the aggregation of this information also carries risks to privacy, as well as the possibility of creating a new “digital divide” between those who can gather and study the new data and those who cannot. This divide is exactly what Emmanuel Letouzé, a PhD candidate in demography at UC Berkeley, is working to prevent. “I don’t believe in technocratic solutions,” Letouzé says. “It has to be deeply political, as much about the why as the how.”
In addition to writing a dissertation on big data and demography, Letouzé serves as director and cofounder of Data Pop Alliance, a think tank created by the Harvard Humanitarian Initiative, MIT’s Media Lab, and the Overseas Development Institute. Based in New York, Data Pop Alliance advises developing countries on working with large data sets, supports research projects, educates the public, and assists with conferences, such as a recent Big Data Bootcamp held by the United Nations Population Fund.
“It’s about training people, giving the ability and the willingness to understand this new world of data and how it’s used,” Letouzé says.
“Big data” refers in part to the traces of activity that humans leave behind on digital devices. This includes CDRs, credit card transactions, subway records, and other structured data, as well as unstructured data, such as blog posts, tweets, videos, and other social media, that is less quantifiable and harder to analyze. A broad array of other types of information, including data from satellites and electric meters, weather information, and digitized books, also falls under the “big data” umbrella. Since 2012, the world has produced 1.2 zettabytes (more than a trillion gigabytes) of digital data every year.
However, such mind-boggling numbers can be misleading, Letouzé says. “One thing I really try to advocate is that ‘Big Data’ is not just [large data sets],” he says, noting that the term is better understood as a “complex ecosystem,” comprising not only data, but also powerful new computers and the people and institutions who use them.
Letouzé—who holds degrees from Sciences Po and Columbia University—helped launch Data Pop in 2014 after becoming frustrated with his work as a technical consultant for other organizations, including the OECD and the United Nations’ Global Pulse project, which works on big data and development. “I decided that I wanted to build something of my own,” Letouzé says. He partnered with Patrick Vinck, the director of Harvard Humanitarian Initiative’s Program for Vulnerable Populations, and the prominent MIT data scientist Alex “Sandy” Pentland, who serves as Data Pop’s academic director.
Letouzé is currently leading a project to assist Colombia’s National Statistical Office (DANE) in using big data to measure poverty and crime in Bogotá. By combining CDRs, bus traffic data, and official crime reports, Letouzé and other researchers are building a predictive model of where crime is likeliest to occur in Colombia’s capital. They are examining whether buses’ location and frequency, and the number of passengers getting on and off, have an impact on crimes such as homicide and sexual assault. Based on the results, they will work with the government to create policy suggestions, such as altering bus schedules.
Another project affiliated with Data Pop uses the Google Earth Engine to estimate communities’ vulnerability to flooding. A separate joint effort with the Qatar Research Institute studies millions of tweets to gather information about poverty and inflation patterns in Egypt.
Letouzé notes that data-based social-science research faces a variety of issues, including selection bias (some people are more likely to own cellphones than others) as well as privacy concerns. In a recent paper on the ethics of using CDRs, Letouzé advocates some alternative approaches, such as using phone records together with other types of data, and setting an “expiration date” on individuals’ information. The Bogotá study seeks to correct sample bias by incorporating surveys and traditional statistics, uses aggregated data sets to obscure individuals’ identities, and looks for ways to provide local access to information.
Although mobile phones are driving the Internet’s expansion in low-income countries, offering new possibilities for gathering information in areas where official statistics are lacking, Letouzé stresses that public debate is essential to making sure data empowers people, rather than creating new disparities. He explains that Data Pop is focusing on emerging legal questions, such as “getting people the right to their data” by helping craft laws and guidelines about how and when it can be used. For example, while non-anonymized data might be deemed essential in crisis situations to help reunite families or locate remains, regulations could require that in other circumstances data be shared only in aggregated sets to protect privacy. Data Pop is also working to promote data literacy by leading training programs in data and development for journalists, official statisticians, and NGO staffers, and by supporting trainings in Rwanda and other countries.
Letouzé points out that the new world of data research is deeply interdisciplinary; most of his papers are joint ventures with scholars in different fields. “You can have someone with training in anthropology, econometrics, and computer science,” he says. “There are very few people who can do everything themselves with big data.”
In addition to his work as a demographer, Letouzé is an accomplished cartoonist who draws under the name “manu.” He often uses cartoons to explain data and development. “I’ve been drawing since I was a kid, but unlike most people I never stopped,” he says. One of his recent cartoons depicts a Mexico City study in which cell phone data was used to predict socioeconomic levels, with a squiggly character in a blue shirt going through the creation of a predictive model step by step. Another satirical drawing depicts big data as “the new oil”: a voice hovering beneath looming black oil barrels asks nervously, “This is a good thing, right?”
“There are no moments when I think in terms of research and social sciences, and then think, ‘Now I’m gonna relax and do some doodling,’” Letouzé says. “I’m always using the same parts [of my brain].”