Machine Learning: Applications and Opportunities in Social Science Research

Instructor: Christopher Hare, UC Davis

The field of machine learning is most commonly associated with “big data”: how we can use massive datasets to make better predictions about things like credit card fraud, Netflix recommendations, and the like. Though machine learning has been most influential in its commercial and medical applications, a growing number of social scientists are taking advantage of these methods for data of all types to: (1) uncover patterns and structure embedded between variables, (2) test and improve model specification and predictions, and (3) perform data reduction. This course covers the mechanics underlying machine learning methods and discusses how these techniques can be leveraged by social scientists to gain new insight from their data. Specifically, the course will cover: decision trees, random forests, boosting, k-means clustering and nearest neighbors, support vector machines, kernels, neural networks, and ensemble learning. We will also discuss best practices concerning tuning, error estimation, and model interpretability.

Software: The course will use R to demonstrate the theoretical properties and empirical applications of these methods, and so participants should have some basic familiarity with R or similar statistical computing environments (such as Stata, SAS, or Python). An advanced programming background is not required or assumed.

Prerequisites: Participants should also have some prior exposure to linear regression models.

UC Berkeley Faculty, Students and Staff are eligible for ICPSR Member pricing.

These workshops will all be held in person at Social Science Matrix, 8th floor, Social Sciences Building, UC Berkeley campus. You may also attend virtually.

To register and for further information, go to https://www.icpsr.umich.edu/web/pages/sumprog/courses.html and choose the “Short Workshops” tab, or contact Eva Seto, Associate Director, Matrix, by e-mail at evaseto@berkeley.edu.
