Prediction and Explanation
On computational social science’s epistemological perspectives.
A recurring tension is how different stakeholders view the value of data. Some are interested in models that give us insight into future events, while others are interested in models that help us understand the underlying mechanisms of a process. In what ways do prediction and explanation differ? In what cases might we wish to use each approach?
- Wallach (2018): Perspective on the differences between machine learning and computational social science and why it matters. [link]
- “Observing Behavior” (2018): Discusses characteristics of big data and three “research strategies” for working with it: observations, forecasting, and quasi-experiments. [link]
- Hofman et al. (2021): Perspective on the differences between explanation and prediction and possible ways to integrate the two approaches in computational social science. [link]
Simulations and Agent-based Models (ABMs)
How can we use computer simulations to study social phenomena from the “bottom up”?
We discuss the role of simulations and when they may be useful in the development and explanation of theories or in forecasting.
- Conte and Paolucci (2014): Proposes an interdisciplinary approach that combines ABM and CSS for advancing the computational study of social phenomena. [link]
- Smirnov, Oprea, and Strohmaier (2023): A paper which integrates computational analysis with an agent-based model to demonstrate the potential impacts of its findings. [link]
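As a concrete illustration of bottom-up simulation, below is a minimal sketch of a Schelling-style segregation model, one of the classic agent-based models in the social sciences. It is not a model from the readings above, and the grid size, density, and tolerance threshold are illustrative choices.

```python
import random

# Minimal sketch of a Schelling-style segregation model: agents of two
# types relocate whenever too few of their neighbors share their type.
# SIZE, DENSITY, and TOLERANCE are illustrative, not values from any reading.
SIZE = 20          # grid is SIZE x SIZE
DENSITY = 0.8      # fraction of cells occupied
TOLERANCE = 0.3    # minimum fraction of like neighbors an agent accepts

def make_grid():
    cells = [None] * (SIZE * SIZE)
    for i in range(len(cells)):
        if random.random() < DENSITY:
            cells[i] = random.choice(("A", "B"))
    return cells

def neighbors(idx):
    r, c = divmod(idx, SIZE)
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == dc == 0:
                continue
            nr, nc = r + dr, c + dc
            if 0 <= nr < SIZE and 0 <= nc < SIZE:
                yield nr * SIZE + nc

def unhappy(grid, idx):
    """An agent is unhappy if too few of its occupied neighbors match its type."""
    same = other = 0
    for n in neighbors(idx):
        if grid[n] is None:
            continue
        if grid[n] == grid[idx]:
            same += 1
        else:
            other += 1
    total = same + other
    return total > 0 and same / total < TOLERANCE

def step(grid):
    """Move every unhappy agent to a random empty cell; return how many moved."""
    movers = [i for i, v in enumerate(grid) if v is not None and unhappy(grid, i)]
    empties = [i for i, v in enumerate(grid) if v is None]
    random.shuffle(movers)
    moved = 0
    for i in movers:
        if not empties:
            break
        j = empties.pop(random.randrange(len(empties)))
        grid[j], grid[i] = grid[i], None
        empties.append(i)
        moved += 1
    return moved

grid = make_grid()
for t in range(50):
    if step(grid) == 0:   # stop once no agent wants to move
        print(f"converged after {t} steps")
        break
```

Even with a mild tolerance threshold like the one above, runs of this model typically end in visibly segregated clusters, which is the macro-level pattern Schelling’s micro-level rule is usually invoked to explain.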
Ethics and Best Practices
What are the pitfalls and potential ethical issues in computational social science research?
We discuss practical challenges for computational social science such as reidentification, effects on privacy, and why more data alone does not solve study design problems.
- “Ethics” (2018): Practical examples and advice for ethical approaches to computational social science research. [link]
- Charlotte Jee (2019): A short article about how easy it is to deanonymize data. [link]
- Zook et al. (2017): Suggestions to practitioners for how to approach and think about ethical issues when working with big data. [link]
- Lazer et al. (2014): A classic example of “big data hubris”, where researchers may be tempted to ignore foundational issues just because they have access to a lot of data. [link]
Text as Data
Methods for working with text data.
A lot of social data is encoded within unstructured text. This module is more practical than theoretical and focuses on strategies to extract data from text using natural language processing and modern, vector-based approaches.
- DiMaggio (2015): Discusses the differences in how social scientists and computer scientists approach textual data and offers suggestions that may help bridge the gap. [link]
- Jensen et al. (2012): Applies textual analysis to the Congressional Record and compares it to Google Books to determine the relationship of polarization with “elite discourse”. [link]
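To make the “vector-based approaches” concrete, here is a minimal sketch that represents a few toy documents as TF-IDF vectors and compares them with cosine similarity. It assumes scikit-learn is installed; the example sentences are invented for illustration.

```python
# Minimal "text as data" sketch: documents become TF-IDF vectors, and
# similarity between documents is measured as the cosine of the angle
# between those vectors. Assumes scikit-learn is available.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "The senator praised the new education bill.",
    "The new education bill was praised in the senate.",
    "Stock markets fell sharply after the announcement.",
]

vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)   # documents x terms sparse matrix

# Pairwise similarities: the two education sentences should score highest.
sims = cosine_similarity(X)
print(sims.round(2))
```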
Experiments and Causal Inference
How can we answer cause-and-effect questions using computational social science?
Experiments allow the researcher to manipulate independent variables and observe the effect on dependent variables. However, experiments are not always possible; causal inference provides a framework for answering cause-and-effect questions with observational data.
- “Running Experiments” (2018): Salganik discusses ways that digital media can facilitate experiments and how causal experimental designs can be used with existing data. [link]
- Grimmer (2015): Grimmer argues that approaches from both computer science and the social sciences are needed to use big data toward solving large problems. In particular, this paper emphasizes that description, while underappreciated, is still an important part of the scientific process. [link]
- Chandrasekharan et al. (2017): Applies matching to a dataset of Reddit activity to evaluate the effectiveness of Reddit’s quarantine policy. [link]
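Since the Chandrasekharan et al. reading relies on matching, below is a minimal sketch of one common variant, propensity-score matching, run on synthetic data: estimate each unit’s probability of treatment from an observed covariate, pair treated units with similar controls, and compare outcomes within pairs. The data-generating process, variable names, and the greedy 1:1 matching rule are illustrative assumptions, not the design of that paper.

```python
# Minimal matching sketch on synthetic data. Assumes numpy and scikit-learn.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000
activity = rng.normal(size=n)                            # observed confounder
treated = rng.random(n) < 1 / (1 + np.exp(-activity))    # active users more likely treated
outcome = 0.5 * activity - 1.0 * treated + rng.normal(size=n)  # true effect = -1.0

# 1. Propensity scores: probability of treatment given observed covariates.
X = activity.reshape(-1, 1)
ps = LogisticRegression().fit(X, treated).predict_proba(X)[:, 1]

# 2. Greedy 1:1 nearest-neighbor matching on the propensity score.
treated_idx = np.where(treated)[0]
control_idx = np.where(~treated)[0]
used = set()
effects = []
for i in treated_idx:
    if len(used) == len(control_idx):      # no unmatched controls left
        break
    j = min((j for j in control_idx if j not in used),
            key=lambda j: abs(ps[i] - ps[j]))
    used.add(j)
    effects.append(outcome[i] - outcome[j])

print("matched estimate of the treatment effect:", np.mean(effects))
print("naive difference in means:", outcome[treated].mean() - outcome[~treated].mean())
```

Because treatment here is correlated with the confounder, the naive difference in means is biased, while the within-pair comparison recovers an estimate close to the true effect of -1.0.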
Network Analysis
Much social data is produced in the context of networks of relationships. This section introduces the basic concepts of network analysis, and provides a few examples of how it is used in the social sciences.
- Dodds, Muhamad, and Watts (2003): Uses email to replicate the “small-world” experiment of Milgram (1967). [link]
- Barberá et al. (2015): Uses Twitter data to study the influence of “peripheral” participants on the spread of social movements, finding that despite being less active, such users can be just as important as “core” users. [link]
- Bail et al. (2018): Do people tend to become less polarized when exposed to opposing views? This study paid active Twitter users to follow a bot which reposted opposing viewpoints. It finds that exposure to opposing views does not reduce polarization, and in fact can increase it. [link]
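As a minimal, hands-on example of the basic concepts in this module, the sketch below builds a Watts-Strogatz small-world graph (in the spirit of the small-world studies above) and computes a few standard descriptives. It assumes the networkx library is installed, and the graph parameters are arbitrary.

```python
# Minimal network analysis sketch using networkx (assumed installed).
import networkx as nx

# A small-world graph: a ring lattice with a small fraction of rewired edges.
G = nx.connected_watts_strogatz_graph(n=500, k=6, p=0.05, seed=42)

print("nodes:", G.number_of_nodes(), "edges:", G.number_of_edges())
print("average shortest path length:", round(nx.average_shortest_path_length(G), 2))
print("average clustering coefficient:", round(nx.average_clustering(G), 2))

# The most "central" nodes by degree centrality.
centrality = nx.degree_centrality(G)
top = sorted(centrality, key=centrality.get, reverse=True)[:5]
print("highest-degree nodes:", top)
```

The combination of short average path lengths with relatively high clustering is the signature “small-world” pattern that the Milgram and Dodds et al. studies probe empirically.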
Crowds and Communities
A lot of social data is not produced in isolation, but rather in the context of communities with their own norms and practices. We discuss how to think about communities and crowds, and how to study them.
- “Creating Mass Collaboration” (2018): Categorizes three types of collaborative processes: human computation, open call, and distributed data collection. [link]
- Shaw and Hill (2014): A study of participation inequalities in online peer-production communities that applies a political theory from 1911. [link]
- Muchnik, Aral, and Taylor (2013): An audit-style experiment on a news aggregation website, finding an asymmetric effect: users correct negative score manipulations but do not do the same for positive ones. [link]
Wrapping Up
We synthesize the main themes of the course and discuss the future of computational communication research.
- van Atteveldt and Peng (2018): An overview of the ways that computational techniques are changing communication research, with an emphasis on many of the themes and challenges discussed throughout this course. [link]
- Olteanu et al. (2019): An overview of many of the biases that can be introduced by social data. [link]