US Foreign Policy

Research Methods Primer

Michael Flynn

Professor

Department of Political Science

011C Calvin Hall

meflynn@ksu.edu

2024-09-17

Overview

What do we want to get out of today’s class?

  1. Learn how to read a research paper
  2. Understand some of the general methods we use in research
  3. Learn the basics of how to interpret statistical models
  4. Recognize some of the different types of bias and ways that data can fool us
  5. Learn the importance of theory

Reading Research

The Anatomy of a Research Paper

How you’ll want to read it:

  1. Abstract
  2. Introduction
  3. Theory
  4. Research Design
  5. Results
  6. Conclusion

How you should read it (as a rough guide):

  1. Abstract
  2. Introduction
  3. Conclusion
  4. Results
  5. Research Design
  6. Theory

Methods

Common Methods

We use lots of different methods in the social sciences, but let’s look at a few key approaches/tools:

  • Statistical modeling and regression analysis
  • Formal (mathematical) modeling
  • Experimental methods
  • Case studies
  • Field work
  • Surveys
  • Archival research

Interpretation

Interpreting and Assessing Statistical Models

First, what are our goals?

  • Prediction?
  • Description?
  • Estimating causal effects?

Interpreting and Assessing Statistical Models

Prediction

  • Overall model fit

  • In-sample fit?

  • Out-of-sample fit?

Description

  • Overall model fit

  • Individual coefficients

  • Associations between groups

Causal Inference

  • Focus on individual coefficients for causal effects

Interpreting and Assessing Statistical Models

What is a variable?

  • A variable is a characteristic, trait, attribute, or condition that can be measured or observed

  • A “definable quantity that can take on two or more values” (Kellstedt and Whitten 2018, 22).

  • Examples:

    • Age
    • Income
    • Education
    • Speed of a moving vehicle
    • Color a car is painted
    • Whether a person is a Democrat or Republican
    • If a member of Congress served in the military or not

Interpreting and Assessing Statistical Models

What is a coefficient?

\[\begin{equation} Y_i = \alpha + {\color{blue} \beta_1} X_i + {\color{blue}\beta_2} Z_i + \epsilon_i \end{equation}\]

  • The estimated association between \(X\) and \(Y\), conditional on the other predictors and the model

  • Typically denotes the magnitude of the change in \(Y\) that corresponds to a one-unit change in \(X\)

  • In some cases this is direct and easy, less so in other cases

  • Different models and measurement schemes mean we need to be cautious in interpreting coefficients
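To make the "one-unit change" reading concrete, here is a minimal Python sketch that fits a least-squares line with a single predictor. The variable names (`years_education`, `income_thousands`) and all of the numbers are made up for illustration; they do not come from any study.

```python
# Hypothetical data (an assumption for illustration only):
# X = years of education, Y = annual income in $1,000s.
years_education = [10, 12, 14, 16, 18]
income_thousands = [30, 38, 41, 52, 59]


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = alpha + beta * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-products and sum of squares, both around the means.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


alpha, beta_1 = fit_line(years_education, income_thousands)
print(f"alpha = {alpha:.2f}, beta_1 = {beta_1:.2f}")
```

In this toy data the slope is 3.6, so one additional year of education corresponds to roughly $3,600 more income. That is a statement about association in these made-up numbers; whether it can be read causally depends on the research design.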

Interpreting and Assessing Statistical Models

Example of a regression table from a research article

Sources of Bias

What’s “Bias”?

Bias: Systematic error of some sort that can affect our inferences and conclusions (Howard 2017, 151)

  • What it’s not:

    • Presenting “one side”
    • “Bias” in the colloquial cable news meaning of “Anything I don’t like” or “Unflattering coverage of people and things I do like.”

Confounding

What is confounding?

  • Confounding is when the predictor (i.e. X) and the outcome (i.e. Y) share a common ancestor (i.e. Z).

  • If we don’t take into account the relationship between Z and the other variables, our estimate of the effect of X on Y is biased.

  • The most common way to do this is to include Z in our model

\[\begin{align} Y_i & = \alpha + \beta_1 X_i + \epsilon_i \\ & \text{vs.} \\ Y_i & = \alpha + \beta_1 X_i + \beta_2 Z_i + \epsilon_i \end{align}\]

Confounding Example

Let’s imagine we’re interested in the relationship between gas pedal pressure and speed, so we collect data on

  • A few hundred drivers
  • Their speed in mph
  • How hard they press the pedal

Call:
lm(formula = speed ~ pedal_pressure, data = cardata)

Coefficients:
   (Intercept)  pedal_pressure  
    70.0441843      -0.0004776  

Confounding Example

Confounding can mask causal relationships

  • We need to understand the data generating process to help us make sense of results that may or may not themselves make sense

  • Adjusting for confounders in our model can reduce bias and help us better estimate causal relationships


Call:
lm(formula = speed ~ pedal_pressure + incline, data = cardata)

Coefficients:
   (Intercept)  pedal_pressure         incline  
        45.110           1.244          -1.492  
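The masking shown in the two models above can be reproduced with a small constructed dataset. The Python sketch below is not the simulation behind the slides; the data-generating rules and every number in it are assumptions, chosen so that incline pushes pedal pressure up and speed down by offsetting amounts.

```python
# Toy data built so that incline confounds the pedal-speed relationship.
# All numbers are illustrative assumptions, not the data used in the slides.
rows = []
for z in range(-5, 6):          # road incline: downhill (-) to uphill (+)
    for k in range(-5, 6):      # driver-to-driver variation in pedal pressure
        p = 20 + 0.6 * z + 0.6 * k     # drivers press harder going uphill
        s = 45 + 1.25 * p - 1.5 * z    # pedal raises speed, incline lowers it
        rows.append((p, z, s))

pedal = [r[0] for r in rows]
incline = [r[1] for r in rows]
speed = [r[2] for r in rows]


def centered_sum(a, b):
    """Sum of cross-products of deviations from the means of a and b."""
    a_bar, b_bar = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))


# Model 1: speed ~ pedal (incline omitted).
naive_pedal = centered_sum(pedal, speed) / centered_sum(pedal, pedal)

# Model 2: speed ~ pedal + incline (closed-form OLS with two predictors).
s_pp = centered_sum(pedal, pedal)
s_ii = centered_sum(incline, incline)
s_pi = centered_sum(pedal, incline)
s_ps = centered_sum(pedal, speed)
s_is = centered_sum(incline, speed)
det = s_pp * s_ii - s_pi ** 2
adj_pedal = (s_ii * s_ps - s_pi * s_is) / det
adj_incline = (s_pp * s_is - s_pi * s_ps) / det

print(f"pedal coefficient, incline omitted:  {naive_pedal:.3f}")
print(f"pedal coefficient, incline adjusted: {adj_pedal:.3f}")
print(f"incline coefficient:                 {adj_incline:.3f}")
```

With incline left out, the pedal coefficient is essentially zero even though pedal pressure raises speed by construction. Once incline enters the model, the pedal coefficient returns to the 1.25 that was built into the data.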

Base Rate Fallacy

What is it and why does it matter?

  • Counts can be deceptive without information about the population of interest
  • Example of vaccine status and hospitalization
  • Vaccines are effective at reducing risk of severe illness, hospitalization, etc.
  • And yet in some locations we see more hospital beds occupied by vaccinated people than unvaccinated people
  • Huh? What gives?

Base Rate Fallacy

The underlying population matters!

  • The rate of an event may vary across groups

  • Knowing something about the reference population is key

  • Another way to think of this is the symmetry/asymmetry of conditional probabilities

\[Pr(H \mid V) \neq Pr(V \mid H)\]
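A quick worked example shows how both statements can hold at once. The Python sketch below uses a hypothetical town; the population size, vaccination rate, and hospitalization risks are all assumptions picked to make the arithmetic easy, not real estimates.

```python
# Hypothetical town: every number below is an assumption for illustration.
population = 1_000_000
vaccinated = 900_000                       # 90% of residents are vaccinated
unvaccinated = population - vaccinated

# Assumed hospitalization risk per 1,000 people in each group.
risk_per_1000_vax = 2                      # Pr(H | V)     = 0.002
risk_per_1000_unvax = 10                   # Pr(H | not V) = 0.010

# Expected number of hospitalized people in each group.
hosp_vax = vaccinated * risk_per_1000_vax // 1000
hosp_unvax = unvaccinated * risk_per_1000_unvax // 1000

# The two conditional probabilities answer different questions.
pr_h_given_v = hosp_vax / vaccinated                # risk for a vaccinated person
pr_v_given_h = hosp_vax / (hosp_vax + hosp_unvax)   # share of patients vaccinated

print(f"Hospitalized, vaccinated:   {hosp_vax}")
print(f"Hospitalized, unvaccinated: {hosp_unvax}")
print(f"Pr(H | V) = {pr_h_given_v:.3f}")
print(f"Pr(V | H) = {pr_v_given_h:.3f}")
```

Under these assumptions vaccinated patients outnumber unvaccinated ones (1,800 to 1,000) even though the vaccinated group's risk is one fifth as large. The raw count is driven by the fact that there are nine times as many vaccinated people to begin with.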

Ecological Fallacy

What is it and why does it matter?

  • The level at which our data are aggregated matters
  • Drawing inferences across levels of aggregation can be tricky
  • For example, income and voting behavior in US elections
    • Wealthier states tend to break for Democratic candidates
    • Wealthier individuals tend to break for Republican candidates
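The state-level and individual-level patterns can point in opposite directions without any contradiction. The Python sketch below uses two hypothetical states; the voter counts are invented for illustration and are not election data.

```python
# Hypothetical voter counts; every number is an assumption for illustration.
# Each entry is (number of voters, number voting Democratic).
states = {
    "Wealthier state": {"high_income": (600, 330), "low_income": (400, 280)},
    "Poorer state": {"high_income": (300, 105), "low_income": (700, 350)},
}


def dem_share(voters, dem_votes):
    """Share of a group's voters who voted Democratic."""
    return dem_votes / voters


state_share = {}
for name, groups in states.items():
    # Aggregate level: pool all voters in the state.
    total_voters = sum(v for v, _ in groups.values())
    total_dem = sum(d for _, d in groups.values())
    state_share[name] = dem_share(total_voters, total_dem)
    print(f"{name}: overall Democratic share = {state_share[name]:.1%}")
    # Individual level: compare income groups within the state.
    for income, (v, d) in groups.items():
        print(f"  {income}: Democratic share = {dem_share(v, d):.1%}")
```

In these made-up numbers the wealthier state is more Democratic overall (61% versus 45.5%), yet within each state high-income voters are less Democratic than low-income voters. Inferring individual behavior from the state-level pattern would get the sign wrong.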

Simpson’s Paradox

What is it and why does it matter?

  • Simpson’s Paradox is a special case of confounding

  • A simple bivariate relationship between X and Y may appear to show one thing

  • But when we adjust for other relevant variables (i.e. Z) the apparent relationship between X and Y reverses

  • Common in lots of fields of study

Simpson’s Paradox

Example:

  • Imagine we have data on the exercise activity and cholesterol of a few hundred adults

  • We plot the individuals’ cholesterol levels against their level of exercise (we’ll assume we directly observe this and it’s not self-reported)

  • In the plot we see that exercise appears to correlate positively with cholesterol levels. A simple regression model supports this eyeball estimate.


Call:
lm(formula = y ~ x, data = data2)

Coefficients:
(Intercept)            x  
    182.372        2.078  

Simpson’s Paradox

Example:

  • But what if we’re omitting some key variables?

  • It’s possible for results to reverse if we’re omitting a variable that could also affect cholesterol levels

  • Another way to say this is that the effect of exercise might be biased because we’re leaving other important things out of the model.

  • One good variable would be patient age, so let’s see what that looks like


Call:
lm(formula = y ~ x + age, data = data2)

Coefficients:
(Intercept)            x     ageAge 2     ageAge 3     ageAge 4     ageAge 5  
   193.1930      -0.6774       4.2166       8.1891      11.8306      15.1400  
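The sign flip between the two models above can be reproduced with a small constructed dataset. This Python sketch is not the data behind the slides; the age groups, exercise levels, and coefficients are assumptions chosen so that older groups both exercise more and have higher cholesterol. Adjusting for age-group indicators is done here by subtracting group means, which gives the same exercise coefficient as including the indicators in the regression.

```python
# Toy data; all numbers are assumptions chosen to produce a reversal.
data = []
for age_group in range(5):                    # 0 = youngest, 4 = oldest
    for k in range(5):
        exercise = 2 * age_group + k          # older groups exercise more here
        cholesterol = 190 + 8 * age_group - 0.7 * exercise
        data.append((age_group, exercise, cholesterol))

groups = [d[0] for d in data]
x = [d[1] for d in data]
y = [d[2] for d in data]


def slope(a, b):
    """Least-squares slope from regressing b on a."""
    a_bar, b_bar = sum(a) / len(a), sum(b) / len(b)
    s_ab = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    s_aa = sum((ai - a_bar) ** 2 for ai in a)
    return s_ab / s_aa


def demean_by_group(values, group_ids):
    """Subtract each observation's group mean from its value."""
    totals, counts = {}, {}
    for v, g in zip(values, group_ids):
        totals[g] = totals.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - totals[g] / counts[g] for v, g in zip(values, group_ids)]


# Pooled model: cholesterol ~ exercise.
pooled_slope = slope(x, y)

# Age-adjusted model: compare people only with others in the same age group.
adjusted_slope = slope(demean_by_group(x, groups), demean_by_group(y, groups))

print(f"exercise coefficient, age omitted:  {pooled_slope:.3f}")
print(f"exercise coefficient, age adjusted: {adjusted_slope:.3f}")
```

Pooled across ages the exercise coefficient is positive (2.5), but within every age group more exercise goes with lower cholesterol (-0.7). The pooled estimate mostly reflects differences between age groups rather than the effect of exercise.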

Collider Bias

What is collider bias?

  • Collider bias can be introduced by adjusting for a common effect.

  • It can also be a form of selection bias.

  • Similar to Simpson’s Paradox, it can make effects appear to be the opposite of what they are, but it also depends on the population of interest.

  • In this example Z is a collider because both X and Y cause Z.

Collider Bias

Example: College Admissions

  • Let’s assume we’re interested in looking at students who have been admitted to college to see what the relationship is between their high school extracurricular activities and GPA

  • We collect data on college freshmen and compare their extracurricular activities in high school with their GPA.

  • Surprisingly, we find a negative relationship.

  • That’s sad.

Collider Bias

Example: College Admissions

  • But wait!

  • If we’re looking at students who have already been admitted to college we’re implicitly conditioning on a collider variable (i.e. acceptance/admission status)

  • If we step back and look at the entire population of high school students we find something different

  • Students who have better grades and/or better extracurricular activities are more likely to be admitted to college in the first place.
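A toy population makes the selection effect visible. In the Python sketch below, the 0 to 9 scoring scales and the admission rule (a combined score of at least 12) are assumptions made up for illustration. GPA and extracurricular scores are unrelated in the full population by construction, so any relationship among admitted students comes purely from conditioning on admission.

```python
# Toy population: every combination of a GPA score and an extracurricular
# score (each 0-9), so the two are unrelated by construction. These scales
# and the admission rule below are assumptions for illustration.
students = [(gpa, extra) for gpa in range(10) for extra in range(10)]

# Hypothetical admission rule: admit when the combined score is at least 12.
admitted = [(gpa, extra) for gpa, extra in students if gpa + extra >= 12]


def slope(pairs):
    """Least-squares slope from regressing the second element on the first."""
    n = len(pairs)
    x_bar = sum(p[0] for p in pairs) / n
    y_bar = sum(p[1] for p in pairs) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    s_xx = sum((x - x_bar) ** 2 for x, _ in pairs)
    return s_xy / s_xx


slope_all = slope(students)        # everyone, admitted or not
slope_admitted = slope(admitted)   # conditioning on the collider (admission)

print(f"students in population: {len(students)}, admitted: {len(admitted)}")
print(f"slope, all students:      {slope_all:.2f}")
print(f"slope, admitted students: {slope_admitted:.2f}")
```

Among all students the slope is zero, but among the admitted it is -0.5: an admitted student with a low GPA must have had strong extracurriculars to clear the bar, and vice versa. The negative relationship is an artifact of who ends up in the sample.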

Role of Theory

The Role of Theory

What is theory?

Let’s start with what it’s not

  • Not a hypothesis
  • Not a hunch
  • Not a guess
  • Not just an idea with little or no supporting evidence

The Role of Theory

What is theory?

  • More accurately, theory is the story we tell about which variables are important and how those variables are related.

  • It both describes and explains how variables affect one another.

  • It also allows us to make predictions about what we should expect to see in various scenarios

Theory: A tentative conjecture about the causes of some phenomenon of interest (Kellstedt and Whitten 2019, 22)

Theory: A logically consistent set of statements that explains a phenomenon of interest (Frieden, Lake, and Schultz 2022, xxix)