Regression Analyses of Protest Data

Outcome: Protest Count

The ACS data is county-level, so we built models at the county-level. The outcome of interest is pooled protests to analyze a count of the number of protests at the county-level. Because we are using count data, we opted to run Poisson regression with a log link function. We are not using an offset term in this case, as we are modeling count data.

Crude Model

Upon running the Poisson regression model with a log link function, we found protest occurrence to increase with increasing income class. Using the “very low” income class as the referent, we found increased protest count, with the greatest occurring among the “very high” income class (RR: 2.13; 95% CI: 2.02, 2.24).

glm(n ~ income_class, data = protest_df, family = poisson(link = "log")) %>% 
  broom::tidy() %>% 
  mutate(
    RR = exp(estimate),
    CI_lower = exp(estimate - 1.96 * std.error),
    CI_upper = exp(estimate + 1.96 * std.error)
  ) %>% 
  dplyr::select(term, RR, starts_with("CI")) %>% 
  mutate(
    term = str_replace(term, "income_class", "Income Class: ")) %>% 
  kable(digits = 3) %>% 
  kable_styling(bootstrap_options = c("striped", "hover"))

term	RR	CI_lower	CI_upper
(Intercept)	10.537	10.113	10.979
Income Class: High	1.751	1.655	1.854
Income Class: Low	1.410	1.324	1.502
Income Class: Middle	1.288	1.209	1.372
Income Class: Very High	2.126	2.019	2.239

Confounder Selection

Though crude analyses demonstrated that odds of greater protest counts increased with income class, we hypothesized this was a function of population size in urban cities and income class in larger cities. Therefore, we opted to explore whether county population size was impacting this relationship.

Based on the results below, and using the 10% change rule in the beta estimate as a requirement for confounding along with a priori hypothesis, we used county population size as a confounder.

glm(n ~ county_pop, data = protest_df, family = poisson(link = "log")) %>% 
  broom::tidy() %>% 
  mutate(
    term = str_replace(term, "county_pop", "County Population")
  ) %>% 
  kable(digits = 3) %>% 
  kable_styling(bootstrap_options = c("striped", "hover"))

term	estimate	std.error	statistic	p.value
(Intercept)	2.572	0.01	253.738	0
County Population	0.000	0.00	106.838	0

Adjusted Analysis

After adjustment for county population size, the “very high” income class remained positively associated with protest count (aRR: 1.71; 95% CI: 1.62, 1.80) in comparison to the “very low” income class. The “low” (aRR: 1.30; 95% CI: 1.22, 1.38) and “middle” (aRR: 1.11; 95% CI: 1.05, 1.19) income classes were both positively associated with increased protest count; however, the “high” income class was no longer significantly associated with increased protest count.

Analyses demonstrate that counties classified as having a “very high” income are those counties with the greatest number of protests, in comparison to counties with a “very low” income.

glm(n ~ income_class + county_pop, data = protest_df, family = poisson(link = "log")) %>% 
  broom::tidy() %>% 
  mutate(
    RR = exp(estimate),
    CI_lower = exp(estimate - 1.96 * std.error),
    CI_upper = exp(estimate + 1.96 * std.error)
  ) %>% 
  dplyr::select(term, RR, starts_with("CI")) %>% 
  mutate(
    term = str_replace(term, "income_class", "Income Class: "),
    term = str_replace(term, "county_pop", "County Population")) %>% 
  kable(digits = 3) %>% 
  kable_styling(bootstrap_options = c("striped", "hover"))

term	RR	CI_lower	CI_upper
(Intercept)	10.432	10.004	10.879
Income Class: High	1.030	0.966	1.099
Income Class: Low	1.299	1.218	1.384
Income Class: Middle	1.114	1.045	1.188
Income Class: Very High	1.708	1.621	1.800
County Population	1.000	1.000	1.000

Model Fit

We opted to investigate model fit through analyzing overdispersion. We found overdispersion to be a concern, given the below results suggesting that alpha = 15.05 with a p-value < 0.05. Therefore, the model built did not have appropriate fit and we concluded different model types should be explored.

glm(n ~ income_class + county_pop, data = protest_df, family = poisson(link = "log")) %>%
  dispersiontest(trafo = 1) %>% 
  broom::tidy() %>% 
  dplyr::select(estimate, p.value) %>% 
  kable(digits = 3) %>% 
  kable_styling(bootstrap_options = c("striped", "hover"))

estimate	p.value
15.053	0

Negative Binomial Model

Given the results from the overdispersion test, we opted to fit a negative binomial model, which allows for overdispersion. We first built the crude model, where we found a similar association as observed in the Poisson model: the “very high” income class counties reported the greatest increase of protests (RR: 2.13; 95% CI: 1.72, 2.63).

glm.nb(n ~ income_class, data = protest_df) %>% 
  broom::tidy() %>% 
  mutate(
    RR = exp(estimate),
    CI_lower = exp(estimate - 1.96 * std.error),
    CI_upper = exp(estimate + 1.96 * std.error)
  ) %>% 
  dplyr::select(term, RR, starts_with("CI")) %>% 
  mutate(
    term = str_replace(term, "income_class", "Income Class: "),
    term = str_replace(term, "county_pop", "County Population")) %>% 
  kable(digits = 3) %>% 
  kable_styling(bootstrap_options = c("striped", "hover"))

term	RR	CI_lower	CI_upper
(Intercept)	10.537	9.124	12.169
Income Class: High	1.751	1.391	2.205
Income Class: Low	1.410	1.105	1.799
Income Class: Middle	1.288	1.015	1.635
Income Class: Very High	2.126	1.718	2.632

After adjustment for county population size, the “high” (aRR: 1.02; 95% CI: 0.84, 1.23), “middle” (aRR: 0.97; 95% CI: 0.80, 1.18), and “very high” (aRR: 1.18; 95% CI: 0.99, 1.40) counties were not significantly different from the “very low” counties in terms of increased protests. The “low” income class was significantly more likely to have protests in comparison to the “very low” income class (aRR: 1.27; 95% CI: 1.04, 1.54).

glm.nb(n ~ income_class + county_pop, data = protest_df) %>% 
  broom::tidy() %>% 
  mutate(
    RR = exp(estimate),
    CI_lower = exp(estimate - 1.96 * std.error),
    CI_upper = exp(estimate + 1.96 * std.error)
  ) %>% 
  dplyr::select(term, RR, starts_with("CI")) %>% 
  mutate(
    term = str_replace(term, "income_class", "Income Class: "),
    term = str_replace(term, "county_pop", "County Population")) %>% 
  kable(digits = 3) %>% 
  kable_styling(bootstrap_options = c("striped", "hover"))

term	RR	CI_lower	CI_upper
(Intercept)	6.755	5.989	7.619
Income Class: High	1.015	0.841	1.226
Income Class: Low	1.265	1.040	1.537
Income Class: Middle	0.969	0.799	1.175
Income Class: Very High	1.175	0.989	1.396
County Population	1.000	1.000	1.000