The ACS data is county-level, so we built models at the county-level. The outcome of interest is pooled protests to analyze a count of the number of protests at the county-level. Because we are using count data, we opted to run Poisson regression with a log link function. We are not using an offset term in this case, as we are modeling count data.
Upon running the Poisson regression model with a log link function, we found protest occurrence to increase with increasing income class. Using the “very low” income class as the referent, we found increased protest count, with the greatest occurring among the “very high” income class (RR: 2.13; 95% CI: 2.02, 2.24).
glm(n ~ income_class, data = protest_df, family = poisson(link = "log")) %>%
broom::tidy() %>%
mutate(
RR = exp(estimate),
CI_lower = exp(estimate - 1.96 * std.error),
CI_upper = exp(estimate + 1.96 * std.error)
) %>%
dplyr::select(term, RR, starts_with("CI")) %>%
mutate(
term = str_replace(term, "income_class", "Income Class: ")) %>%
kable(digits = 3) %>%
kable_styling(bootstrap_options = c("striped", "hover"))
term | RR | CI_lower | CI_upper |
---|---|---|---|
(Intercept) | 10.537 | 10.113 | 10.979 |
Income Class: High | 1.751 | 1.655 | 1.854 |
Income Class: Low | 1.410 | 1.324 | 1.502 |
Income Class: Middle | 1.288 | 1.209 | 1.372 |
Income Class: Very High | 2.126 | 2.019 | 2.239 |
Though crude analyses demonstrated that odds of greater protest counts increased with income class, we hypothesized this was a function of population size in urban cities and income class in larger cities. Therefore, we opted to explore whether county population size was impacting this relationship.
Based on the results below, and using the 10% change rule in the beta estimate as a requirement for confounding along with a priori hypothesis, we used county population size as a confounder.
glm(n ~ county_pop, data = protest_df, family = poisson(link = "log")) %>%
broom::tidy() %>%
mutate(
term = str_replace(term, "county_pop", "County Population")
) %>%
kable(digits = 3) %>%
kable_styling(bootstrap_options = c("striped", "hover"))
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | 2.572 | 0.01 | 253.738 | 0 |
County Population | 0.000 | 0.00 | 106.838 | 0 |
After adjustment for county population size, the “very high” income class remained positively associated with protest count (aRR: 1.71; 95% CI: 1.62, 1.80) in comparison to the “very low” income class. The “low” (aRR: 1.30; 95% CI: 1.22, 1.38) and “middle” (aRR: 1.11; 95% CI: 1.05, 1.19) income classes were both positively associated with increased protest count; however, the “high” income class was no longer significantly associated with increased protest count.
Analyses demonstrate that counties classified as having a “very high” income are those counties with the greatest number of protests, in comparison to counties with a “very low” income.
glm(n ~ income_class + county_pop, data = protest_df, family = poisson(link = "log")) %>%
broom::tidy() %>%
mutate(
RR = exp(estimate),
CI_lower = exp(estimate - 1.96 * std.error),
CI_upper = exp(estimate + 1.96 * std.error)
) %>%
dplyr::select(term, RR, starts_with("CI")) %>%
mutate(
term = str_replace(term, "income_class", "Income Class: "),
term = str_replace(term, "county_pop", "County Population")) %>%
kable(digits = 3) %>%
kable_styling(bootstrap_options = c("striped", "hover"))
term | RR | CI_lower | CI_upper |
---|---|---|---|
(Intercept) | 10.432 | 10.004 | 10.879 |
Income Class: High | 1.030 | 0.966 | 1.099 |
Income Class: Low | 1.299 | 1.218 | 1.384 |
Income Class: Middle | 1.114 | 1.045 | 1.188 |
Income Class: Very High | 1.708 | 1.621 | 1.800 |
County Population | 1.000 | 1.000 | 1.000 |
We opted to investigate model fit through analyzing overdispersion. We found overdispersion to be a concern, given the below results suggesting that alpha = 15.05 with a p-value < 0.05. Therefore, the model built did not have appropriate fit and we concluded different model types should be explored.
glm(n ~ income_class + county_pop, data = protest_df, family = poisson(link = "log")) %>%
dispersiontest(trafo = 1) %>%
broom::tidy() %>%
dplyr::select(estimate, p.value) %>%
kable(digits = 3) %>%
kable_styling(bootstrap_options = c("striped", "hover"))
estimate | p.value |
---|---|
15.053 | 0 |
Given the results from the overdispersion test, we opted to fit a negative binomial model, which allows for overdispersion. We first built the crude model, where we found a similar association as observed in the Poisson model: the “very high” income class counties reported the greatest increase of protests (RR: 2.13; 95% CI: 1.72, 2.63).
glm.nb(n ~ income_class, data = protest_df) %>%
broom::tidy() %>%
mutate(
RR = exp(estimate),
CI_lower = exp(estimate - 1.96 * std.error),
CI_upper = exp(estimate + 1.96 * std.error)
) %>%
dplyr::select(term, RR, starts_with("CI")) %>%
mutate(
term = str_replace(term, "income_class", "Income Class: "),
term = str_replace(term, "county_pop", "County Population")) %>%
kable(digits = 3) %>%
kable_styling(bootstrap_options = c("striped", "hover"))
term | RR | CI_lower | CI_upper |
---|---|---|---|
(Intercept) | 10.537 | 9.124 | 12.169 |
Income Class: High | 1.751 | 1.391 | 2.205 |
Income Class: Low | 1.410 | 1.105 | 1.799 |
Income Class: Middle | 1.288 | 1.015 | 1.635 |
Income Class: Very High | 2.126 | 1.718 | 2.632 |
After adjustment for county population size, the “high” (aRR: 1.02; 95% CI: 0.84, 1.23), “middle” (aRR: 0.97; 95% CI: 0.80, 1.18), and “very high” (aRR: 1.18; 95% CI: 0.99, 1.40) counties were not significantly different from the “very low” counties in terms of increased protests. The “low” income class was significantly more likely to have protests in comparison to the “very low” income class (aRR: 1.27; 95% CI: 1.04, 1.54).
glm.nb(n ~ income_class + county_pop, data = protest_df) %>%
broom::tidy() %>%
mutate(
RR = exp(estimate),
CI_lower = exp(estimate - 1.96 * std.error),
CI_upper = exp(estimate + 1.96 * std.error)
) %>%
dplyr::select(term, RR, starts_with("CI")) %>%
mutate(
term = str_replace(term, "income_class", "Income Class: "),
term = str_replace(term, "county_pop", "County Population")) %>%
kable(digits = 3) %>%
kable_styling(bootstrap_options = c("striped", "hover"))
term | RR | CI_lower | CI_upper |
---|---|---|---|
(Intercept) | 6.755 | 5.989 | 7.619 |
Income Class: High | 1.015 | 0.841 | 1.226 |
Income Class: Low | 1.265 | 1.040 | 1.537 |
Income Class: Middle | 0.969 | 0.799 | 1.175 |
Income Class: Very High | 1.175 | 0.989 | 1.396 |
County Population | 1.000 | 1.000 | 1.000 |