A quick wrapper for a common, tedious task: collapsing several demographic groups, such as income brackets, into larger groups and taking a weighted mean of their values based on a set of survey weights.
collapse_n_wt(
  data,
  ...,
  .lvls,
  .group = group,
  .value = value,
  .weight = weight,
  .fill_wts = FALSE,
  .digits = NULL
)
data: A data frame, such as returned by xtab2df(), joined with survey weights as returned by read_weights(). The default column names here match those returned by xtab2df (group, value) and read_weights (weight).
...: Bare column names to use for grouping, including the .group column, such as location, year, category, response, etc. This is probably everything except values and weights.
.lvls: A named list, where values are character vectors of smaller groups (e.g. c("<$15K", "$15K-$30K")) and names are the groups those will be replaced by (e.g. "<$30K"). This will be split into the arguments to forcats::fct_collapse().
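To make the shape of the named list concrete, here is a minimal sketch of the collapse step on its own. It uses base R's levels<- with a list as a stand-in for forcats::fct_collapse(), so it is an illustration rather than the function's actual internals; the income vector is hypothetical. One difference to keep in mind: with this base R form, any level not listed becomes NA, whereas fct_collapse() leaves unlisted levels unchanged.

```r
# Hypothetical group labels, reusing the income brackets from this page
income <- factor(c("<$15K", "$15K-$30K", "$30K-$50K", "$50K-$75K",
                   "$75K-$100K", "$100K-$200K", "$200K+"))

# Names are the new, larger groups; values are the smaller groups they absorb
income_lvls <- list(
  "<$30K" = c("<$15K", "$15K-$30K"),
  "$30K-$100K" = c("$30K-$50K", "$50K-$75K", "$75K-$100K"),
  "$100K+" = c("$100K-$200K", "$200K+")
)

# Base R stand-in for the collapse: assigning a named list to levels()
# merges the old levels in each element under that element's name
collapsed <- income
levels(collapsed) <- income_lvls

# Side-by-side view of original and collapsed labels
data.frame(original = income, collapsed = collapsed)
```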
.group: Bare name of the column where groups are found. Default: group
.value: Bare name of the column where values are found. Default: value
.weight: Bare name of the column where group weights are found. Default: weight
.fill_wts: Logical: if TRUE, missing weights will be filled in with 1, i.e. unweighted. This defaults to FALSE, because missing weights are a useful way to find a mismatch between the group labels in the data and those in the weights table, which is very often the case. Therefore, only set this to TRUE if you've already accounted for labeling discrepancies.
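As an illustration of why missing weights are a useful signal, the sketch below joins a small table of values to a weights table whose labels differ slightly. Both tables and their labels are made up for the example, and the join uses base R's merge() rather than whatever join the real workflow uses. The unmatched label surfaces as an NA weight.

```r
# Hypothetical survey values
dat <- data.frame(
  group = c("<$15K", "$15K-$30K", "$30K-$50K"),
  value = c(0.70, 0.80, 0.85),
  stringsAsFactors = FALSE
)

# Hypothetical weights table; the second label is spelled differently
wts <- data.frame(
  group = c("<$15K", "$15K to $30K", "$30K-$50K"),
  weight = c(0.2, 0.3, 0.5),
  stringsAsFactors = FALSE
)

# Left join: every row of dat is kept, and unmatched labels get an NA weight
joined <- merge(dat, wts, by = "group", all.x = TRUE)

# Labels with NA weights are the ones that need reconciling
unmatched <- as.character(joined$group[is.na(joined$weight)])
unmatched
```

Filling those NA weights with 1 would hide the mismatch, which is the reason to fix the labels first and only then consider .fill_wts = TRUE.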
.digits: Numeric: if given, weighted means will be rounded to this number of digits. If NULL (the default), values are returned unrounded.
A data frame with summarized values. The .group column will have the collapsed groups, and the .value column will have the weighted mean values.
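The arithmetic behind the summarized values can be sketched in base R. This is an illustration of a weighted mean per collapsed group under made-up numbers, not the package's implementation; the data frame, the lookup vector, and the "$30K-$75K" label are all hypothetical.

```r
# Hypothetical values and weights for four small groups
dat <- data.frame(
  group = c("<$15K", "$15K-$30K", "$30K-$50K", "$50K-$75K"),
  value = c(0.70, 0.80, 0.82, 0.86),
  weight = c(0.1, 0.4, 0.2, 0.3),
  stringsAsFactors = FALSE
)

# Hypothetical mapping from each small group to its larger group
lookup <- c(
  "<$15K" = "<$30K",
  "$15K-$30K" = "<$30K",
  "$30K-$50K" = "$30K-$75K",
  "$50K-$75K" = "$30K-$75K"
)
dat$collapsed <- unname(lookup[dat$group])

# Weighted mean of value within each collapsed group
pieces <- split(dat, dat$collapsed)
result <- vapply(
  pieces,
  function(d) weighted.mean(d$value, d$weight),
  numeric(1)
)

# Rounding to 2 digits, analogous to passing .digits = 2
round(result, 2)
```

For the "<$30K" group the weighted mean is (0.70 * 0.1 + 0.80 * 0.4) / 0.5 = 0.78, compared with an unweighted mean of 0.75, because the larger subgroup counts for more.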
# collapse income groups, such that <$15K, $15K-$30K become <$30K, etc
income_lvls <- list(
  "<$30K" = c("<$15K", "$15K-$30K"),
  "$30K-$100K" = c("$30K-$50K", "$50K-$75K", "$75K-$100K"),
  "$100K+" = c("$100K-$200K", "$200K+")
)
cws_demo |>
  dplyr::filter(category %in% c("Greater New Haven", "Income")) |>
  collapse_n_wt(code:response, .lvls = income_lvls, .digits = 2)
#> # A tibble: 16 × 6
#>    code  question                                  category group response value
#>    <chr> <chr>                                     <fct>    <fct> <fct>    <dbl>
#>  1 Q1    Are you satisfied with the city or area … Greater… Grea… Yes      0.82
#>  2 Q1    Are you satisfied with the city or area … Greater… Grea… No       0.17
#>  3 Q1    Are you satisfied with the city or area … Greater… Grea… Don't k… 0.01
#>  4 Q1    Are you satisfied with the city or area … Greater… Grea… Refused  0
#>  5 Q1    Are you satisfied with the city or area … Income   <$30K Yes      0.77
#>  6 Q1    Are you satisfied with the city or area … Income   <$30K No       0.22
#>  7 Q1    Are you satisfied with the city or area … Income   <$30K Don't k… 0.01
#>  8 Q1    Are you satisfied with the city or area … Income   <$30K Refused  0
#>  9 Q1    Are you satisfied with the city or area … Income   $30K… Yes      0.83
#> 10 Q1    Are you satisfied with the city or area … Income   $30K… No       0.15
#> 11 Q1    Are you satisfied with the city or area … Income   $30K… Don't k… 0.01
#> 12 Q1    Are you satisfied with the city or area … Income   $30K… Refused  0
#> 13 Q1    Are you satisfied with the city or area … Income   $100… Yes      0.85
#> 14 Q1    Are you satisfied with the city or area … Income   $100… No       0.14
#> 15 Q1    Are you satisfied with the city or area … Income   $100… Don't k… 0.01
#> 16 Q1    Are you satisfied with the city or area … Income   $100… Refused  0