Occasionally we make charts that use standard deviations away from an average value to fill bars or geographies, which is not a hard task but is tedious. This function takes a data frame and gets a midpoint value, either by calculating the mean or by filtering for an observation already in the data frame (such as a statewide value). It then calculates z-scores based on this midpoint and the standard deviation, and cuts the z-scores based on `brks`. Pay close attention to the argument `by`, which allows you to do these calculations grouped by some column; this is useful if you have a data frame of several different indicators. Alternatively, passing a grouped data frame will also do the calculations by group.
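The steps described above can be sketched in base R. This is only an illustration of the idea, not the function's actual source: the `toy` data frame is made up, and it assumes the ordinary sample standard deviation from `stats::sd` is used.

```r
# Hypothetical toy data, not part of the package
toy <- data.frame(
  name  = c("A", "B", "C", "D", "E"),
  value = c(30, 38, 42, 45, 60)
)

brks <- c(-2, -1/2, 1/2, 2)

# 1. Midpoint: here, the mean of all values
midpt <- mean(toy$value, na.rm = TRUE)

# 2. Standard deviation (assumption: plain sample sd)
sdev <- stats::sd(toy$value, na.rm = TRUE)

# 3. z-scores relative to the midpoint
toy$z <- (toy$value - midpt) / sdev

# 4. Cut z-scores at the break points, padded with -Inf and Inf
toy$brk <- cut(toy$z, breaks = c(-Inf, brks, Inf))

toy
```

With five break intervals, each row falls into exactly one level of the `brk` factor.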
Usage
stdev_brks(
  x,
  value = value,
  filters = NULL,
  by = NULL,
  brks = c(-2, -1/2, 1/2, 2),
  labels = NULL,
  na.rm = TRUE,
  keep_calcs = TRUE,
  ...
)
Arguments
- x
A data frame or tibble
- value
Bare name of the numeric value column. Default: value
- filters
An optional named list of values to use for filtering. Each item in the list should be a vector of length 1. If given, the observation matching these values will be used as the midpoint. If NULL (the default), the midpoint will be calculated as the mean of values, grouped by `x`'s grouping columns (if any) and the columns given in `by` (also if any).
- by
Optional character vector. If given, this will be used as the group within which intervals are calculated. Default: NULL
- brks
Numeric vector of break points for cutting z-scores. This vector, plus `-Inf` and `Inf`, will be passed to `base::cut`'s `breaks` argument. Default: c(-2, -1/2, 1/2, 2)
- labels
Character vector of labels for the resulting factor. If NULL, levels will be in `base::cut`'s interval notation. The length of this vector should be one more than the length of `brks`. Default: NULL
- na.rm
Boolean passed on to `mean` if midpoints are being calculated. Default: TRUE
- keep_calcs
Boolean, whether to keep columns from calculations. Default: TRUE
- ...
Additional arguments passed to `base::cut`
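To make the relationship between `brks`, `labels`, and `...` concrete, here is a small base R sketch of the kind of `cut` call the arguments describe. The z-scores are illustrative values, and `ordered_result` is simply one existing `base::cut` argument that could be forwarded through `...`; whether you want it depends on your chart.

```r
brks   <- c(-2, -1/2, 1/2, 2)
labels <- c("Lower", "Somewhat lower", "Average", "Somewhat higher", "Higher")

# Four break points define five intervals, so five labels are needed
stopifnot(length(labels) == length(brks) + 1)

# Illustrative z-scores, one per interval
z <- c(-2.5, -0.515, 0.0103, 1.88, 2.20)

# Break points are padded with -Inf and Inf before cutting
brk <- cut(z, breaks = c(-Inf, brks, Inf), labels = labels)

# Extra arguments to cut, such as ordered_result, change the factor type
brk_ordered <- cut(z, breaks = c(-Inf, brks, Inf), labels = labels,
                   ordered_result = TRUE)

brk
```

By default `cut` uses right-closed intervals, so a z-score of exactly 2 lands in the fourth interval rather than the fifth.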
Value
A data frame or tibble with the same number of rows as `x`. If `keep_calcs` is true, the returned data frame will have numeric columns added for midpoint (`midpt`), standard deviation (`sd`), and z-score (`z`), and a factor column for the resulting intervals (`brk`). If false, the only column added will be the intervals.
Examples
# Calculate intervals along the full dataset, based on calculated mean
stdev_brks(median_age,
  labels = c("Lower", "Somewhat lower", "Average", "Somewhat higher", "Higher")
)
#> # A tibble: 141 × 9
#> level county name sex value midpt sd z brk
#> <fct> <chr> <chr> <fct> <dbl> <dbl> <dbl> <dbl> <fct>
#> 1 state NA Connecticut total 41.2 42.4 5.14 -0.242 Aver…
#> 2 state NA Connecticut male 39.8 42.4 5.14 -0.515 Some…
#> 3 state NA Connecticut fema… 42.5 42.4 5.14 0.0103 Aver…
#> 4 county NA Capitol COG total 40.2 42.4 5.14 -0.437 Aver…
#> 5 county NA Capitol COG male 38.7 42.4 5.14 -0.728 Some…
#> 6 county NA Capitol COG fema… 41.7 42.4 5.14 -0.145 Aver…
#> 7 county NA Greater Bridgeport … total 40 42.4 5.14 -0.476 Aver…
#> 8 county NA Greater Bridgeport … male 39 42.4 5.14 -0.670 Some…
#> 9 county NA Greater Bridgeport … fema… 41.2 42.4 5.14 -0.242 Aver…
#> 10 town Capitol COG Andover total 52.1 42.4 5.14 1.88 Some…
#> # ℹ 131 more rows
# That might be a little biased because there are also observations for the state & counties.
# Calculate intervals within each of the three groups in the `sex` column,
# using the statewide value as the midpoint.
# Both examples give the same values:
median_age |>
  stdev_brks(filters = list(level = "state"), by = "sex") # or list(name = "Connecticut")
#> # A tibble: 141 × 9
#> level county name sex value midpt sd z brk
#> <fct> <chr> <chr> <fct> <dbl> <dbl> <dbl> <dbl> <fct>
#> 1 state NA Connecticut total 41.2 41.2 4.96 0 (-0.…
#> 2 state NA Connecticut male 39.8 39.8 4.97 0 (-0.…
#> 3 state NA Connecticut fema… 42.5 42.5 5.44 0 (-0.…
#> 4 county NA Capitol COG total 40.2 41.2 4.96 -0.202 (-0.…
#> 5 county NA Capitol COG male 38.7 39.8 4.97 -0.221 (-0.…
#> 6 county NA Capitol COG fema… 41.7 42.5 5.44 -0.147 (-0.…
#> 7 county NA Greater Bridgeport C… total 40 41.2 4.96 -0.242 (-0.…
#> 8 county NA Greater Bridgeport C… male 39 39.8 4.97 -0.161 (-0.…
#> 9 county NA Greater Bridgeport C… fema… 41.2 42.5 5.44 -0.239 (-0.…
#> 10 town Capitol COG Andover total 52.1 41.2 4.96 2.20 (2, …
#> # ℹ 131 more rows
median_age |>
  dplyr::group_by(sex) |>
  stdev_brks(filters = list(level = "state"))
#> # A tibble: 141 × 9
#> # Groups: sex [3]
#> level county name sex value midpt sd z brk
#> <fct> <chr> <chr> <fct> <dbl> <dbl> <dbl> <dbl> <fct>
#> 1 state NA Connecticut total 41.2 41.2 4.96 0 (-0.…
#> 2 state NA Connecticut male 39.8 39.8 4.97 0 (-0.…
#> 3 state NA Connecticut fema… 42.5 42.5 5.44 0 (-0.…
#> 4 county NA Capitol COG total 40.2 41.2 4.96 -0.202 (-0.…
#> 5 county NA Capitol COG male 38.7 39.8 4.97 -0.221 (-0.…
#> 6 county NA Capitol COG fema… 41.7 42.5 5.44 -0.147 (-0.…
#> 7 county NA Greater Bridgeport C… total 40 41.2 4.96 -0.242 (-0.…
#> 8 county NA Greater Bridgeport C… male 39 39.8 4.97 -0.161 (-0.…
#> 9 county NA Greater Bridgeport C… fema… 41.2 42.5 5.44 -0.239 (-0.…
#> 10 town Capitol COG Andover total 52.1 41.2 4.96 2.20 (2, …
#> # ℹ 131 more rows
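For intuition about what a filtered midpoint within groups means, the following base R sketch reproduces the idea on made-up data. The `toy` data frame is hypothetical, and computing the standard deviation over all rows in each group is an assumption, not something confirmed from the function's source.

```r
# Hypothetical toy data with one "state" row per sex group
toy <- data.frame(
  level = c("state", "state", "town", "town", "town", "town"),
  name  = c("State", "State", "T1", "T1", "T2", "T2"),
  sex   = c("male", "female", "male", "female", "male", "female"),
  value = c(40, 42, 38, 45, 44, 41)
)
brks <- c(-2, -1/2, 1/2, 2)

# Midpoint per group: the value of the row matching the filter
mid <- toy[toy$level == "state", c("sex", "value")]
names(mid)[names(mid) == "value"] <- "midpt"

# Attach each group's midpoint to every row in that group
out <- merge(toy, mid, by = "sex")

# Standard deviation within each group (assumption: all rows in the group)
out$sd <- ave(out$value, out$sex, FUN = stats::sd)

# z-scores are measured from the filtered midpoint, not the group mean
out$z   <- (out$value - out$midpt) / out$sd
out$brk <- cut(out$z, breaks = c(-Inf, brks, Inf))

out
```

As in the output above, the rows used as the midpoint have a z-score of 0 by construction.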