Occasionally we make charts that use standard deviations away from an average value to fill bars or geographies. This is not a hard task, but it is tedious. This function takes a data frame, then gets a midpoint value, either by calculating the mean or by filtering for an observation already in the data frame (such as a statewide value). It then calculates z-scores based on this midpoint and the standard deviation, and cuts the z-scores into intervals based on brks. Pay close attention to the argument by, which allows you to do these calculations grouped by some column; this is useful if you have a data frame of several different indicators. Alternatively, passing a grouped data frame will also do the calculations by group.
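As a rough illustration of the steps described above, here is a minimal base R sketch of the ungrouped case with a calculated mean. It is not the function's actual implementation: the toy values and variable names are hypothetical, and it assumes the standard deviation is taken with stats::sd with missing values dropped.

```r
# Hypothetical toy values, including a missing observation
values <- c(75.6, NA, 77.9, 82.5, 82.3)

# Midpoint: here the mean; a filtered observation could be used instead
midpt <- mean(values, na.rm = TRUE)

# Assumption: standard deviation computed with stats::sd, dropping NAs
sdev <- sd(values, na.rm = TRUE)

# z-score: distance from the midpoint in standard deviations
z <- (values - midpt) / sdev

# Cut z-scores at the default break points, padded with -Inf and Inf
brks <- c(-2, -1/2, 1/2, 2)
brk <- cut(z, breaks = c(-Inf, brks, Inf))

data.frame(value = values, midpt = midpt, sd = sdev, z = z, brk = brk)
```

Missing values stay missing in both z and brk, which matches the NA rows in the example output further down.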
Usage
stdev_brks(
  x,
  value = value,
  filters = NULL,
  by = NULL,
  brks = c(-2, -1/2, 1/2, 2),
  labels = NULL,
  na.rm = TRUE,
  keep_calcs = TRUE,
  ...
)
Arguments
- x
A data frame or tibble
- value
Bare name of the numeric value column. Default: value
- filters
An optional named list of values to use for filtering. If given, the observation matching these values will be used as the midpoint. If NULL (the default), the midpoint will be calculated as the mean of values, grouped by x's grouping columns (if any) and the argument to by (also if any).
- by
Optional character vector. If given, this will be used as the group within which intervals are calculated. Default: NULL
- brks
Numeric vector of break points for cutting z-scores. This vector, plus -Inf and Inf, will be passed to base::cut's breaks argument. Default: c(-2, -1/2, 1/2, 2)
- labels
Character vector of labels for the resulting factor. If NULL, levels will be in base::cut's interval notation. The length of this vector should be one more than the length of brks. Default: NULL
- na.rm
Boolean passed on to mean if midpoints are being calculated. Default: TRUE
- keep_calcs
Boolean, whether to keep columns from calculations. Default: TRUE
- ...
Additional arguments passed to base::cut
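To show how brks and labels fit together, the sketch below calls base::cut directly on some made-up z-scores. The z values are hypothetical, and the length check is only an illustration of the "one more label than breaks" rule, not something the function is known to enforce.

```r
# Hypothetical z-scores spanning all five intervals
z <- c(-2.5, -1.19, 0, 0.97, 3)

brks <- c(-2, -1/2, 1/2, 2)
labels <- c("Lower", "Somewhat lower", "Average", "Somewhat higher", "Higher")

# Four break points padded with -Inf and Inf define five intervals,
# so five labels are needed
stopifnot(length(labels) == length(brks) + 1)

# With labels = NULL, cut would instead return interval notation,
# e.g. "(-2,-0.5]"
brk <- cut(z, breaks = c(-Inf, brks, Inf), labels = labels)
as.character(brk)
```

Because base::cut uses right-closed intervals by default, a z-score exactly equal to a break point falls into the lower interval; arguments such as right can be passed through ... to change this.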
Value
A data frame or tibble with the same number of rows as x. If keep_calcs is true, the returned data frame will have numeric columns added for midpoint (midpt), standard deviation (sd), and z-score (z), and a factor column for the resulting intervals (brk). If false, the only column added will be the intervals.
Examples
# Calculate intervals along the full dataset, based on calculated mean
stdev_brks(life_exp,
  labels = c("Lower", "Somewhat lower", "Average", "Somewhat higher", "Higher"))
#> # A tibble: 189 × 7
#> tract town value midpt sd z brk
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <fct>
#> 1 09009142500 New Haven 75.6 79.3 3.11 -1.19 Somewhat lower
#> 2 09009361402 New Haven NA 79.3 3.11 NA NA
#> 3 09009140900 New Haven 77.9 79.3 3.11 -0.447 Average
#> 4 09009142000 New Haven 82.5 79.3 3.11 1.03 Somewhat higher
#> 5 09009141800 New Haven 82.3 79.3 3.11 0.970 Somewhat higher
#> 6 09009140300 New Haven 76.9 79.3 3.11 -0.769 Somewhat lower
#> 7 09009142700 New Haven 80.4 79.3 3.11 0.358 Average
#> 8 09009140100 New Haven 80.3 79.3 3.11 0.326 Average
#> 9 09009142100 New Haven NA 79.3 3.11 NA NA
#> 10 09009141200 New Haven 77 79.3 3.11 -0.736 Somewhat lower
#> # ℹ 179 more rows
# Calculate intervals for each of the three indicators in the `question` column.
# Both examples have the same result:
fin_insecurity |>
  stdev_brks(filters = list(category = "Connecticut"), by = "question")
#> # A tibble: 69 × 8
#> question category group value midpt sd z brk
#> <chr> <fct> <fct> <dbl> <dbl> <dbl> <dbl> <fct>
#> 1 food_insecurity Connecticut Connecticut 0.13 0.13 0.0889 0 (-0.…
#> 2 food_insecurity Greater New Haven Greater Ne… 0.13 0.13 0.0889 0 (-0.…
#> 3 food_insecurity Gender Men 0.12 0.13 0.0889 -0.112 (-0.…
#> 4 food_insecurity Gender Women 0.14 0.13 0.0889 0.112 (-0.…
#> 5 food_insecurity Age Ages 18-34 0.18 0.13 0.0889 0.562 (0.5…
#> 6 food_insecurity Age Ages 35-49 0.2 0.13 0.0889 0.787 (0.5…
#> 7 food_insecurity Age Ages 50-64 0.1 0.13 0.0889 -0.337 (-0.…
#> 8 food_insecurity Age Ages 65+ 0.05 0.13 0.0889 -0.899 (-2,…
#> 9 food_insecurity Race/Ethnicity White 0.09 0.13 0.0889 -0.450 (-0.…
#> 10 food_insecurity Race/Ethnicity Black 0.2 0.13 0.0889 0.787 (0.5…
#> # ℹ 59 more rows
fin_insecurity |>
  dplyr::group_by(question) |>
  stdev_brks(filters = list(category = "Connecticut"))
#> # A tibble: 69 × 8
#> # Groups: question [3]
#> question category group value midpt sd z brk
#> <chr> <fct> <fct> <dbl> <dbl> <dbl> <dbl> <fct>
#> 1 food_insecurity Connecticut Connecticut 0.13 0.13 0.0889 0 (-0.…
#> 2 food_insecurity Greater New Haven Greater Ne… 0.13 0.13 0.0889 0 (-0.…
#> 3 food_insecurity Gender Men 0.12 0.13 0.0889 -0.112 (-0.…
#> 4 food_insecurity Gender Women 0.14 0.13 0.0889 0.112 (-0.…
#> 5 food_insecurity Age Ages 18-34 0.18 0.13 0.0889 0.562 (0.5…
#> 6 food_insecurity Age Ages 35-49 0.2 0.13 0.0889 0.787 (0.5…
#> 7 food_insecurity Age Ages 50-64 0.1 0.13 0.0889 -0.337 (-0.…
#> 8 food_insecurity Age Ages 65+ 0.05 0.13 0.0889 -0.899 (-2,…
#> 9 food_insecurity Race/Ethnicity White 0.09 0.13 0.0889 -0.450 (-0.…
#> 10 food_insecurity Race/Ethnicity Black 0.2 0.13 0.0889 0.787 (0.5…
#> # ℹ 59 more rows