Create labeled intervals based on standard deviations

Occasionally we make charts that fill bars or geographies by the number of standard deviations from an average value. That isn't a hard task, but it is tedious. This function takes a data frame and gets a midpoint value, either by calculating the mean or by filtering for an observation already in the data frame (such as a statewide value). It then calculates z-scores from this midpoint and the standard deviation, and cuts the z-scores into intervals at brks. Pay close attention to the argument by, which lets you do these calculations grouped by some column; this is useful if you have a data frame of several different indicators. Alternatively, passing a grouped data frame will also do the calculations by group.
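The steps described above can be sketched in base R. This is a minimal illustration, not the function's actual implementation: the toy data frame and its indicator column are hypothetical, and using stats::sd for the standard deviation is an assumption, since the page does not say how it is computed internally.

```r
# Hypothetical toy data with two indicators (not from the package)
df <- data.frame(
  indicator = rep(c("a", "b"), each = 4),
  value = c(10, 12, 14, 16, 100, 110, 120, 130)
)

# Midpoint (here the mean) and standard deviation within each group;
# stats::sd is an assumption about how the spread is measured
df$midpt <- ave(df$value, df$indicator, FUN = function(v) mean(v, na.rm = TRUE))
df$sd <- ave(df$value, df$indicator, FUN = function(v) sd(v, na.rm = TRUE))

# z-score: distance from the midpoint in units of standard deviation
df$z <- (df$value - df$midpt) / df$sd

# Cut z-scores at the default breaks, padded with -Inf and Inf
brks <- c(-2, -1/2, 1/2, 2)
df$brk <- cut(df$z, breaks = c(-Inf, brks, Inf))

df
```

Because the z-scores are computed within each group, the two indicators end up in the same intervals even though their raw values are on very different scales.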

Usage

stdev_brks(
  x,
  value = value,
  filters = NULL,
  by = NULL,
  brks = c(-2, -1/2, 1/2, 2),
  labels = NULL,
  na.rm = TRUE,
  keep_calcs = TRUE,
  ...
)

Arguments

x: A data frame or tibble
value: Bare name of the numeric value column, Default: value
filters: An optional named list of values to use for filtering. Each item in the list should be a vector of length 1. If given, the observation matching these values will be used as the midpoint. If NULL (the default), the midpoint will be calculated as the mean of values, grouped by x's grouping columns (if any) and the columns named in by (if any).
by: Optional character vector. If given, this will be used as the group within which intervals are calculated. Default: NULL
brks: Numeric vector of break points for cutting z-scores. This vector, plus -Inf and Inf, will be passed to base::cut's breaks argument. Default: c(-2, -1/2, 1/2, 2)
labels: Character vector of labels for the resulting factor. If NULL, levels will be in base::cut's interval notation. The length of this vector should be one more than the length of brks. Default: NULL
na.rm: Boolean passed on to mean if midpoints are being calculated. Default: TRUE
keep_calcs: Boolean, whether to keep columns from calculations. Default: TRUE
...: Additional arguments passed to base::cut
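
To illustrate how filters, brks, and labels fit together, here is a rough base R sketch. The data frame is made up for illustration, the row-matching logic is only one plausible reading of "the observation matching these values", and stats::sd is again an assumption rather than the function's documented behavior.

```r
# Hypothetical toy data (not the package's median_age)
df <- data.frame(
  level = c("state", "town", "town", "town"),
  value = c(40, 30, 45, 52)
)
filters <- list(level = "state")

# Keep the row that matches every name/value pair in `filters`
keep <- Reduce(`&`, Map(function(col, val) df[[col]] == val, names(filters), filters))
midpt <- df$value[keep]
stopifnot(length(midpt) == 1)  # exactly one observation should serve as the midpoint

# z-scores relative to the filtered midpoint (sd() is an assumption)
z <- (df$value - midpt) / sd(df$value)

# labels must have one more element than brks, since -Inf and Inf are added
brks <- c(-2, -1/2, 1/2, 2)
labels <- c("Lower", "Somewhat lower", "Average", "Somewhat higher", "Higher")
stopifnot(length(labels) == length(brks) + 1)

brk <- cut(z, breaks = c(-Inf, brks, Inf), labels = labels)
data.frame(df, z = z, brk = brk)
```

The observation used as the midpoint always has a z-score of 0, so it falls in the middle interval.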

Value

A data frame or tibble with the same number of rows as x. If keep_calcs is TRUE, the returned data frame will have numeric columns added for midpoint (midpt), standard deviation (sd), and z-score (z), and a factor column for the resulting intervals (brk). If FALSE, the only column added will be the intervals (brk).

Examples

# Calculate intervals along the full dataset, based on calculated mean
stdev_brks(median_age,
    labels = c("Lower", "Somewhat lower", "Average", "Somewhat higher", "Higher")
)
#> # A tibble: 141 × 9
#>    level  county      name                 sex   value midpt    sd       z brk  
#>    <fct>  <chr>       <chr>                <fct> <dbl> <dbl> <dbl>   <dbl> <fct>
#>  1 state  NA          Connecticut          total  41.2  42.4  5.14 -0.242  Aver…
#>  2 state  NA          Connecticut          male   39.8  42.4  5.14 -0.515  Some…
#>  3 state  NA          Connecticut          fema…  42.5  42.4  5.14  0.0103 Aver…
#>  4 county NA          Capitol COG          total  40.2  42.4  5.14 -0.437  Aver…
#>  5 county NA          Capitol COG          male   38.7  42.4  5.14 -0.728  Some…
#>  6 county NA          Capitol COG          fema…  41.7  42.4  5.14 -0.145  Aver…
#>  7 county NA          Greater Bridgeport … total  40    42.4  5.14 -0.476  Aver…
#>  8 county NA          Greater Bridgeport … male   39    42.4  5.14 -0.670  Some…
#>  9 county NA          Greater Bridgeport … fema…  41.2  42.4  5.14 -0.242  Aver…
#> 10 town   Capitol COG Andover              total  52.1  42.4  5.14  1.88   Some…
#> # ℹ 131 more rows

# That might be a little biased because there are also observations for state & counties.
# Calculate intervals for each of the three groups in the `sex` column, using the state value as the midpoint.
# Both examples have the same result:
median_age |>
    stdev_brks(filters = list(level = "state"), by = "sex") # or list(name = "Connecticut")
#> # A tibble: 141 × 9
#>    level  county      name                  sex   value midpt    sd      z brk  
#>    <fct>  <chr>       <chr>                 <fct> <dbl> <dbl> <dbl>  <dbl> <fct>
#>  1 state  NA          Connecticut           total  41.2  41.2  4.96  0     (-0.…
#>  2 state  NA          Connecticut           male   39.8  39.8  4.97  0     (-0.…
#>  3 state  NA          Connecticut           fema…  42.5  42.5  5.44  0     (-0.…
#>  4 county NA          Capitol COG           total  40.2  41.2  4.96 -0.202 (-0.…
#>  5 county NA          Capitol COG           male   38.7  39.8  4.97 -0.221 (-0.…
#>  6 county NA          Capitol COG           fema…  41.7  42.5  5.44 -0.147 (-0.…
#>  7 county NA          Greater Bridgeport C… total  40    41.2  4.96 -0.242 (-0.…
#>  8 county NA          Greater Bridgeport C… male   39    39.8  4.97 -0.161 (-0.…
#>  9 county NA          Greater Bridgeport C… fema…  41.2  42.5  5.44 -0.239 (-0.…
#> 10 town   Capitol COG Andover               total  52.1  41.2  4.96  2.20  (2, …
#> # ℹ 131 more rows

median_age |>
    dplyr::group_by(sex) |>
    stdev_brks(filters = list(level = "state"))
#> # A tibble: 141 × 9
#> # Groups:   sex [3]
#>    level  county      name                  sex   value midpt    sd      z brk  
#>    <fct>  <chr>       <chr>                 <fct> <dbl> <dbl> <dbl>  <dbl> <fct>
#>  1 state  NA          Connecticut           total  41.2  41.2  4.96  0     (-0.…
#>  2 state  NA          Connecticut           male   39.8  39.8  4.97  0     (-0.…
#>  3 state  NA          Connecticut           fema…  42.5  42.5  5.44  0     (-0.…
#>  4 county NA          Capitol COG           total  40.2  41.2  4.96 -0.202 (-0.…
#>  5 county NA          Capitol COG           male   38.7  39.8  4.97 -0.221 (-0.…
#>  6 county NA          Capitol COG           fema…  41.7  42.5  5.44 -0.147 (-0.…
#>  7 county NA          Greater Bridgeport C… total  40    41.2  4.96 -0.242 (-0.…
#>  8 county NA          Greater Bridgeport C… male   39    39.8  4.97 -0.161 (-0.…
#>  9 county NA          Greater Bridgeport C… fema…  41.2  42.5  5.44 -0.239 (-0.…
#> 10 town   Capitol COG Andover               total  52.1  41.2  4.96  2.20  (2, …
#> # ℹ 131 more rows
