
Occasionally we make charts that fill bars or geographies according to how many standard deviations a value falls from an average. This is not a hard task, but it is tedious. This function takes a data frame and gets a midpoint value, either by calculating the mean or by filtering for an observation already in the data frame (such as a statewide value). It then calculates z-scores from this midpoint and the standard deviation, and cuts the z-scores into intervals at brks. Pay close attention to the by argument, which lets you do these calculations grouped by a column; this is useful if you have a data frame of several different indicators. Alternatively, passing a grouped data frame will also do the calculations by group.
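The steps described above can be sketched in base R. This is only an illustration on a made-up data frame (df and its values are hypothetical), and it assumes the sample standard deviation from stats::sd; the function's actual implementation may differ.

```r
# Toy data: hypothetical values, not from the package
df <- data.frame(
  name  = c("A", "B", "C", "D", "E"),
  value = c(10, 12, 14, 16, 30)
)

brks <- c(-2, -1/2, 1/2, 2)

# 1. Midpoint: here, the mean of the value column
midpt <- mean(df$value, na.rm = TRUE)

# 2. Standard deviation (assumption: sample sd)
sdev <- stats::sd(df$value, na.rm = TRUE)

# 3. z-score of each observation relative to the midpoint
df$z <- (df$value - midpt) / sdev

# 4. Cut z-scores into intervals, padding the breaks with -Inf and Inf
df$brk <- cut(df$z, breaks = c(-Inf, brks, Inf))

df
```

Because the breaks are padded with -Inf and Inf, every non-missing z-score falls into exactly one interval.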

Usage

stdev_brks(
  x,
  value = value,
  filters = NULL,
  by = NULL,
  brks = c(-2, -1/2, 1/2, 2),
  labels = NULL,
  na.rm = TRUE,
  keep_calcs = TRUE,
  ...
)

Arguments

x

A data frame or tibble

value

Bare name of the numeric value column. Default: value

filters

An optional named list of values to use for filtering. Each item in the list should be a vector of length 1. If given, the observation matching these values will be used as the midpoint. If NULL (the default), the midpoint will be calculated as the mean of values, grouped by x's grouping columns (if any) and the columns given in by (if any).
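As a rough illustration of a filtered midpoint, the base R sketch below picks the row matching level == "state" within each group and measures every other row against it. The data frame and its values are hypothetical, and computing the standard deviation per group with stats::sd is an assumption, not a statement of how the function works internally.

```r
# Toy data: one "state" row per sex group (hypothetical values)
df <- data.frame(
  level = c("state", "town", "town", "state", "town", "town"),
  sex   = c("male", "male", "male", "female", "female", "female"),
  value = c(40, 38, 45, 42, 41, 47)
)

# Rows matching the filter, analogous to filters = list(level = "state")
mid <- df[df$level == "state", c("sex", "value")]

# Each group should match exactly one observation
stopifnot(!anyDuplicated(mid$sex))

# Look up each row's midpoint by its group
df$midpt <- mid$value[match(df$sex, mid$sex)]

# Assumption: standard deviation computed within each group
df$sd <- ave(df$value, df$sex, FUN = stats::sd)

df$z <- (df$value - df$midpt) / df$sd
df
```

The observation used as the midpoint always gets a z-score of 0, which matches the state rows in the examples further down.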

by

Optional character vector of column names. If given, intervals are calculated within the groups these columns define. Default: NULL

brks

Numeric vector of break points for cutting z-scores. This vector, plus -Inf and Inf, will be passed to base::cut's breaks argument. Default: c(-2, -1/2, 1/2, 2)
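To see what the default breaks produce, here is a small sketch that calls base::cut directly on some made-up z-scores (the vector z is hypothetical):

```r
brks <- c(-2, -1/2, 1/2, 2)

# Hypothetical z-scores
z <- c(-2.5, -1, 0, 0.5, 1.2, 3)

# Pad the breaks so the outer intervals are open-ended
bins <- cut(z, breaks = c(-Inf, brks, Inf))

nlevels(bins)    # one more interval than there are break points
as.integer(bins) # index of the interval each z-score falls into
table(bins)
```

With cut's defaults, intervals are closed on the right, so a z-score of exactly 0.5 lands in the middle interval rather than the one above it.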

labels

Character vector of labels for the resulting factor. If NULL, levels will be in base::cut's interval notation. The length of this vector should be one more than the length of brks. Default: NULL
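The length requirement follows from how the breaks are padded: four break points make five intervals, so five labels are needed. A minimal sketch with base::cut and made-up z-scores:

```r
brks <- c(-2, -1/2, 1/2, 2)
labels <- c("Lower", "Somewhat lower", "Average", "Somewhat higher", "Higher")

# One label per interval: length(brks) + 1
stopifnot(length(labels) == length(brks) + 1)

# Hypothetical z-scores, one per interval
z <- c(-2.2, -0.7, 0.1, 1.9, 2.2)

brk <- cut(z, breaks = c(-Inf, brks, Inf), labels = labels)
as.character(brk)
#> [1] "Lower"           "Somewhat lower"  "Average"         "Somewhat higher"
#> [5] "Higher"
```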

na.rm

Boolean passed on to mean if midpoints are being calculated. Default: TRUE

keep_calcs

Boolean, whether to keep columns from calculations. Default: TRUE

...

Additional arguments passed to base::cut
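One base::cut argument that may be worth passing this way is right, which controls which side of each interval is closed. The sketch below calls cut directly on made-up z-scores that sit exactly on break points, to show the difference:

```r
brks <- c(-2, -1/2, 1/2, 2)

# Hypothetical z-scores that fall exactly on break points
z <- c(-2, 0.5, 2)

# Default: intervals are closed on the right, e.g. (0.5, 2]
closed_right <- cut(z, breaks = c(-Inf, brks, Inf))

# right = FALSE: intervals are closed on the left, e.g. [0.5, 2)
closed_left <- cut(z, breaks = c(-Inf, brks, Inf), right = FALSE)

data.frame(
  z = z,
  right_closed = as.integer(closed_right),
  left_closed  = as.integer(closed_left)
)
```

Each boundary value moves up one interval when right = FALSE, which matters mostly for values that sit exactly on a break.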

Value

A data frame or tibble with the same number of rows as x. If keep_calcs is TRUE, the returned data frame will have numeric columns added for midpoint (midpt), standard deviation (sd), and z-score (z), and a factor column for the resulting intervals (brk). If FALSE, the only column added will be the intervals (brk).

Examples

# Calculate intervals along the full dataset, based on calculated mean
stdev_brks(median_age,
    labels = c("Lower", "Somewhat lower", "Average", "Somewhat higher", "Higher")
)
#> # A tibble: 141 × 9
#>    level  county      name                 sex   value midpt    sd       z brk  
#>    <fct>  <chr>       <chr>                <fct> <dbl> <dbl> <dbl>   <dbl> <fct>
#>  1 state  NA          Connecticut          total  41.2  42.4  5.14 -0.242  Aver…
#>  2 state  NA          Connecticut          male   39.8  42.4  5.14 -0.515  Some…
#>  3 state  NA          Connecticut          fema…  42.5  42.4  5.14  0.0103 Aver…
#>  4 county NA          Capitol COG          total  40.2  42.4  5.14 -0.437  Aver…
#>  5 county NA          Capitol COG          male   38.7  42.4  5.14 -0.728  Some…
#>  6 county NA          Capitol COG          fema…  41.7  42.4  5.14 -0.145  Aver…
#>  7 county NA          Greater Bridgeport … total  40    42.4  5.14 -0.476  Aver…
#>  8 county NA          Greater Bridgeport … male   39    42.4  5.14 -0.670  Some…
#>  9 county NA          Greater Bridgeport … fema…  41.2  42.4  5.14 -0.242  Aver…
#> 10 town   Capitol COG Andover              total  52.1  42.4  5.14  1.88   Some…
#> # ℹ 131 more rows

# That might be a little biased because there are also observations for state & counties.
# Instead, use the statewide value as the midpoint, and calculate intervals
# separately for each of the three groups in the `sex` column.
# Both examples have the same result:
median_age |>
    stdev_brks(filters = list(level = "state"), by = "sex") # or list(name = "Connecticut")
#> # A tibble: 141 × 9
#>    level  county      name                  sex   value midpt    sd      z brk  
#>    <fct>  <chr>       <chr>                 <fct> <dbl> <dbl> <dbl>  <dbl> <fct>
#>  1 state  NA          Connecticut           total  41.2  41.2  4.96  0     (-0.…
#>  2 state  NA          Connecticut           male   39.8  39.8  4.97  0     (-0.…
#>  3 state  NA          Connecticut           fema…  42.5  42.5  5.44  0     (-0.…
#>  4 county NA          Capitol COG           total  40.2  41.2  4.96 -0.202 (-0.…
#>  5 county NA          Capitol COG           male   38.7  39.8  4.97 -0.221 (-0.…
#>  6 county NA          Capitol COG           fema…  41.7  42.5  5.44 -0.147 (-0.…
#>  7 county NA          Greater Bridgeport C… total  40    41.2  4.96 -0.242 (-0.…
#>  8 county NA          Greater Bridgeport C… male   39    39.8  4.97 -0.161 (-0.…
#>  9 county NA          Greater Bridgeport C… fema…  41.2  42.5  5.44 -0.239 (-0.…
#> 10 town   Capitol COG Andover               total  52.1  41.2  4.96  2.20  (2, …
#> # ℹ 131 more rows

median_age |>
    dplyr::group_by(sex) |>
    stdev_brks(filters = list(level = "state"))
#> # A tibble: 141 × 9
#> # Groups:   sex [3]
#>    level  county      name                  sex   value midpt    sd      z brk  
#>    <fct>  <chr>       <chr>                 <fct> <dbl> <dbl> <dbl>  <dbl> <fct>
#>  1 state  NA          Connecticut           total  41.2  41.2  4.96  0     (-0.…
#>  2 state  NA          Connecticut           male   39.8  39.8  4.97  0     (-0.…
#>  3 state  NA          Connecticut           fema…  42.5  42.5  5.44  0     (-0.…
#>  4 county NA          Capitol COG           total  40.2  41.2  4.96 -0.202 (-0.…
#>  5 county NA          Capitol COG           male   38.7  39.8  4.97 -0.221 (-0.…
#>  6 county NA          Capitol COG           fema…  41.7  42.5  5.44 -0.147 (-0.…
#>  7 county NA          Greater Bridgeport C… total  40    41.2  4.96 -0.242 (-0.…
#>  8 county NA          Greater Bridgeport C… male   39    39.8  4.97 -0.161 (-0.…
#>  9 county NA          Greater Bridgeport C… fema…  41.2  42.5  5.44 -0.239 (-0.…
#> 10 town   Capitol COG Andover               total  52.1  41.2  4.96  2.20  (2, …
#> # ℹ 131 more rows