Occasionally we make charts that fill bars or geographies by how many standard deviations a value falls from an average. This is not a hard task, but it is tedious. This function takes a data frame and gets a midpoint value, either by calculating the mean or by filtering for an observation already in the data frame (such as a statewide value). It then calculates z-scores from this midpoint and the standard deviation, and cuts the z-scores into intervals based on brks. Pay close attention to the argument by, which lets you do these calculations grouped by some column; this is useful if you have a data frame of several different indicators. Alternatively, passing a grouped data frame will also do the calculations by group.
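
As a rough illustration of the steps described above, the ungrouped, mean-based case can be sketched in base R. This is not the package's implementation: the toy data frame is hypothetical, and using stats::sd for the standard deviation is an assumption.

```r
# Hypothetical toy data, not shipped with the package
df <- data.frame(
  name  = c("a", "b", "c", "d", "e", "f"),
  value = c(70, 75, NA, 80, 85, 90)
)
brks <- c(-2, -1/2, 1/2, 2)

# Midpoint as the mean of the values, ignoring missing observations
midpt <- mean(df$value, na.rm = TRUE)

# Assumption: spread is the sample standard deviation of the same values
sdev <- sd(df$value, na.rm = TRUE)

# z-score: distance from the midpoint in standard deviations
df$z <- (df$value - midpt) / sdev

# Cut z-scores into intervals, padding the breaks with -Inf and Inf
df$brk <- cut(df$z, breaks = c(-Inf, brks, Inf))
```

Rows with a missing value keep NA for both the z-score and the interval, as in the first example below.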

Usage

stdev_brks(
  x,
  value = value,
  filters = NULL,
  by = NULL,
  brks = c(-2, -1/2, 1/2, 2),
  labels = NULL,
  na.rm = TRUE,
  keep_calcs = TRUE,
  ...
)

Arguments

x

A data frame or tibble

value

Bare name of the numeric value column. Default: value

filters

An optional named list of values to use for filtering. If given, the observation matching these values will be used as the midpoint. If NULL (the default), the midpoint will be calculated as the mean of values within each group, as defined by x's grouping columns and the columns named in by (if any).

by

Optional character vector of column names. If given, intervals are calculated within the groups these columns define. Default: NULL

brks

Numeric vector of break points for cutting z-scores. This vector, plus -Inf and Inf, will be passed to base::cut's breaks argument. Default: c(-2, -1/2, 1/2, 2)

labels

Character vector of labels for the resulting factor. If NULL, levels will be in base::cut's interval notation. The length of this vector should be one more than the length of brks. Default: NULL

na.rm

Boolean passed on to mean if midpoints are being calculated. Default: TRUE

keep_calcs

Boolean, whether to keep columns from calculations. Default: TRUE

...

Additional arguments passed to base::cut
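
To make the relationship between brks and labels concrete, here is a small base R sketch of the cutting step on its own. The z-scores are made-up values for illustration; only base::cut is used, not the package function.

```r
brks <- c(-2, -1/2, 1/2, 2)
labels <- c("Lower", "Somewhat lower", "Average", "Somewhat higher", "Higher")

# Four break points plus -Inf and Inf give five intervals, so five labels
stopifnot(length(labels) == length(brks) + 1)

# Hypothetical z-scores, including a missing value
z <- c(-2.5, -1.19, -0.447, 0.326, 1.03, 2.4, NA)

brk <- cut(z, breaks = c(-Inf, brks, Inf), labels = labels)
as.character(brk)
```

By default base::cut closes intervals on the right, so a z-score of exactly 0.5 falls in the middle interval; an argument such as right = FALSE is the kind of option that could be passed through `...`.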

Value

A data frame or tibble with the same number of rows as x. If keep_calcs is TRUE, the returned data frame will have numeric columns added for midpoint (midpt), standard deviation (sd), and z-score (z), and a factor column for the resulting intervals (brk). If FALSE, the only column added will be brk.
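
The sketch below shows, in base R, how columns like these might be derived when the midpoint comes from a filtered row and the calculation is grouped. It is an approximation under stated assumptions, not the function's source: the data are hypothetical, and taking the standard deviation with stats::sd over each group's values is an assumption.

```r
# Hypothetical toy data: two indicators, each with a statewide row
df <- data.frame(
  question = rep(c("q1", "q2"), each = 3),
  category = rep(c("State", "Town A", "Town B"), times = 2),
  value    = c(0.10, 0.20, 0.30, 0.50, 0.40, 0.90)
)
brks <- c(-2, -1/2, 1/2, 2)

# Midpoint: the value of the row matching the filter, one per group
midpts <- df[df$category == "State", c("question", "value")]
names(midpts)[names(midpts) == "value"] <- "midpt"
out <- merge(df, midpts, by = "question")

# Assumption: standard deviation of each group's values via stats::sd
out$sd <- ave(out$value, out$question, FUN = sd)
out$z <- (out$value - out$midpt) / out$sd
out$brk <- cut(out$z, breaks = c(-Inf, brks, Inf))

# Rough equivalent of keep_calcs = FALSE: keep only the interval column
out_slim <- out[, setdiff(names(out), c("midpt", "sd", "z"))]
```

The row used as the midpoint always gets a z-score of 0, which matches the Connecticut rows in the examples below.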

Examples

# Calculate intervals along the full dataset, based on calculated mean
stdev_brks(life_exp, 
           labels = c("Lower", "Somewhat lower", "Average", "Somewhat higher", "Higher"))
#> # A tibble: 189 × 7
#>    tract       town      value midpt    sd      z brk            
#>    <chr>       <chr>     <dbl> <dbl> <dbl>  <dbl> <fct>          
#>  1 09009142500 New Haven  75.6  79.3  3.11 -1.19  Somewhat lower 
#>  2 09009361402 New Haven  NA    79.3  3.11 NA     NA             
#>  3 09009140900 New Haven  77.9  79.3  3.11 -0.447 Average        
#>  4 09009142000 New Haven  82.5  79.3  3.11  1.03  Somewhat higher
#>  5 09009141800 New Haven  82.3  79.3  3.11  0.970 Somewhat higher
#>  6 09009140300 New Haven  76.9  79.3  3.11 -0.769 Somewhat lower 
#>  7 09009142700 New Haven  80.4  79.3  3.11  0.358 Average        
#>  8 09009140100 New Haven  80.3  79.3  3.11  0.326 Average        
#>  9 09009142100 New Haven  NA    79.3  3.11 NA     NA             
#> 10 09009141200 New Haven  77    79.3  3.11 -0.736 Somewhat lower 
#> # ℹ 179 more rows
           
# Calculate intervals for each of the three indicators in the `question` column.
# Both calls produce the same values; the second returns a grouped tibble:
fin_insecurity |>
  stdev_brks(filters = list(category = "Connecticut"), by = "question")
#> # A tibble: 69 × 8
#>    question        category          group       value midpt     sd      z brk  
#>    <chr>           <fct>             <fct>       <dbl> <dbl>  <dbl>  <dbl> <fct>
#>  1 food_insecurity Connecticut       Connecticut  0.13  0.13 0.0889  0     (-0.…
#>  2 food_insecurity Greater New Haven Greater Ne…  0.13  0.13 0.0889  0     (-0.…
#>  3 food_insecurity Gender            Men          0.12  0.13 0.0889 -0.112 (-0.…
#>  4 food_insecurity Gender            Women        0.14  0.13 0.0889  0.112 (-0.…
#>  5 food_insecurity Age               Ages 18-34   0.18  0.13 0.0889  0.562 (0.5…
#>  6 food_insecurity Age               Ages 35-49   0.2   0.13 0.0889  0.787 (0.5…
#>  7 food_insecurity Age               Ages 50-64   0.1   0.13 0.0889 -0.337 (-0.…
#>  8 food_insecurity Age               Ages 65+     0.05  0.13 0.0889 -0.899 (-2,…
#>  9 food_insecurity Race/Ethnicity    White        0.09  0.13 0.0889 -0.450 (-0.…
#> 10 food_insecurity Race/Ethnicity    Black        0.2   0.13 0.0889  0.787 (0.5…
#> # ℹ 59 more rows

fin_insecurity |>
  dplyr::group_by(question) |>
  stdev_brks(filters = list(category = "Connecticut"))
#> # A tibble: 69 × 8
#> # Groups:   question [3]
#>    question        category          group       value midpt     sd      z brk  
#>    <chr>           <fct>             <fct>       <dbl> <dbl>  <dbl>  <dbl> <fct>
#>  1 food_insecurity Connecticut       Connecticut  0.13  0.13 0.0889  0     (-0.…
#>  2 food_insecurity Greater New Haven Greater Ne…  0.13  0.13 0.0889  0     (-0.…
#>  3 food_insecurity Gender            Men          0.12  0.13 0.0889 -0.112 (-0.…
#>  4 food_insecurity Gender            Women        0.14  0.13 0.0889  0.112 (-0.…
#>  5 food_insecurity Age               Ages 18-34   0.18  0.13 0.0889  0.562 (0.5…
#>  6 food_insecurity Age               Ages 35-49   0.2   0.13 0.0889  0.787 (0.5…
#>  7 food_insecurity Age               Ages 50-64   0.1   0.13 0.0889 -0.337 (-0.…
#>  8 food_insecurity Age               Ages 65+     0.05  0.13 0.0889 -0.899 (-2,…
#>  9 food_insecurity Race/Ethnicity    White        0.09  0.13 0.0889 -0.450 (-0.…
#> 10 food_insecurity Race/Ethnicity    Black        0.2   0.13 0.0889  0.787 (0.5…
#> # ℹ 59 more rows