This is a list of 2 data frames giving PUMAs that make reasonable approximations of designated regions, with weights to apply to both population- and household-based measures. The data frame labeled county uses county-based PUMAs and 2021 ACS values; the data frame labeled cog uses the new COG-based PUMAs and 2022 ACS values. When working with PUMS data or other weighted surveys, multiply the weights in the proxy table by the weights from the survey to account for how much of the PUMA overlaps the region.
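The weight multiplication described above can be sketched in base R as follows. The two proxy rows are copied from the printed county table; the survey records and the column names pwgtp and employed are hypothetical stand-ins for a person-level PUMS extract, not part of this dataset.

```r
# Two rows copied from the printed proxy_pumas$county table
proxy <- data.frame(
  puma = c("0900304", "0900305"),
  region = "Greater Hartford",
  pop_weight = c(0.382, 1)
)

# Hypothetical person-level survey records (made-up values and column names)
survey <- data.frame(
  puma = c("0900304", "0900304", "0900305"),
  pwgtp = c(100, 50, 80),
  employed = c(TRUE, FALSE, TRUE)
)

# Attach the proxy weight to each survey record by PUMA
joined <- merge(survey, proxy, by = "puma")

# Scale each survey weight by the share of the PUMA's population in the region
joined$adj_weight <- joined$pwgtp * joined$pop_weight

# Weighted share employed in the region, using the adjusted weights
employment_rate <- sum(joined$adj_weight[joined$employed]) / sum(joined$adj_weight)
```

For household-level measures, the same pattern would use hh_weight and the survey's household weight instead.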
Format
A list of 2 data frames, county and cog, with 19 and 54 rows, respectively, and 6 variables:
- puma
7-digit PUMA FIPS code
- region
Region name
- pop
Total population in the overlapping area between the region and the PUMA
- hh
Total households in the overlapping area between the region and the PUMA
- pop_weight
Population weight: share of the PUMA's population that's included in the region, to be used for population-based survey analysis
- hh_weight
Household weight: share of the PUMA's households that are included in the region, to be used for household-based survey analysis
Details
The county-based table includes only non-county regions (e.g. Greater New Haven). The COG-based table also includes "legacy" counties (e.g. New Haven County), on the assumption that some organizations will still want estimates for those geographies even if data isn't released for counties. See maps of proxies and their weights here: https://ct-data-haven.github.io/cogs/proxy-geos.html
NOTE: Some PUMAs are included in more than one region (e.g. in the county-based table, 0900900 appears in both Greater Waterbury and Lower Naugatuck Valley). When joining these tables with survey data, make sure the join allows duplicate PUMAs.
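As a minimal sketch of what allowing duplicates looks like, the base R example below joins hypothetical household records to three rows copied from the printed county table. The households data frame and its wgtp column are made up for illustration.

```r
# Rows copied from the printed proxy_pumas$county table;
# PUMA 0900900 belongs to two regions
proxy <- data.frame(
  puma = c("0900900", "0900900", "0900901"),
  region = c("Greater Waterbury", "Lower Naugatuck Valley", "Greater Waterbury"),
  hh_weight = c(0.999, 0.392, 1)
)

# Hypothetical household-level survey records (made-up values and column names)
households <- data.frame(
  puma = c("0900900", "0900901"),
  wgtp = c(200, 150)
)

# merge() keeps every matching pair, so the household in PUMA 0900900
# appears once per region it contributes to
joined <- merge(households, proxy, by = "puma")

# Adjust the survey weight separately for each region
joined$adj_weight <- joined$wgtp * joined$hh_weight

# Weighted household totals by region
region_totals <- aggregate(adj_weight ~ region, data = joined, FUN = sum)
```

The joined table has more rows than the survey input, which is the intended result. If you join with dplyr (1.1.0 or later) instead, passing relationship = "many-to-many" to left_join() should make this expectation explicit and avoid the many-to-many warning.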
Examples
# proxies made from county-based PUMAs, use for pre-2022 ACS or other datasets
proxy_pumas$county
#> # A tibble: 19 × 6
#> puma region pop hh pop_weight hh_weight
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 0900300 Greater Hartford 154355 59417 0.987 0.986
#> 2 0900301 Greater Hartford 110423 44599 1 1
#> 3 0900302 Greater Hartford 121562 46879 1 1
#> 4 0900303 Greater Hartford 165411 66333 1 1
#> 5 0900304 Greater Hartford 43474 17149 0.382 0.381
#> 6 0900305 Greater Hartford 111643 44350 1 1
#> 7 0900306 Greater Hartford 119553 49100 1 1
#> 8 0901300 Greater Hartford 149188 56576 0.994 0.993
#> 9 0900902 Greater New Haven 24217 9503 0.187 0.179
#> 10 0900903 Greater New Haven 75564 27228 0.611 0.576
#> 11 0900904 Greater New Haven 122051 46820 1 1
#> 12 0900905 Greater New Haven 133874 50264 1 1
#> 13 0900906 Greater New Haven 109782 44014 1 1
#> 14 0900500 Greater Waterbury 93230 37275 0.503 0.498
#> 15 0900900 Greater Waterbury 132199 49909 1.00 0.999
#> 16 0900901 Greater Waterbury 113783 45114 1 1
#> 17 0900105 Lower Naugatuck Valley 40944 16204 0.232 0.251
#> 18 0900900 Lower Naugatuck Valley 50357 19577 0.381 0.392
#> 19 0900903 Lower Naugatuck Valley 48063 20077 0.389 0.424
# proxies made from COG-based PUMAs
proxy_pumas$cog
#> # A tibble: 54 × 6
#> puma region pop hh pop_weight hh_weight
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 0920703 Fairfield County 41206 15774 0.258 0.248
#> 2 0920801 Fairfield County 148470 55550 1 1
#> 3 0920802 Fairfield County 177911 61947 1 1
#> 4 0920901 Fairfield County 198911 76182 1 1
#> 5 0920902 Fairfield County 104825 36465 1 1
#> 6 0920903 Fairfield County 118282 44997 1 1
#> 7 0920904 Fairfield County 168766 61392 0.850 0.843
#> 8 0920201 Greater Hartford 121057 48277 1 1
#> 9 0920202 Greater Hartford 158475 62259 1 1
#> 10 0920203 Greater Hartford 116015 40349 1 1
#> # ℹ 44 more rows