  • Fixed neighborhood shapefiles. These now come directly from cities’ data portals via the scratchpad repo, where they’re published as a release, giving us a single source of truth for those boundaries. This also fixes errors where a few neighborhoods in Hartford / West Hartford and Stamford were assigned tracts outside the city boundaries. As a result, the weights tables have changed a fair amount. We’d also been using our own combinations of Stamford neighborhoods but now use the city’s, with some shifts in which neighborhoods are lumped together and how they’re labeled.
  • Updated and improved methods for making the zip2town crosswalk, now based on 2020 / 2022 geographies. The columns included in the data frame have changed slightly.
  • Fixed issues with qwi_industry: the API now uses COGs for Connecticut instead of counties.

Moved add_logo to the stylehaven package.

Bumping package versions just to draw attention to the fact that there’s now a set of PUMA proxy crosswalks; see proxy_pumas.

Edited xwalk: there were still more FIPS codes to update with their COG-based versions. The data frame now includes COG-based codes for block groups, tracts, towns, and PUMAs.

Bump ACS-related defaults to 2022

MINOR BREAKING CHANGE: This update corresponds to the 2022 ACS data release, which is the first to use COGs instead of counties. Because COGs have different FIPS codes, town and tract FIPS codes (but apparently not block groups) have changed to match. Most digits of each code stay the same, but the portion identifying the county has changed, e.g. 09009140101 is now 09170140101. To deal with that without breaking too much code, there are a few changes to the package:

  • Neighborhood lookup tables (bridgeport_tracts, etc) have the previous county-based FIPS codes in the column geoid, and the new COG-based FIPS codes in the column geoid_cog.
  • xwalk now has columns for COG-based town and tract FIPS codes, in addition to the previous county-based ones.
  • Calling multi_geo_acs with counties = "all" (the default) will get you COGs, but multi_geo_decennial will get you counties, because the switch was not retroactive.
  • The names of COGs returned by multi_geo_acs and used for names in the regions list are the ones the Census Bureau uses. Unfortunately, these don’t all match the names the state uses. To handle that, I’ve added a function, fix_cogs, which replaces common names for them with the ones the state lists somewhat officially. For example, Capitol COG is in the census data, while Capitol Region COG is what the state usually (but probably not always) uses.
  • Finally, the part that doesn’t come up often but will break: previously the multi_geo_* functions took neighborhood names, weights, and GEOIDs as bare column names, with defaults (name, weight, and geoid, respectively). These now all have to be given as strings (i.e. in quotation marks), and geoid no longer has a default. This is because some calculations will now need the neighborhood lookup tables’ geoid column, and some will need geoid_cog. This only matters when you’re including neighborhoods in function calls.
  • The 2020 decennial census added a few dozen new census designated places (CDPs), the geography that village2town is based on. They now overlap with towns even less well than they used to. The table has been recalculated, with towns and villages joined based on overlapping population from the 2020 decennial, and now includes populations and weights in the crosswalk. That means things could break if you’re expecting one set of CDPs and get another, or if you’re not expecting new columns in that table.
  • MINOR BREAKING CHANGE: multi_geo_decennial now defaults to 2020. Because the 2020 decennial uses a different summary file code from previous years, the default sumfile argument, if used with 2010, will lead to an error.
  • 2020 decennial variables are now available in decennial_vars20. The 2010 ones are still in decennial_vars10.
  • A new data frame, cb_avail, has the years, programs (ACS vs decennial), and dataset codes (SF1, ACS5, DHC, etc.) available from the Census Bureau’s API.
  • The function dh_scaffold was poorly named and not a great fit for the aims of this project. It’s been moved to {stylehaven}; find it there as scaffold_project.
  • Minor improvements to some warnings and other messages.
  • Add COGs to xwalk along with a function for reconciling names
  • Update defaults to 2021 where applicable: multi_geo_acs, adj_inflation base year, label_acs.
  • Replace acs_vars20 with acs_vars21.
  • multi_geo_decennial now takes "pl" as a possible value for summary file, since the full 2020 Decennial data still aren’t out.
  • Add regional councils of governments to regions list. Connecticut adopted these in 2022 to replace counties. Definitions from CTOPM here.
  • Add vignette on regions since there are so many of them now
  • Start handling updated MSA definitions—not sure that any datasets actually use these yet
  • Add rescale option to sub_nonanswers—its default won’t change any existing code
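
To make the county-to-COG FIPS change noted above more concrete (09009140101 becoming 09170140101), here is a minimal base R sketch of how an 11-digit tract code breaks down. The helper names split_tract_geoid and swap_county_part are hypothetical and not part of this package. The sketch only shows which digits moved; counties and COGs don’t line up one-to-one, so in practice the new code should come from a lookup such as xwalk or the geoid_cog columns rather than from a blanket substitution.

```r
# Hypothetical helper (not part of the package): split an 11-digit tract GEOID
# into 2-digit state, 3-digit county (or COG), and 6-digit tract portions.
split_tract_geoid <- function(geoid) {
  data.frame(
    state  = substr(geoid, 1, 2),
    county = substr(geoid, 3, 5),
    tract  = substr(geoid, 6, 11),
    stringsAsFactors = FALSE
  )
}

old_parts <- split_tract_geoid("09009140101") # county-based code from the note above
new_parts <- split_tract_geoid("09170140101") # COG-based code from the note above

# Only the middle three digits differ between the two versions.
same_state <- old_parts$state == new_parts$state
same_tract <- old_parts$tract == new_parts$tract
changed_county <- old_parts$county != new_parts$county

# Hypothetical helper: overwrite the county portion with a COG code that is
# already known for that specific tract.
swap_county_part <- function(geoid, cog_code) {
  substr(geoid, 3, 5) <- cog_code
  geoid
}

swapped <- swap_county_part("09009140101", "170")
print(swapped)
```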

Bugfix: occupational codes have larger groups and smaller groups. One larger group (Healthcare Practitioners and Technical Occupations) was mislabeled so it was marked as being under Education, Legal, Community Service, Arts, and Media Occupations.

Some updates to 2020

  • 2020 ACS 5-year data are finally out, so acs_vars19 has been replaced by acs_vars20, and multi_geo_acs now uses 2020 as the default. Some examples & vignette code have been updated to match.
  • Decennial census data aren’t out yet and won’t be for some time, so decennial-related things still default to 2010.

Major exciting overhaul! This was the first time I felt like enough of this package is flexible and well thought out to consider it a real release. A lot of the changes are under the hood: I split many functions into slimmed-down main “caller” functions and multiple task-focused “helper” functions, making it easier to maintain the package, add or modify features, and use the same code for multiple tasks.

User-facing updates

  • Moved from base messages to cli for cleaner and clearer messaging (printouts on what fetch functions are getting, limitations to function calls, etc)
  • Better handling of Census API calls, to cope with how often their servers are down
  • Metadata: several behind-the-scenes datasets that set limits on functions’ API calls have been expanded beyond Connecticut; this includes qwi_industry and laus_trend.
  • Added a table of occupation codes for main occupation groups
  • Better documentation for many functions

Breaking changes

  • I’ve never liked the levels for the multi_geo_* functions–I don’t really remember why I made these plural, but they’re now singular. So a column that would have been e.g. “1_state”, “2_counties”, “3_towns” will now be “1_state”, “2_county”, “3_town”. This might break filtering you’ve done by level.
  • Renamed one function: acs_quick_map is now quick_map
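
If you have older code that filters on the plural level names, something like the following base R sketch could bridge the gap. It only covers the three levels named above, the helper update_levels and the toy data frame are hypothetical, and it is not something the package provides.

```r
# Mapping from the old plural labels to the new singular ones.
# Only the three levels mentioned in the note above are included.
old_to_new <- c(
  "1_state"    = "1_state",
  "2_counties" = "2_county",
  "3_towns"    = "3_town"
)

# Hypothetical helper: relabel known old levels, leave anything else untouched.
update_levels <- function(x) {
  x <- as.character(x) # also handles a factor column
  known <- x %in% names(old_to_new)
  x[known] <- unname(old_to_new[x[known]])
  x
}

# Toy data standing in for output saved under the old naming scheme.
saved <- data.frame(
  level = c("1_state", "2_counties", "3_towns", "3_towns"),
  name  = c("Connecticut", "New Haven County", "New Haven", "Hamden"),
  stringsAsFactors = FALSE
)

saved$level <- update_levels(saved$level)

# Filters written against the new singular labels now work on the old data.
towns <- saved[saved$level == "3_town", ]
print(towns)
```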

To do

  • Update to 2020 ACS and Decennial defaults
  • Add sleep argument to multi_geo_acs for dealing with API crashes.
  • Add handling for reading crosstab weights placed in headers alongside data rather than in a separate table (e.g. for 2021).

Since the 2020 ACS is delayed, I decided we should still have copies of 2019 geography-related files. This should be temporary, but for now there are 2 versions of the tract shapefile (tract_sf and tract_sf19), and 2 versions of each neighborhood-tract weight table (e.g. new_haven_tracts and new_haven_tracts19, and so on). Once all the data is out, I’ll remove the 2019 versions and bump up the package version.

  • Update tract_sf and town_sf to 2020 boundaries. I don’t expect anything to have changed for towns, but many tracts were added after the Census Bureau released redistricting data.
  • Handle typos in some crosstabs.
  • Rewrote neighborhood weights with the 2020 redistricting block boundaries. Dropped the block group table that was only done for New Haven, and changed the name of nhv_tracts to new_haven_tracts to match those for other cities.
  • QWI API is working again, but payroll data is missing from their database.
  • Minor behind-the-scenes updates
  • QWI example in the basic workflow vignette is currently turned off, because the Census QWI API has been down for at least a few days now. Will turn it back on when the API is (hopefully) back online.
  • New function: separate_acs, a very lazy way to split ACS labels.
  • Added finished versions of read_xtab, read_weights, xtab2df, and collapse_n_wt for working with DataHaven Community Wellbeing Survey crosstabs—see vignettes
  • Added add_logo with built-in DataHaven logo
  • Bug fixes in sub_nonanswers, xwalk
  • multi_geo_acs & multi_geo_decennial call janitor::clean_names before returning. This keeps columns aligned properly if neighborhoods are included.
  • Expanded xwalk data to include more geographic levels.
  • Minor vignette cleanup.
  • Added a NEWS.md file to track changes to the package.
  • Functions that make use of API keys have explicit key arguments so Census and BLS API keys don’t have to be stored in specific environment variables, though they’ll still default to those same environment variables.
  • Installation should be easier and have less overhead, because there are now fewer dependencies.
  • Fixed bugs with BLS API in adj_inflation.
  • Both multi_geo_acs and multi_geo_decennial can aggregate neighborhood data. There’s an example in the workflow vignette.
  • Should now be up to date with newer dplyr 1.0.0 & tidyr 1.0.0 functions.
  • New functions: jenks, dh_scaffold
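
As a rough illustration of the explicit key arguments mentioned above, here is a minimal base R sketch of the general pattern: a key argument that falls back to an environment variable. The function fetch_example, its arguments, and the variable name CENSUS_API_KEY are assumptions made for this sketch, not the package’s actual signatures, and no API request is made.

```r
# Hypothetical function showing the pattern only. The default for `key` is
# evaluated when the function is called, so it picks up whatever the
# environment variable holds at that moment.
fetch_example <- function(table, key = Sys.getenv("CENSUS_API_KEY")) {
  if (!nzchar(key)) {
    stop("No API key found: pass `key` or set CENSUS_API_KEY.", call. = FALSE)
  }
  # A real fetch function would call the API here; this sketch just returns
  # what it would have used.
  list(table = table, key = key)
}

# Passing a key explicitly, without touching environment variables.
explicit <- fetch_example("B01003", key = "my-explicit-key")

# Falling back to the environment variable when no key is passed.
Sys.setenv(CENSUS_API_KEY = "key-from-environment")
fallback <- fetch_example("B01003")
```

An explicit argument always wins over the environment variable, which is handy when juggling more than one key.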