class: left, title-slide

# Developing in-house software
## Why I did it & you should too
### Camille Seaberry
### DataHaven
###
Follow along:
https://ct-data-haven.github.io/datadev
Get the code:
https://github.com/CT-Data-Haven/datadev
---

# DataHaven's Community Index

.col3-pad[
#### 2013
![index1](img/index2013.jpg)
]

.col3-pad[
#### 2016
![index2](img/index2016.jpg)
]

.col3-pad[
#### 2019: the takeover
![index3](img/community_index_heatmap.png)
]

---

# Spreadsheet sprawl

.pull-left[![drake](img/drake.jpg)]

.pull-right[
### Installing packages is easy

```r
install_github("camille-s/camiller")
install_github("CT-Data-Haven/cwi")
```

![:scale 50%](img/easy.png)
]

---

# Shifting my thinking: toward sustainable & reproducible work

.pull-left[
.big-text[
Unhappy Mother's Day, but a new appreciation of behind-the-scenes information
]
]

.pull-right[![:scale 80%](img/mom.jpg)]

---

# Shifting my thinking: toward sustainable & reproducible work

.pull-left[
![](index_files/figure-html/move_map-1.png)<!-- -->
]

.pull-right[
.big-text[I moved 300 miles away to Baltimore]
]

---

# What's a library?

![:scale 40%](img/arthur-library.jpg)

--

```r
library(tidyverse) # sets up LOTS of functions, how I start my mornings
library(tidycensus) # fetches data from Census API
library(camiller) # first in-house library
library(cwi) # second in-house library
library(showtext) # use nice fonts in plots
library(sf) # work with geospatial data & make maps
library(patchwork) # layout plots together
library(lubridate) # parse dates
```

---

# Functions! If only...

.small-text[
```r
leave_the_house <- function(date = today(), biking = TRUE, working = TRUE) {
  day_of_week <- wday(date, label = TRUE, abbr = FALSE)
  always_need <- c("keys", "phone", "wallet", "meds")
  sometimes_need <- c()
  if (biking) {
    sometimes_need <- c(sometimes_need, "helmet")
  } else {
    sometimes_need <- c(sometimes_need, "bus card")
  }
  if (working) {
    sometimes_need <- c(sometimes_need, "laptop")
  }
  need <- c(always_need, sometimes_need)
  cat(
    sprintf("Happy %s! Today you need:", day_of_week),
    "\n",
    paste(need, collapse = ", ")
  )
}
```
]

---

# Functions! If only...

```r
leave_the_house(biking = TRUE, working = FALSE)
```

```
Happy Monday! 
Today you need: keys, phone, wallet, meds, helmet
```

---

# Functions: reduce repetition & clutter

**Tedious and messy**

```r
income_us <- get_acs("us", table = "B19013", year = 2017)
income_state <- get_acs("state", table = "B19013", year = 2017)
income_msa <- get_acs("metropolitan statistical area/micropolitan statistical area", table = "B19013", year = 2017)
income_county <- get_acs("county", table = "B19013", state = "09", year = 2017)
income_towns <- get_acs("county subdivision", table = "B19013", state = "09", year = 2017)
income <- bind_rows(income_us, income_state, income_msa, income_county, income_towns)
# get rid of those extra tables
rm(income_us, income_state, income_msa, income_county, income_towns)
```

**Nice n clean**

```r
income <- multi_geo_acs(table = "B19013", year = 2017, us = TRUE, msa = TRUE)
```

---

# Functions: I *swear* I did this last week!

.left-column-67[
![](index_files/figure-html/map_nobrks1-1.png)<!-- -->
]

--

.right-column-33[
![](index_files/figure-html/map_brks1-1.png)<!-- -->
]

--

.right-column-33[
![](index_files/figure-html/map_brks2-1.png)<!-- -->
]

---

# Functions: make it scale

.left-column-67[
![](index_files/figure-html/map_nobrks2-1.png)<!-- -->
]

--

.right-column-33[
![](index_files/figure-html/map_brks3-1.png)<!-- -->
]

--

.right-column-33[
![](index_files/figure-html/map_brks4-1.png)<!-- -->
]

---

# Functions: encourage good habits

.pull-left[
```r
geo_level_plot(tenure, value = homeownership, hilite = "mediumpurple1", title = "Homeownership rates, 2017")
```
]

.pull-right[
![](index_files/figure-html/tenure_plot-1.png)<!-- -->
]

---

# Clean, uniform charts

![](index_files/figure-html/plot_pair-1.png)<!-- -->

---

# Reusable datasets & references

.big-text[How many times can I generate, save, and forget about the same lookup tables and shapefiles?]
![references](img/too_many_references.png)

---

# Reusable datasets & references

**Much better:** move those lookup tables & shapefiles to the R package

.pull-left[
```r
head(village2town, n = 5)
```

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> cdp_geoid </th>
   <th style="text-align:left;"> place </th>
   <th style="text-align:left;"> town_geoid </th>
   <th style="text-align:left;"> town </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> 0902550 </td>
   <td style="text-align:left;"> Baltic </td>
   <td style="text-align:left;"> 0901171670 </td>
   <td style="text-align:left;"> Sprague </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 0902690 </td>
   <td style="text-align:left;"> Bantam </td>
   <td style="text-align:left;"> 0900543370 </td>
   <td style="text-align:left;"> Litchfield </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 0904945 </td>
   <td style="text-align:left;"> Bethlehem Village </td>
   <td style="text-align:left;"> 0900504930 </td>
   <td style="text-align:left;"> Bethlehem </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 0906050 </td>
   <td style="text-align:left;"> Blue Hills </td>
   <td style="text-align:left;"> 0900305910 </td>
   <td style="text-align:left;"> Bloomfield </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 0907345 </td>
   <td style="text-align:left;"> Branford Center </td>
   <td style="text-align:left;"> 0900907310 </td>
   <td style="text-align:left;"> Branford </td>
  </tr>
</tbody>
</table>
]

.pull-right[
```r
plot(new_haven_sf["geometry"])
```

![](index_files/figure-html/nhv_basic_map-1.png)<!-- -->
]

---

# Reusable datasets & references

Avoid the suffering of finding table numbers on FactFinder

![factfinder](img/factfinder_pov_by_age.png)

---

# Reusable datasets & references

Avoid the suffering of finding table numbers on FactFinder

```r
basic_table_nums[["pov_age"]]
```

```
## [1] "B17024"
```

```r
get_acs("county", table = basic_table_nums[["pov_age"]], state = "09")
```

---

# Testing, debugging, documenting

### What doesn't kill you makes you stronger

* Does
this function do what I *think* it does?
* Are these the most important tasks for me & my coworkers?
* What might break by this time next month?
* How will this scale & remain relevant?
* What am I not thinking of yet?

#### Testing the `qwi_industry` function in `cwi`:

.small-text[
```r
test_that("handles years not in API", {
  expect_warning(qwi_industry(1990:2000, industries = "23"), "earlier years are being removed")
  expect_error(qwi_industry(1990:1994, industries = "23"), "only available")
  # should only return 1996-2000
  expect_equal(nrow(suppressWarnings(qwi_industry(1991:2000, industries = "23", annual = T))), 5)
})
```
]

---

# Testing, debugging, documenting

### What doesn't kill you makes you stronger

.pull-left[
* My code is amazing. Now how do I make sure someone uses it?
* If I can't explain a feature, do I really need it?
* What might someone else do wrong?
* How can I avoid "What does this do?" emails and texts?
]

.pull-right[
#### Docs website with `pkgdown`

<iframe src="https://ct-data-haven.github.io/cwi/articles/basic-workflow.html" style="border:0px #ffffff solid;" name="cwidocs" scrolling="yes" frameborder="1" marginheight="0px" marginwidth="0px" height="400px" width="800px" allowfullscreen></iframe>
]

---
background-image: url(img/sketchbook.jpg)
background-size: cover

.white-bkgnd[
# tl;dr
## Package development: lots of work upfront, totally worth it
**DataHaven:** [ctdatahaven.org](http://ctdatahaven.org/)
**Our side projects blog:** [ct-data-haven.github.io](http://ct-data-haven.github.io/)
**DataHaven on GitHub:** [github.com/CT-Data-Haven](https://github.com/CT-Data-Haven)
**These very slides!** [ct-data-haven.github.io/datadev](https://ct-data-haven.github.io/datadev) ]