class: left, title-slide

# Developing in-house software

## Why I did it & you should too

### Camille Seaberry

### DataHaven

Follow along:
https://ct-data-haven.github.io/datadev
Get the code:
https://github.com/CT-Data-Haven/datadev
---

# DataHaven's Community Index

.col3-pad[
#### 2013

data:image/s3,"s3://crabby-images/7de8b/7de8bb870cd747ff60e65714dd30a9711f21159c" alt="index1"
]

.col3-pad[
#### 2016

data:image/s3,"s3://crabby-images/323fc/323fcc4219f7656a5f75a2f4a72d934a739305da" alt="index2"
]

.col3-pad[
#### 2019: the takeover

data:image/s3,"s3://crabby-images/4514d/4514d8940b84506a4e5a0ba4b95106a4cd4d8d68" alt="index3"
]

---

# Spreadsheet sprawl

.pull-left[data:image/s3,"s3://crabby-images/0ac46/0ac467b4024ba60d4536a501a9643ab5e3e344fc" alt="drake"]

.pull-right[
### Installing packages is easy

```r
install_github("camille-s/camiller")
install_github("CT-Data-Haven/cwi")
```

data:image/s3,"s3://crabby-images/447c7/447c7afe941744379eac916bd8669708f1e8f8e5" alt=":scale 50%"
]

---

# Shifting my thinking: toward sustainable & reproducible work

.pull-left[
.big-text[
Unhappy Mother's Day, but a new appreciation of behind-the-scenes information
]
]

.pull-right[data:image/s3,"s3://crabby-images/76e92/76e929a51f5ad98d4d8e3d83f6b8a6a2e1833edd" alt=":scale 80%"]

---

# Shifting my thinking: toward sustainable & reproducible work

.pull-left[
data:image/s3,"s3://crabby-images/5ea98/5ea986bdbfae35b11ba75242486a5e5e5aa0c5f5" alt=""
]

.pull-right[
.big-text[I moved 300 miles away to Baltimore]
]

---

# What's a library?

data:image/s3,"s3://crabby-images/4ae54/4ae5440fa956da1e235d7c47f5c9c0af10f7f603" alt=":scale 40%"

--

```r
library(tidyverse)   # sets up LOTS of functions, how I start my mornings
library(tidycensus)  # fetches data from Census API
library(camiller)    # first in-house library
library(cwi)         # second in-house library
library(showtext)    # use nice fonts in plots
library(sf)          # work with geospatial data & make maps
library(patchwork)   # layout plots together
library(lubridate)   # parse dates
```

---

# Functions! If only...
.small-text[
```r
leave_the_house <- function(date = today(), biking = TRUE, working = TRUE) {
  day_of_week <- wday(date, label = TRUE, abbr = FALSE)
  always_need <- c("keys", "phone", "wallet", "meds")
  sometimes_need <- c()

  if (biking) {
    sometimes_need <- c(sometimes_need, "helmet")
  } else {
    sometimes_need <- c(sometimes_need, "bus card")
  }
  if (working) {
    sometimes_need <- c(sometimes_need, "laptop")
  }

  need <- c(always_need, sometimes_need)
  cat(
    sprintf("Happy %s! Today you need:", day_of_week), "\n",
    paste(need, collapse = ", ")
  )
}
```
]

---

# Functions! If only...

```r
leave_the_house(biking = TRUE, working = FALSE)
```

```
Happy Monday! Today you need:
 keys, phone, wallet, meds, helmet
```

---

# Functions: reduce repetition & clutter

**Tedious and messy**

```r
income_us <- get_acs("us", table = "B19013", year = 2017)
income_state <- get_acs("state", table = "B19013", year = 2017)
income_msa <- get_acs("metropolitan statistical area/micropolitan statistical area", table = "B19013", year = 2017)
income_county <- get_acs("county", table = "B19013", state = "09", year = 2017)
income_towns <- get_acs("county subdivision", table = "B19013", state = "09", year = 2017)
income <- bind_rows(income_us, income_state, income_msa, income_county, income_towns)
# get rid of those extra tables
rm(income_us, income_state, income_msa, income_county, income_towns)
```

**Nice n clean**

```r
income <- multi_geo_acs(table = "B19013", year = 2017, us = TRUE, msa = TRUE)
```

---

# Functions: I *swear* I did this last week!
.left-column-67[ data:image/s3,"s3://crabby-images/30b2b/30b2bee7e587c0db42dde5d3e55a3e1d18d71dab" alt=""<!-- --> ] -- .right-column-33[ data:image/s3,"s3://crabby-images/dbf45/dbf4593d86e3b763cc4ef286d58138452f927725" alt=""<!-- --> ] -- .right-column-33[ data:image/s3,"s3://crabby-images/bf0dd/bf0ddeb647f6c6f017cacac7267842825f17bdbc" alt=""<!-- --> ] --- # Functions: make it scale .left-column-67[ data:image/s3,"s3://crabby-images/9d623/9d6230216636f4cd117943df35543abd6986b43b" alt=""<!-- --> ] -- .right-column-33[ data:image/s3,"s3://crabby-images/6fbad/6fbad7f02bac019aa25e5acb8bb24ce64437fe7c" alt=""<!-- --> ] -- .right-column-33[ data:image/s3,"s3://crabby-images/0cca9/0cca998e0fcc60308a7aa2c1fdad8fbabe2ec33c" alt=""<!-- --> ] --- # Functions: encourage good habits .pull-left[ ```r geo_level_plot(tenure, value = homeownership, hilite = "mediumpurple1", title = "Homeownership rates, 2017") ``` ] .pull-right[ data:image/s3,"s3://crabby-images/a6a46/a6a46b296594d6a24ea9c8e0b29db2bd8f2a7b0b" alt=""<!-- --> ] --- # Clean, uniform charts data:image/s3,"s3://crabby-images/c6ccc/c6ccc223933be076df32f215af609eff40001c0d" alt=""<!-- --> --- # Reusable datasets & references .big-text[How many times can I generate, save, and forget about the same lookup tables and shapefiles?] 
data:image/s3,"s3://crabby-images/e4e98/e4e982433985830478ff9ff319cfc32b0cde42aa" alt="references" --- # Reusable datasets & references **Much better:** move those lookup tables & shapefiles to the R package .pull-left[ ```r head(village2town, n = 5) ``` <table> <thead> <tr> <th style="text-align:left;"> cdp_geoid </th> <th style="text-align:left;"> place </th> <th style="text-align:left;"> town_geoid </th> <th style="text-align:left;"> town </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> 0902550 </td> <td style="text-align:left;"> Baltic </td> <td style="text-align:left;"> 0901171670 </td> <td style="text-align:left;"> Sprague </td> </tr> <tr> <td style="text-align:left;"> 0902690 </td> <td style="text-align:left;"> Bantam </td> <td style="text-align:left;"> 0900543370 </td> <td style="text-align:left;"> Litchfield </td> </tr> <tr> <td style="text-align:left;"> 0904945 </td> <td style="text-align:left;"> Bethlehem Village </td> <td style="text-align:left;"> 0900504930 </td> <td style="text-align:left;"> Bethlehem </td> </tr> <tr> <td style="text-align:left;"> 0906050 </td> <td style="text-align:left;"> Blue Hills </td> <td style="text-align:left;"> 0900305910 </td> <td style="text-align:left;"> Bloomfield </td> </tr> <tr> <td style="text-align:left;"> 0907345 </td> <td style="text-align:left;"> Branford Center </td> <td style="text-align:left;"> 0900907310 </td> <td style="text-align:left;"> Branford </td> </tr> </tbody> </table> ] .pull-right[ ```r plot(new_haven_sf["geometry"]) ``` data:image/s3,"s3://crabby-images/d8470/d847043285726fcefab071363bb0a45ede8b9a52" alt=""<!-- --> ] --- # Reusable datasets & references Avoid the suffering of finding table numbers on FactFinder data:image/s3,"s3://crabby-images/20a3b/20a3bba241f4bf00750929b5241b935583d280a9" alt="factfinder" --- # Reusable datasets & references Avoid the suffering of finding table numbers on FactFinder ```r basic_table_nums[["pov_age"]] ``` ``` ## [1] "B17024" ``` ```r 
get_acs("county", table = basic_table_nums[["pov_age"]], state = "09") ``` --- # Testing, debugging, documenting ### What doesn't kill you makes you stronger * Does this function do what I *think* it does? * Are these the most important tasks for me & my coworkers? * What might break by this time next month? * How will this scale & remain relevant? * What am I not thinking of yet? #### Testing the `qwi_industry` function in `cwi`: .small-text[ ```r test_that("handles years not in API", { expect_warning(qwi_industry(1990:2000, industries = "23"), "earlier years are being removed") expect_error(qwi_industry(1990:1994, industries = "23"), "only available") # should only return 1996-2000 expect_equal(nrow(suppressWarnings(qwi_industry(1991:2000, industries = "23", annual = T))), 5) }) ``` ] --- # Testing, debugging, documenting ### What doesn't kill you makes you stronger .pull-left[ * My code is amazing. Now how do I make sure someone uses it? * If I can't explain a feature, do I really need it? * What might someone else do wrong? * How can I avoid "What does this do?" emails and texts? ] .pull-right[ #### Docs website with `pkgdown` <iframe src="https://ct-data-haven.github.io/cwi/articles/basic-workflow.html" style="border:0px #ffffff solid;" name="cwidocs" scrolling="yes" frameborder="1" marginheight="0px" marginwidth="0px" height="400px" width="800px" allowfullscreen></iframe> ] --- background-image: url(img/sketchbook.jpg) background-size: cover .white-bkgnd[ # tl;dr ## Package development: lots of work upfront, totally worth it
**DataHaven:** [ctdatahaven.org](http://ctdatahaven.org/)
**Our side projects blog:** [ct-data-haven.github.io](http://ct-data-haven.github.io/)
**DataHaven on GitHub:** [github.com/CT-Data-Haven](https://github.com/CT-Data-Haven)
**These very slides!** [ct-data-haven.github.io/datadev](https://ct-data-haven.github.io/datadev) ]
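
---

# Appendix: a multi-geography wrapper, sketched

A minimal sketch of the idea behind the "Tedious and messy" vs. "Nice n clean" slide: loop over geography levels and stack the results. This is **not** how `cwi::multi_geo_acs` is actually written. `fetch_levels` and `fake_fetch` are hypothetical names made up for this slide, and the stand-in fetcher returns placeholder `NA` estimates so the example runs offline. In real use, `fetch` would wrap something like the `get_acs()` calls shown on that slide.

```r
# Hypothetical helper, for illustration only (not the cwi implementation).
# `fetch` is any function that takes a geography level and returns a data frame.
fetch_levels <- function(levels, fetch) {
  tables <- lapply(levels, function(level) {
    out <- fetch(level)
    out$level <- level # record which geography each row came from
    out
  })
  do.call(rbind, tables) # stack everything into one data frame
}

# Stand-in fetcher with placeholder NA estimates so this runs without the Census API.
# A real one might be: function(g) get_acs(g, table = "B19013", year = 2017)
fake_fetch <- function(level) {
  data.frame(name = paste(level, 1:2), estimate = NA_real_)
}

income <- fetch_levels(c("us", "state", "county"), fake_fetch)
nrow(income) # 6 rows: 2 per level
```

One call replaces several near-identical ones, and the `level` column keeps track of where each row came from, so there are no leftover tables to `rm()`.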