class: title-slide, right, top
background-image: url(img/moon.png)
background-position: 90% 75%, 75% 75%
background-size: cover

.left-column[
# NHS Workshop<br>Introduction to ggplot
]

.right-column[
### Getting started - why ggplot?

**Eugene Hickey**<br>
January 21st 2021
]

.palegrey[.left[.footnote[Graphic by [Elaine Hickey](https://photos.google.com/photo/AF1QipMjKNoaxyne8nte4HmxA6Th9-4fUfSbl_mx-_1G)]]]

???

Welcome to the workshop on ggplot, where we'll show you how to create impressive data visualisations.

---
name: about-me
layout: false
class: about-me-slide, inverse, middle, center

# About me

<img style="border-radius: 50%;" src="img/eugene.jpg" width="150px"/>

## Eugene Hickey

### lecturer in physics

.fade[Technological University<br>Dublin]

[bioscience.netlify.app](https://bioscience.netlify.app)
[@eugene100hickey](https://twitter.com/eugene100hickey)
[eugene100hickey](https://github.com/eugene100hickey)

---
layout: true

<a class="footer-link" href="http://intro-ggplot-nhs.netlify.app">intro-ggplot-nhs — Eugene Hickey</a>

<!-- this adds the link footer to all slides, depends on footer-link class in css-->

---
class: top

# Acknowledgments

.pull-left-narrow[.center[<img style="border-radius: 50%;" src="https://www.strategyunitwm.nhs.uk/sites/default/files/styles/banner/public/Sharon_Townsend.jpg?itok=_S1ntVbo" width="100px"/>]]

.pull-right-wide[
[Sharon Townsend](https://www.strategyunitwm.nhs.uk/author/sharon-townsend), co-pilot for this workshop and business manager at the NHS Midlands and Lancashire Strategy Unit.
]

--

.pull-left-narrow[.center[
<img style="border-radius: 50%;" src="https://avatars0.githubusercontent.com/u/53170984?s=200&v=4" width="125px"/>]]

.pull-right-wide[
[NHS-R Community](https://nhsrcommunity.com/) for the opportunity to provide this workshop
- [NHSRdatasets 📦](https://github.com/nhs-r-community/NHSRdatasets) developed by Chris Mainey and Tom Jemmett
]

--

.pull-left-narrow[.center[
]]

.pull-right-wide[
- [xaringan 📦](https://github.com/yihui/xaringan#xaringan) developed by Yihui Xie
- [flipbookr 📦](https://github.com/EvaMaeRey/flipbookr) developed by Gina Reynolds
- [learnr 📦](https://github.com/rstudio/learnr) developed by Garrick Aden-Buie
]

---

# Target Audience

- people with some experience of R who haven't taken the plunge with ggplot. __Just yet__.

- people who do data analysis in R and produce visualisations using base graphics, or something else.

---

## Why We're Here

- Alternative to base graphics, Excel, and Tableau
- Enables Reproducible Research
- Can Make Lots of Plots Quickly
- Good for Exploratory Analysis
- Publication Ready Figures

## And... a gateway to so much more

- data capture
- statistical analysis
- machine learning
- artificial intelligence
- writing a book
- writing a blog

---

## Not Why We're Here

- Won't discuss choices for data presentation
- Nor good practices in visualisations
  - but these are sort of in the background
- This isn't a machine learning course
  - but lots of the techniques we'll use are relevant
- So, this course is about skills development; how you use these skills is up to you.

---

## We said we wouldn't discuss this... but

- Graphics are important, overlooked, and inconsistent
- Need to tell a story
- Can be misleading, almost always by accident
- Choice of colours
  - we'll spend some time on this
- Choice of fonts
- Keep it simple
  - reduce the amount of ink
- Increasing number of options for showcasing your data

---

# Why ggplot rather than base?

- some simple plots can be easier to produce using base graphics

.pull-left[
```r
hist(LOS_model$Age)
```

<img src="01-why-ggplot_files/figure-html/base_hist-1.png" width="80%" />
]

.pull-right[
```r
ggplot(data = LOS_model, aes(Age)) +
  geom_histogram(bins = 10)
```

<img src="01-why-ggplot_files/figure-html/ggplot_hist-1.png" width="80%" />
]

---

# Why ggplot?
- anything moderately complicated is better in ggplot (from [David Robinson](http://varianceexplained.org/r/why-I-use-ggplot2/))

.pull-left[
```r
library(dplyr)  # filter() below is dplyr's, not stats::filter()

par(mar = c(1.5, 1.5, 1.5, 1.5))

# one colour per nutrient
colors <- 1:6
names(colors) <- unique(top_data$nutrient)

# legend approach from http://stackoverflow.com/a/10391001/712603
m <- matrix(c(1:18), nrow = 6, ncol = 3, byrow = TRUE)
layout(mat = m, heights = c(.18, .18, .18, .18, .18, .1))

# one panel per gene, with a separate regression line for each nutrient
for (gene in unique(top_data$combined)) {
  sub_data <- filter(top_data, combined == gene)
  plot(expression ~ rate, sub_data, col = colors[sub_data$nutrient], main = gene)
  for (n in unique(sub_data$nutrient)) {
    fit <- lm(expression ~ rate, filter(sub_data, nutrient == n))
    if (!is.na(fit$coefficients[2])) {
      abline(fit, col = colors[n])
    }
  }
}

# create a new plot for legend
# plot(1, type = "n", axes = FALSE, xlab = "", ylab = "")
# legend("top", names(colors), col = colors, horiz = TRUE, lwd = 4)
```
]

.pull-right[
![](01-why-ggplot_files/figure-html/baseplot-label-out-1.png)
]

---

# Why ggplot?
- anything moderately complicated is better in ggplot

.pull-left[
```r
library(ggplot2)

# one facet per gene, one colour and regression line per nutrient
ggplot(top_data, aes(rate, expression, color = nutrient)) +
  geom_point(show.legend = FALSE) +
  geom_smooth(method = "lm", se = FALSE, show.legend = FALSE, size = 0.5) +
  facet_wrap(~combined, scales = "free_y", nrow = 3)
```
]

.pull-right[
![](01-why-ggplot_files/figure-html/ggplot-label-out-1.png)
]

---

# Lots of add-on packages for ggplot

gg.gap, ggallin, ggalluvial, ggalt, ggamma, gganimate, ggasym, ggbeeswarm, ggBubbles, ggbuildr, ggbump, ggcharts, ggChernoff, ggconf, ggcorrplot, ggdag, ggdark, ggDCA, ggdemetra, ggdendro, ggdist, ggdmc, gge, ggeasy, ggedit, ggeffects, ggenealogy, ggetho, ggExtra, ggfan, ggfittext, ggfocus, ggforce, ggformula, ggfortify, gggap, gggenes, ggghost, gggibbous, ggguitar, gghalfnorm, gghalves, gghighlight, ggimage, ggimg, gginference, gginnards, ggip, ggiraph, ggiraphExtra, ggjoy, gglasso, gglm, gglogo, ggloop, gglorenz, ggm, ggmap, ggmcmc, ggmix, ggmosaic, ggmr, ggmsa, ggmuller, ggmulti, ggnetwork, ggnewscale, ggnormalviolin, ggnuplot, ggpacman, ggpage, ggparallel, ggparliament, ggparty, ggperiodic, ggplot.multistats, ggplot2, ggplot2movies, ggplotAssist, ggplotgui, ggplotify, ggplotlyExtra, ggpmisc, ggPMX, ggpointdensity, ggpol, ggpolypath, ggpubr, ggpval, ggQC, ggQQunif, ggquickeda, ggquiver, ggRandomForests, ggraph, ggraptR, ggrasp, ggrastr, ggrepel, ggResidpanel, ggridges, ggrisk, ggROC, ggroups, ggsci, ggseas, ggseqlogo, ggsignif, ggsn, ggsoccer, ggsolvencyii, ggsom, ggspatial, ggspectra, ggstance, ggstar, ggstatsplot, ggstudent, ggswissmaps, ggtern, ggtext, ggThemeAssist, ggthemes, ggTimeSeries, ggupset, ggVennDiagram, ggversa, ggvis, ggvoronoi, ggwordcloud

---

# And others that return ggplot objects, which can then be modified like any other ggplot

.pull-left[
```r
fviz_cluster_example
```

![](01-why-ggplot_files/figure-html/unnamed-chunk-4-1.png)
]

.pull-right[
```r
fviz_cluster_example + theme_classic()
```
![](01-why-ggplot_files/figure-html/unnamed-chunk-5-1.png)
]

---

# Other reasons

- ggplot makes it easy to produce publication-ready figures
- it is easier to make a sequence of visualisations
- it fits in nicely with the rest of the tidyverse

---

# Resources

- [Big Book of R](https://www.bigbookofr.com/index.html)
- books
  - *recommended text* **Data Visualization** by Kieran Healy (ISBN 978-0691181622), ~€25. Also online at [https://socviz.co/index.html](https://socviz.co/index.html)
  - [Hadley's book, R for Data Science](https://r4ds.had.co.nz/)
  - [Hadley's book on ggplot2](https://ggplot2-book.org/)
  - [Fundamentals of Data Visualization by Claus Wilke](https://serialmentor.com/dataviz/); much of his actual code is on GitHub at [https://github.com/clauswilke/practical_ggplot2](https://github.com/clauswilke/practical_ggplot2)
  - check out the list of online books at [bookdown.org](https://bookdown.org)

<img src="img/hadley.jpeg" height="100px" width="100px" align="right"/>

---

- websites
  - [Karl Broman](https://www.biostat.wisc.edu/~kbroman/), and particularly [this presentation](https://www.biostat.wisc.edu/~kbroman/presentations/graphs_MDPhD2014.pdf)
  - course by Boehmke on GitHub: [github.com/uc-r/Intro-R](https://github.com/uc-r/Intro-R)
  - the good people at RStudio have lots of help at [resources.rstudio.com](https://resources.rstudio.com/)
  - [Cedric Scherer's ggplot2 tutorial](https://cedricscherer.netlify.com/2019/08/05/a-ggplot2-tutorial-for-beautiful-plotting-in-r/)
  - [The R Graph Gallery](https://www.r-graph-gallery.com/index.html) is pretty good and worth checking out

<br/>
<br/>
<br/>

<img src="https://github.com/yihui/xaringan/releases/download/v0.0.2/karl-moustache.jpg" height="80px" width="100px" align="right"/>

---

- Some stuff about graphics in general
  - [Rafael Irizarry - plots to avoid](http://genomicsclass.github.io/book/pages/plots_to_avoid.html)
  - [hit parade of graphs in R](https://www.r-graph-gallery.com/index.html)
  - [Cedric Scherer again](https://cedricscherer.netlify.com/)
  - some stuff from [Christian Burkhart](https://ggplot2tor.com/make_any_plot_look_better/make_any_plot_look_better/)
  - and from [Laura Ellis](https://www.littlemissdata.com/)
  - and from [Peter Aldhous](http://paldhous.github.io/ucb/2016/dataviz/)
  - [colours in R](https://www.nceas.ucsb.edu/~frazier/RSpatialGuides/colorPaletteCheatsheet.pdf)
  - a good read on effective graphics from [Stephen Few](https://nces.ed.gov/programs/slds/pdf/08_F_06.pdf)
  - [The Glamour of Graphics](https://www.williamrchase.com/slides/assets/player/KeynoteDHTMLPlayer.html#0) talk from last year's RStudio Conference (the [2021 version](https://rstudio.com/conference/) starts this evening)

<img src="img/rafael.jpg" height="100px" width="100px" align="right"/>