NHS Workshop
Introduction to ggplot

ggplot - Distributions/Relationships

Eugene Hickey
January 21st 2021

Graphic by Elaine Hickey

Welcome to the workshop on ggplot, where we'll show you how to create impressive data visualisations.

Picturing Data Different Ways with ggplot

We're going to set out some of the options for looking at data.

These depend on what kind of data you have and what you want to investigate.

Lots of these come from Top 50 Visualizations in R

  1. Visualising Amounts
  2. Visualising Proportions
  3. Visualising Distributions
  4. Visualising Relationships
  5. Visualising Time Series
  6. Visualising Groups
  7. Visualising Networks
  8. Visualising Spatial Data

    Items in red (Visualising Distributions and Visualising Relationships) we'll cover this afternoon. Items in blue will have to wait for a future workshop.

Visualising Distributions

  • histograms
  • density plots
  • boxplot
  • violin plot
  • ridge plots
basketball
## # A tibble: 3,366 x 8
##    name     year_start year_end position height weight birth_date   college
##    <chr>         <dbl>    <dbl> <chr>     <dbl>  <dbl> <chr>        <chr>
##  1 Kareem ~       1970     1989 C          218.  102.  April 16, 1~ University ~
##  2 Mahmoud~       1991     2001 G          185.   73.5 March 9, 19~ Louisiana S~
##  3 Tariq A~       1998     2003 F          198.  101.  November 3,~ San Jose St~
##  4 Shareef~       1997     2008 F          206.  102.  December 11~ University ~
##  5 Tom Abe~       1977     1981 F          201.   99.8 May 6, 1954  Indiana Uni~
##  6 Forest ~       1957     1957 G          190.   81.6 July 27, 19~ Western Ken~
##  7 John Ab~       1947     1948 F          190.   88.5 February 9,~ Salem Inter~
##  8 Alex Ac~       2006     2009 G          196.   83.9 January 21,~ Pepperdine ~
##  9 Don Ack~       1954     1954 G          183.   83.0 September 4~ Long Island~
## 10 Bud Act~       1968     1968 F          198.   95.3 January 11,~ Hillsdale C~
## # ... with 3,356 more rows
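library(tidyverse)  # provides ggplot2 and the %>% pipe

# Step 1: map weight to the x axis (blank canvas)
basketball %>%
  ggplot(aes(weight))

# Step 2: add a histogram with 50 bins
basketball %>%
  ggplot(aes(weight)) +
  geom_histogram(fill = "firebrick4",
                 bins = 50)

# Step 3: add axis labels, a caption and a title
basketball %>%
  ggplot(aes(weight)) +
  geom_histogram(fill = "firebrick4",
                 bins = 50) +
  labs(x = "weight (kg)",
       y = "",
       caption = "@Data from Kaggle",
       title = "Weight of NBA Players")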

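# Step 1: map weight to the x axis and fill colour to playing position
basketball %>%
  ggplot(aes(weight,
             fill = position))

# Step 2: histogram with 20 bins; "dodge" puts the positions side by side
basketball %>%
  ggplot(aes(weight,
             fill = position)) +
  geom_histogram(bins = 20,
                 position = "dodge")

# Step 3: add axis labels, a caption and a title
basketball %>%
  ggplot(aes(weight,
             fill = position)) +
  geom_histogram(bins = 20,
                 position = "dodge") +
  labs(x = "weight (kg)",
       y = "",
       caption = "@Data from Kaggle",
       title = "Weight of NBA Players by Position")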

# Step 1: map weight to the x axis and line colour to playing position
basketball %>%
  ggplot(aes(weight,
             col = position))

# Step 2: draw density curves as lines; "identity" overlays them instead of stacking
basketball %>%
  ggplot(aes(weight,
             col = position)) +
  stat_density(geom = "line",
               position = "identity")

# Step 3: add axis labels, a caption and a title
basketball %>%
  ggplot(aes(weight,
             col = position)) +
  stat_density(geom = "line",
               position = "identity") +
  labs(x = "weight (kg)",
       y = "",
       caption = "@Data from Kaggle",
       title = "Weight of NBA Players by Position")

# Step 4: add a rug so each player's weight appears as a tick on the axis
basketball %>%
  ggplot(aes(weight,
             col = position)) +
  stat_density(geom = "line",
               position = "identity") +
  labs(x = "weight (kg)",
       y = "",
       caption = "@Data from Kaggle",
       title = "Weight of NBA Players by Position") +
  geom_rug()

# Step 1: position on the x axis, weight on the y axis, coloured by position
basketball %>%
  ggplot(aes(x = position,
             y = weight,
             colour = position))

# Step 2: one boxplot per position; the legend would only repeat the x axis
basketball %>%
  ggplot(aes(x = position,
             y = weight,
             colour = position)) +
  geom_boxplot(show.legend = FALSE)

# Step 3: add axis labels, a caption and a title
basketball %>%
  ggplot(aes(x = position,
             y = weight,
             colour = position)) +
  geom_boxplot(show.legend = FALSE) +
  labs(y = "weight (kg)",
       x = "position",
       caption = "@Data from Kaggle",
       title = "Weight of NBA Players by Position")

# Step 4: overlay the individual players as small, semi-transparent jittered points
basketball %>%
  ggplot(aes(x = position,
             y = weight,
             colour = position)) +
  geom_boxplot(show.legend = FALSE) +
  labs(y = "weight (kg)",
       x = "position",
       caption = "@Data from Kaggle",
       title = "Weight of NBA Players by Position") +
  geom_jitter(size = 0.4,
              alpha = 0.2,
              show.legend = FALSE)

# Step 1: position on the x axis, weight on the y axis, coloured by position
basketball %>%
  ggplot(aes(x = position,
             y = weight,
             colour = position))

# Step 2: one violin per position; its width follows the density of weights
basketball %>%
  ggplot(aes(x = position,
             y = weight,
             colour = position)) +
  geom_violin(show.legend = FALSE)

# Step 3: add axis labels, a caption and a title
basketball %>%
  ggplot(aes(x = position,
             y = weight,
             colour = position)) +
  geom_violin(show.legend = FALSE) +
  labs(x = "position",
       y = "weight (kg)",
       caption = "@Data from Kaggle",
       title = "Weight of NBA Players by Position")

# Step 4: overlay the individual players as small, semi-transparent jittered points
basketball %>%
  ggplot(aes(x = position,
             y = weight,
             colour = position)) +
  geom_violin(show.legend = FALSE) +
  labs(x = "position",
       y = "weight (kg)",
       caption = "@Data from Kaggle",
       title = "Weight of NBA Players by Position") +
  geom_jitter(size = 0.4,
              alpha = 0.2,
              show.legend = FALSE)

gapminder::gapminder
## # A tibble: 1,704 x 6
##    country     continent  year lifeExp      pop gdpPercap
##    <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
##  1 Afghanistan Asia       1952    28.8  8425333      779.
##  2 Afghanistan Asia       1957    30.3  9240934      821.
##  3 Afghanistan Asia       1962    32.0 10267083      853.
##  4 Afghanistan Asia       1967    34.0 11537966      836.
##  5 Afghanistan Asia       1972    36.1 13079460      740.
##  6 Afghanistan Asia       1977    38.4 14880372      786.
##  7 Afghanistan Asia       1982    39.9 12881816      978.
##  8 Afghanistan Asia       1987    40.8 13867957      852.
##  9 Afghanistan Asia       1992    41.7 16317921      649.
## 10 Afghanistan Asia       1997    41.8 22227415      635.
## # ... with 1,694 more rows
library(ggridges)  # provides geom_density_ridges() and theme_ridges()

# Step 1: life expectancy on the x axis, one row per survey year on the y axis
gapminder::gapminder %>%
  ggplot(aes(x = lifeExp,
             y = factor(year)))

# Step 2: draw one semi-transparent density ridge per year
gapminder::gapminder %>%
  ggplot(aes(x = lifeExp,
             y = factor(year))) +
  geom_density_ridges(fill = "firebrick4",
                      colour = "firebrick4",
                      alpha = 0.4)

# Step 3: switch to the theme designed for ridge plots
gapminder::gapminder %>%
  ggplot(aes(x = lifeExp,
             y = factor(year))) +
  geom_density_ridges(fill = "firebrick4",
                      colour = "firebrick4",
                      alpha = 0.4) +
  theme_ridges()

# Step 4: add an x axis label and a caption
gapminder::gapminder %>%
  ggplot(aes(x = lifeExp,
             y = factor(year))) +
  geom_density_ridges(fill = "firebrick4",
                      colour = "firebrick4",
                      alpha = 0.4) +
  theme_ridges() +
  labs(x = "Life Expectancy (years)",
       y = "",
       caption = "@Data Gapminder (WHO)")

Summary of Distributions

  • hugely important
  • great way to explore your data / introduce it to others
  • make sure you show your data when possible
    • use geom_rug()
    • use geom_jitter()
    • if lots of points, then use alpha to mute them
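
As a rough sketch of the advice above, the following uses the mpg data that ships with ggplot2 (an assumption made here so the example runs without the basketball data). It shows a rug under a density curve, and jittered points muted with alpha on top of a boxplot. The object names p_rug and p_jitter are illustrative only.

```r
library(ggplot2)

# Density of highway mileage, with a rug marking every observation on the axis
p_rug <- ggplot(mpg, aes(hwy)) +
  geom_density() +
  geom_rug(alpha = 0.3)

# Boxplot per vehicle class with the raw points jittered on top.
# outlier.shape = NA hides the boxplot's own outlier marks so that
# those cars are not drawn twice; alpha mutes the overlapping points.
p_jitter <- ggplot(mpg, aes(class, hwy)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(width = 0.2, height = 0, size = 0.6, alpha = 0.3)

p_rug
p_jitter
```

Setting height = 0 in geom_jitter() keeps each point at its true mileage and only spreads the points sideways within their class.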

Visualising Relationships

  • scatter plots
    • encircling
    • jittering
    • using colour / size / shape
    • fitting lines
    • histograms and boxplots on the axes (and geom_rug())
  • line plots
  • correlation
stars
## star magnitude temp type
## 1 Sun 4.8 5840 G
## 2 SiriusA 1.4 9620 A
## 3 Canopus -3.1 7400 F
## 4 Arcturus -0.4 4590 K
## 5 AlphaCentauriA 4.3 5840 G
## 6 Vega 0.5 9900 A
## 7 Capella -0.6 5150 G
## 8 Rigel -7.2 12140 B
## 9 ProcyonA 2.6 6580 F
## 10 Betelgeuse -5.7 3200 M
## 11 Achemar -2.4 20500 B
## 12 Hadar -5.3 25500 B
## 13 Altair 2.2 8060 A
## 14 Aldebaran -0.8 4130 K
## 15 Spica -3.4 25500 B
## 16 Antares -5.2 3340 M
## 17 Fomalhaut 2.0 9060 A
## 18 Pollux 1.0 4900 K
## 19 Deneb -7.2 9340 A
## 20 BetaCrucis -4.7 28000 B
## 21 Regulus -0.8 13260 B
## 22 Acrux -4.0 28000 B
## 23 Adhara -5.2 23000 B
## 24 Shaula -3.4 25500 B
## 25 Bellatrix -4.3 23000 B
## 26 Castor 1.2 9620 A
## 27 Gacrux -0.5 3750 M
## 28 BetaCentauri -5.1 25500 B
## 29 AlphaCentauriB 5.8 4730 K
## 30 AlNa'ir -1.1 15550 B
## 31 Miaplacidus -0.6 9300 A
## 32 Elnath -1.6 12400 B
## 33 Alnilam -6.2 26950 B
## 34 Mirfak -4.6 7700 F
## 35 Alnitak -5.9 33600 O
## 36 Dubhe 0.2 4900 K
## 37 Alioth 0.4 9900 A
## 38 Peacock -2.3 20500 B
## 39 KausAustralis -0.3 11000 B
## 40 ThetaScorpii -5.6 7400 F
## 41 Atria -0.1 4590 K
## 42 Alkaid -1.7 20500 B
## 43 AlphaCrucisB -3.3 20500 B
## 44 Avior -2.1 4900 K
## 45 DeltaCanisMajoris -8.0 6100 F
## 46 Alhena 0.0 9900 A
## 47 Menkalinan 0.6 9340 A
## 48 Polaris -4.6 6100 F
## 49 Mirzam -4.8 25500 B
## 50 DeltaVulpeculae 0.6 9900 A
## 51 *ProximaCentauri 15.5 2670 M
## 52 *AlphaCentauriB 5.8 4900 K
## 53 Barnard'sStar 13.2 2800 M
## 54 Wolf359 16.7 2670 M
## 55 HD93735 10.5 3200 M
## 56 *L726-8 15.5 2670 M
## 57 *UVCeti 16.0 2670 M
## 58 *SiriusA 1.4 9620 A
## 59 *SiriusB 11.2 14800 DA
## 60 Ross154 13.1 2800 M
## 61 Ross248 14.8 2670 M
## 62 EpsilonEridani 6.1 4590 K
## 63 Ross128 13.5 2800 M
## 64 L789-6 14.5 2670 M
## 65 *GXAndromedae 10.4 3340 M
## 66 *GQAndromedae 13.4 2670 M
## 67 EpsilonIndi 7.0 4130 K
## 68 *61CygniA 7.6 4130 K
## 69 *61CygniB 8.4 3870 K
## 70 *Struve2398A 11.2 3070 M
## 71 *Struve2398B 11.9 2940 M
## 72 TauCeti 5.7 5150 G
## 73 *ProcyonA 2.6 6600 F
## 74 *ProcyonB 13.0 9700 DF
## 75 Lacaille9352 9.6 3340 M
## 76 G51-I5 17.0 2500 M
## 77 YZCeti 14.1 2670 M
## 78 BD+051668 11.9 2800 M
## 79 Lacaille8760 8.7 3340 K
## 80 KapteynsStar 10.9 3480 M
## 81 *Kruger60A 11.9 2940 M
## 82 *Kruger60B 13.3 2670 M
## 83 BD-124523 12.1 2940 M
## 84 Ross614A 13.1 2800 M
## 85 Wolf424A 15.0 2670 M
## 86 vanMaanen'sStar 14.2 13000 DB
## 87 TZArietis 14.0 2800 M
## 88 HD225213 10.3 3200 M
## 89 Altair 2.2 8060 A
## 90 ADLeonis 11.0 2940 M
## 91 *40EridaniA 6.0 4900 K
## 92 *40EridaniB 11.1 10000 DA
## 93 *40EridaniC 12.8 2940 M
## 94 *70OphiuchiA 5.8 4950 K
## 95 *70OphiuchiB 7.5 3870 K
## 96 EVLacertae 11.7 2800 M
library(ggalt)  # provides geom_encircle()

# Step 1: temperature on the x axis, magnitude on the y axis, coloured by type
stars %>%
  ggplot(aes(temp,
             magnitude,
             col = type))

# Step 2: draw the stars as points
stars %>%
  ggplot(aes(temp,
             magnitude,
             col = type)) +
  geom_point(show.legend = FALSE)

# Step 3: encircle the type B stars and the faint (magnitude > 9) type M stars
stars %>%
  ggplot(aes(temp,
             magnitude,
             col = type)) +
  geom_point(show.legend = FALSE) +
  geom_encircle(data = stars %>%
                  dplyr::filter(type == "B" | (type == "M" & magnitude > 9)),
                show.legend = FALSE)

# Step 4: put temperature on a log scale
stars %>%
  ggplot(aes(temp,
             magnitude,
             col = type)) +
  geom_point(show.legend = FALSE) +
  geom_encircle(data = stars %>%
                  dplyr::filter(type == "B" | (type == "M" & magnitude > 9)),
                show.legend = FALSE) +
  scale_x_log10()

# Step 5: label the two encircled groups with text annotations
stars %>%
  ggplot(aes(temp,
             magnitude,
             col = type)) +
  geom_point(show.legend = FALSE) +
  geom_encircle(data = stars %>%
                  dplyr::filter(type == "B" | (type == "M" & magnitude > 9)),
                show.legend = FALSE) +
  scale_x_log10() +
  annotate("text",
           x = c(15000, 5000),
           y = c(-4, 14),
           label = c("Type B Stars", "Faint Type M Stars"),
           col = c("blue", "olivedrab3"),
           family = "Ink Free",
           size = 4,
           fontface = 2)

# Step 6: use the discrete viridis palette for the star types
stars %>%
  ggplot(aes(temp,
             magnitude,
             col = type)) +
  geom_point(show.legend = FALSE) +
  geom_encircle(data = stars %>%
                  dplyr::filter(type == "B" | (type == "M" & magnitude > 9)),
                show.legend = FALSE) +
  scale_x_log10() +
  annotate("text",
           x = c(15000, 5000),
           y = c(-4, 14),
           label = c("Type B Stars", "Faint Type M Stars"),
           col = c("blue", "olivedrab3"),
           family = "Ink Free",
           size = 4,
           fontface = 2) +
  scale_color_viridis_d()

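library(HistData)   # assumed source of Galton (parent and child heights)
library(patchwork)  # provides plot_spacer() and + for combining plots
scatter <- Galton %>%
  ggplot(aes(parent, child)) +
  geom_point()
# Jittering spreads out points that share identical values
jittered <- Galton %>%
  ggplot(aes(parent, child)) +
  geom_jitter(width = 0.4, height = 0.4)
scatter + plot_spacer() + jittered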
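
The list of relationship techniques also mentions fitting lines and correlation. A minimal sketch of both on the same data might look like this, assuming Galton comes from the HistData package. Here geom_smooth(method = "lm") draws a least-squares line with its confidence band, and cor() returns the Pearson correlation. The names fit_plot and r are illustrative only.

```r
library(ggplot2)
library(HistData)  # assumed source of the Galton data

# Jittered scatter plot with a straight-line fit and its confidence band
fit_plot <- ggplot(Galton, aes(parent, child)) +
  geom_jitter(width = 0.4, height = 0.4, alpha = 0.3) +
  geom_smooth(method = "lm", formula = y ~ x)

# Pearson correlation between parent and child heights
r <- cor(Galton$parent, Galton$child)

fit_plot
r
```

The jitter only changes where the points are drawn. The fitted line and the correlation are both computed from the original, unjittered values.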
