ggplot2: an exploration of the group aesthetic

I have made many plots with ggplot2 over the past two years and occasionally ran into errors related to the group aesthetic. I worked around them without ever taking the time to understand how the group aesthetic actually works. This blog post is the result of my experiments to finally find out. My understanding combines those experiments with Hadley Wickham's outstanding book about ggplot2 (https://github.com/hadley/ggplot2-book).

Scenario 1: mapping based on one variable

Our dummy data will be the unit square. We label its vertices counter-clockwise, as is common in maths.

library(data.table)
library(ggplot2)
library(magrittr)  # provides the %>% pipe

dt <- data.table(
    x = c(0, 0, 1, 1),
    y = c(0, 1, 0, 1),
    grp = c('red', 'blue', 'blue', 'red'),
    id = c('A', 'D', 'B', 'C')
)

Case 1: we have a global mapping applicable to lines

dt %>% ggplot(aes(x, y, col = I(grp))) +
    geom_point(size = 3) +
    geom_label(aes(label = id), vjust = "inward", hjust = "inward")

We may want to join the points with segments. By default, the line geom inherits the color mapping from the call to ggplot. The group aesthetic defaults to the combination of all discrete mappings, so in this case the default groups for the line geom are the same as the colors: each color gets its own line.

dt %>% ggplot(aes(x, y, col = I(grp))) +
    geom_point(size = 3) +
    geom_line() +
    geom_label(aes(label = id), vjust = "inward", hjust = "inward")
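One way to see the silently computed groups is to look at the data ggplot2 builds for the line layer. The sketch below uses layer_data() from ggplot2. The data frame name sq and the switch to a base data.frame are choices made for this illustration only, so that the snippet runs on its own.

```r
library(ggplot2)

# The unit square from above, as a base data.frame (illustrative name: sq)
sq <- data.frame(
    x = c(0, 0, 1, 1),
    y = c(0, 1, 0, 1),
    grp = c('red', 'blue', 'blue', 'red')
)

p <- ggplot(sq, aes(x, y, col = I(grp))) +
    geom_point(size = 3) +
    geom_line()

# Layer 2 is geom_line(); its built data has one row per point
# plus the group column that ggplot2 derived from the mappings.
line_data <- layer_data(p, 2)
print(line_data[, c("x", "y", "colour", "group")])

# Number of distinct groups in the line layer
n_groups <- length(unique(line_data$group))
print(n_groups)
```

Points that share a value in the group column are joined into one line, which is why each color ends up with its own segment.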

We may instead want a single path that links all the points. One option is to override the color mapping for this layer.

dt %>% ggplot(aes(x, y, col = I(grp))) +
    geom_point(size = 3) +
    geom_line(aes(col = NULL)) +
    geom_label(aes(label = id), vjust = "inward", hjust = "inward")

Another option is to override the silently set group aesthetic by setting it to a constant. This way each segment inherits its color from its (left) neighbour point.

dt %>% ggplot(aes(x, y, col = I(grp))) +
    geom_point(size = 3) +
    geom_line(aes(group = 'arbitrary_constant_value')) +
    geom_label(aes(label = id), vjust = "inward", hjust = "inward")

In what order are the points linked? So far, left-to-right order coincides with the order in which the points appear in the dataset, so we need a further experiment to tell the two apart.

dt_mixed <- data.table(
    x = c(1, 0, 1, 0),
    y = c(1, 0, 0, 1),
    grp = c('red', 'red', 'blue', 'blue'),
    id = c('A', 'C', 'B', 'D')
)
dt_mixed %>% ggplot(aes(x, y, col = I(grp))) +
    geom_point(size = 3) +
    geom_line(aes(group = 'arbitrary_constant_value')) +
    geom_label(aes(label = id), vjust = "inward", hjust = "inward")

We can safely conclude that the points are linked left-to-right, with ties broken by order of appearance in the data.
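The same conclusion can be checked without reading it off the plot. The following is a minimal sketch, assuming that the row order returned by layer_data() matches the order in which geom_line joins the points. The data frame name sq_mixed is illustrative, and the color mapping is left out because it plays no role in the ordering here.

```r
library(ggplot2)

# Same points as dt_mixed above, as a base data.frame (illustrative name)
sq_mixed <- data.frame(
    x = c(1, 0, 1, 0),
    y = c(1, 0, 0, 1),
    id = c('A', 'C', 'B', 'D')
)

p <- ggplot(sq_mixed, aes(x, y)) +
    geom_line(aes(group = 1))

# Rows of the built layer data, in the order the layer holds them
path <- layer_data(p, 1)[, c("x", "y")]
print(path)
```

If the sketch's assumption holds, the rows come out sorted by x, and rows with equal x keep the order they had in sq_mixed.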

Case 2: we have a global mapping not applicable to lines

A slightly different case arises when the aesthetic defined in the ggplot call is not applicable to lines. It still affects the silently set group aesthetic.

dt %>% ggplot(aes(x, y, pch = grp)) +
    geom_point(size = 3) +
    geom_line() +
    geom_label(aes(label = id), vjust = "inward", hjust = "inward")

We can still override the pch aesthetic in the geom_line call, thus silently unsetting the group variable.

dt %>% ggplot(aes(x, y, pch = grp)) +
    geom_point(size = 3) +
    geom_line(aes(pch = NULL)) +
    geom_label(aes(label = id), vjust = "inward", hjust = "inward")

However, in this case explicitly setting the group aesthetic is undeniably clearer. The previous version even produced a warning saying “Ignoring unknown aesthetics: shape”.

dt %>% ggplot(aes(x, y, pch = grp)) +
    geom_point(size = 3) +
    geom_line(aes(group = 1)) +
    geom_label(aes(label = id), vjust = "inward", hjust = "inward")

What happens when the group variable is a combination of several mappings?

Scenario 2: mapping based on two variables

Case 1: one discrete and one continuous variable

To join the points in a group with line segments we need at least two points per group, so we need a slightly bigger dataset to experiment with.

dt <- data.table(
    x = c(0,2,4,5,4,2,0,-1),
    y = c(0,-1,0,2,4,5,4,2),
    grp_1 = c('red','blue','red','blue','red','blue','red','blue'),
    grp_2 = c(2,2,4,4,2,2,4,4),
    id = LETTERS[1:8]
)
dt %>% ggplot(aes(x, y, col = I(grp_1), size = I(grp_2))) +
    geom_point() +
    geom_label(aes(label = id), vjust = "inward", hjust = "inward")

By default we get 2 groups of 4 linked points: one group for each value of the discrete variable used in the aes call in ggplot. The continuous variable does not contribute to the grouping.

dt %>% ggplot(aes(x, y, col = I(grp_1), size = I(grp_2))) +
    geom_point() +
    geom_line() +
    geom_label(aes(label = id), vjust = "inward", hjust = "inward")

One option is to override the color and size mappings in geom_line:

dt %>% ggplot(aes(x, y, col = I(grp_1), size = I(grp_2))) +
    geom_point() +
    geom_line(aes(col = NULL, size = NULL)) +
    geom_label(aes(label = id), vjust = "inward", hjust = "inward")

Another option is to set the group aesthetic to a constant:

dt %>% ggplot(aes(x, y, col = I(grp_1), size = I(grp_2))) +
    geom_point() +
    geom_line(aes(group = 1)) +
    geom_label(aes(label = id), vjust = "inward", hjust = "inward")

Case 2: two discrete variables

dt <- data.table(
    x = c(0,2,4,5,4,2,0,-1),
    y = c(0,-1,0,2,4,5,4,2),
    grp_1 = c('red','blue','red','blue','red','blue','red','blue'),
    grp_2 = c('a', 'a', 'b', 'b', 'a', 'a', 'b', 'b'),
    id = LETTERS[1:8]
)

By default we get 4 pairs of linked points: one pair for each combination of the two discrete variables used in the aes call of ggplot.

dt %>% ggplot(aes(x, y, col = I(grp_1), pch = grp_2)) +
    geom_point(size = 3) +
    geom_line() +
    geom_label(aes(label = id), vjust = "inward", hjust = "inward")

We can override one or both of these in the aes call in geom_line:

dt %>% ggplot(aes(x, y, col = I(grp_1), pch = grp_2)) +
    geom_point(size = 3) +
    geom_line(aes(pch = NULL)) +
    geom_label(aes(label = id), vjust = "inward", hjust = "inward")

dt %>% ggplot(aes(x, y, col = I(grp_1), pch = grp_2)) +
    geom_point(size = 3) +
    geom_line(aes(col = NULL, pch = NULL)) +
    geom_label(aes(label = id), vjust = "inward", hjust = "inward")

We can also set the group aesthetic to a constant, or combine the two approaches.

dt %>% ggplot(aes(x, y, col = I(grp_1), pch = grp_2)) +
    geom_point(size = 3) +
    geom_line(aes(group = 1)) +
    geom_label(aes(label = id), vjust = "inward", hjust = "inward")

dt %>% ggplot(aes(x, y, col = I(grp_1), pch = grp_2)) +
    geom_point(size = 3) +
    geom_line(aes(group = 1, pch = NULL)) +
    geom_label(aes(label = id), vjust = "inward", hjust = "inward")
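The number of groups behind the plots above can also be counted directly. This is a small sketch built on layer_data(). The helper count_groups and the data frame name oct are hypothetical names introduced for the illustration, and only the line layer is kept to keep it short.

```r
library(ggplot2)

# The eight points from above, as a base data.frame (illustrative name: oct)
oct <- data.frame(
    x = c(0, 2, 4, 5, 4, 2, 0, -1),
    y = c(0, -1, 0, 2, 4, 5, 4, 2),
    grp_1 = c('red', 'blue', 'red', 'blue', 'red', 'blue', 'red', 'blue'),
    grp_2 = c('a', 'a', 'b', 'b', 'a', 'a', 'b', 'b')
)

base <- ggplot(oct, aes(x, y, col = I(grp_1), pch = grp_2))

# Hypothetical helper: number of distinct groups in the first layer
count_groups <- function(plot) {
    length(unique(layer_data(plot, 1)$group))
}

# Default: both discrete mappings contribute to the group
n_default <- count_groups(base + geom_line())

# pch removed for the line layer: only the color mapping is left.
# suppressWarnings() hides a possible "Ignoring unknown aesthetics" warning.
n_no_pch <- suppressWarnings(count_groups(base + geom_line(aes(pch = NULL))))

# Group set to a constant inside aes()
n_constant <- count_groups(base + geom_line(aes(group = 1)))

print(c(default = n_default, no_pch = n_no_pch, constant = n_constant))
```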

What happens if we specify the group variable outside aes?

dt %>% ggplot(aes(x, y, col = I(grp_1), pch = grp_2)) +
    geom_point(size = 3) +
    geom_line(group = 1) +
    geom_label(aes(label = id), vjust = "inward", hjust = "inward")

The order in which the points are linked has changed. But to what? Let’s find out with targeted experiments.

dt %>% ggplot(aes(x, y, col = I(grp_1), pch = grp_2)) +
    geom_point(size = 3) +
    geom_line(group = 1, aes(col = NULL)) +
    geom_label(aes(label = id), vjust = "inward", hjust = "inward")

dt %>% ggplot(aes(x, y, col = I(grp_1), pch = grp_2)) +
    geom_point(size = 3) +
    geom_line(group = 1, aes(pch = NULL)) +
    geom_label(aes(label = id), vjust = "inward", hjust = "inward")

dt %>% ggplot(aes(x, y, col = I(grp_1), pch = grp_2)) +
    geom_point(size = 3) +
    geom_line(group = 1, aes(col = NULL, pch = NULL)) +
    geom_label(aes(label = id), vjust = "inward", hjust = "inward")

Set this way, the group computed from the aesthetics is still there in the background: we overrode its effect on which points are joined, but not every effect of the aesthetic mapping. So the points are linked in this order: first by the levels of the group, then left to right, then by order of appearance in the dataset. Check for yourself with the above examples!

e.g. blue < red, circle < triangle.
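The difference between the two ways of setting group can be made visible by comparing the built layer data. This is a sketch under the assumption that the row order of layer_data() is the order in which geom_line joins the points. The names oct, path_aes and path_param are illustrative.

```r
library(ggplot2)

# The eight points from above, as a base data.frame (illustrative name: oct)
oct <- data.frame(
    x = c(0, 2, 4, 5, 4, 2, 0, -1),
    y = c(0, -1, 0, 2, 4, 5, 4, 2),
    grp_1 = c('red', 'blue', 'red', 'blue', 'red', 'blue', 'red', 'blue'),
    grp_2 = c('a', 'a', 'b', 'b', 'a', 'a', 'b', 'b')
)

base <- ggplot(oct, aes(x, y, col = I(grp_1), pch = grp_2))

# group mapped inside aes(): one group, so only x decides the order
path_aes <- layer_data(base + geom_line(aes(group = 1)), 1)

# group passed as a parameter outside aes()
path_param <- layer_data(base + geom_line(group = 1), 1)

print(path_aes[, c("x", "y")])
print(path_param[, c("x", "y")])

# TRUE if the x values are not in left-to-right order
print(is.unsorted(path_param$x))
```

Comparing the two printed paths row by row against the labeled plots is a quick way to confirm the ordering described above.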

Conclusion

Group is the interaction of the discrete variables mapped inside aes, unless explicitly overridden.

The order in which the points are linked:

  1. order of group levels
  2. left-to-right
  3. order of appearance in data
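The first conclusion can be illustrated by spelling the interaction out with base R's interaction() and comparing it with the default. This is a sketch, not a statement about ggplot2 internals: it only checks that both versions split the points into the same sets. Rows are matched by their coordinates because the two layers may number and order their groups differently. The names oct, key and tab are illustrative.

```r
library(ggplot2)

# The eight points from above, as a base data.frame (illustrative name: oct)
oct <- data.frame(
    x = c(0, 2, 4, 5, 4, 2, 0, -1),
    y = c(0, -1, 0, 2, 4, 5, 4, 2),
    grp_1 = c('red', 'blue', 'red', 'blue', 'red', 'blue', 'red', 'blue'),
    grp_2 = c('a', 'a', 'b', 'b', 'a', 'a', 'b', 'b')
)

base <- ggplot(oct, aes(x, y, col = I(grp_1), pch = grp_2))

# Default grouping versus an explicitly mapped interaction
implicit <- layer_data(base + geom_line(), 1)
explicit <- layer_data(
    base + geom_line(aes(group = interaction(grp_1, grp_2))), 1
)

# Match rows by coordinates; every point has a unique (x, y) pair
key <- function(d) paste(d$x, d$y)
explicit_group <- explicit$group[match(key(implicit), key(explicit))]

# Each implicit group should line up with exactly one explicit group
tab <- table(implicit = implicit$group, explicit = explicit_group)
print(tab)
```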