7.2 Joins and Relates Toolset

For this chapter, you will need the following R Packages:

library(dplyr)

In ArcGIS, joining two datasets on a common attribute is done with tools from the “Join” Toolset. Similar as described with the Selection tools in chapter 5, a join has a temporary / floating nature and does not automatically produce an output. In R, joining two datasets is only persistent if the output is assigned to a new variable7.

In R we have two main functions for joining two datasets. On the one hand there is the base R function merge, on the other hand there is the *_join family that lie within the dplyr package. Since the latter family of functions are very close to how Joins are done in SQL, we will use the latter case for our examples below.

Before we begin with our examples, we have to make clear the differences among the various forms of join operations.

  • Inner Join only retains rows that are matched in both datasets.
  • Outer Join retains all rows from both datasets.
  • Left Join retains all rows from the “left” dataset, Right Join only those from the “right” dataset.
knitr::include_graphics("images/joins.png")
Join operations

Figure 7.1: Join operations

7.2.1 Inner join

Inner Join in R is the most common type of join. It is an operation that returns the rows when the matching condition is fulfilled. Below we demonstrate it with an example.

df1 <- data.frame(TeamID = c(1, 4, 6, 11), TeamName = c("new york knicks", "los angeles lakers", 
    "milwaukee bucks", "boston celtics"), Championships = c(2, 17, 1, 17))

df2 <- data.frame(TeamID = c(1, 2, 11, 8), TeamName = c("new york knicks", "philadelphia 76ers", 
    "boston celtics", "los angeles clippers"), Championships = c(2, 3, 17, 0))

df1
##   TeamID           TeamName Championships
## 1      1    new york knicks             2
## 2      4 los angeles lakers            17
## 3      6    milwaukee bucks             1
## 4     11     boston celtics            17
df2
##   TeamID             TeamName Championships
## 1      1      new york knicks             2
## 2      2   philadelphia 76ers             3
## 3     11       boston celtics            17
## 4      8 los angeles clippers             0
df1 %>% inner_join(df2)
##   TeamID        TeamName Championships
## 1      1 new york knicks             2
## 2     11  boston celtics            17

7.2.2 Outer join

Outer join in R using simply returns all rows from both data frames. This is very well depicted in figure 7.1.

full_join(df1, df2)
##   TeamID             TeamName Championships
## 1      1      new york knicks             2
## 2      4   los angeles lakers            17
## 3      6      milwaukee bucks             1
## 4     11       boston celtics            17
## 5      2   philadelphia 76ers             3
## 6      8 los angeles clippers             0

7.2.3 Left / Right join

The left join in R returns all records from the data frame on the left, as well as and the matched records from the one at the right.

left_join(df1, df2)
##   TeamID           TeamName Championships
## 1      1    new york knicks             2
## 2      4 los angeles lakers            17
## 3      6    milwaukee bucks             1
## 4     11     boston celtics            17

Similarly works also the right join.

right_join(df1, df2)
##   TeamID             TeamName Championships
## 1      1      new york knicks             2
## 2     11       boston celtics            17
## 3      2   philadelphia 76ers             3
## 4      8 los angeles clippers             0

  1. or piped into a new function↩︎