7.2 Joins and Relates Toolset
For this chapter, you will need the following R Packages:
library(dplyr)
In ArcGIS, joining two datasets on a common attribute is done with tools from the “Join” Toolset. Similar as described with the Selection tools in chapter 5, a join has a temporary / floating nature and does not automatically produce an output. In R
, joining two datasets is only persistent if the output is assigned to a new variable7.
In R we have two main functions for joining two datasets. On the one hand there is the base R function merge
, on the other hand there is the *_join
family that lie within the dplyr
package. Since the latter family of functions are very close to how Joins are done in SQL, we will use the latter case for our examples below.
Before we begin with our examples, we have to make clear the differences among the various forms of join operations.
- Inner Join only retains rows that are matched in both datasets.
- Outer Join retains all rows from both datasets.
- Left Join retains all rows from the “left” dataset, Right Join only those from the “right” dataset.
::include_graphics("images/joins.png") knitr
7.2.1 Inner join
Inner Join
in R is the most common type of join. It is an operation that returns the rows when the matching condition is fulfilled. Below we demonstrate it with an example.
<- data.frame(TeamID = c(1, 4, 6, 11), TeamName = c("new york knicks", "los angeles lakers",
df1 "milwaukee bucks", "boston celtics"), Championships = c(2, 17, 1, 17))
<- data.frame(TeamID = c(1, 2, 11, 8), TeamName = c("new york knicks", "philadelphia 76ers",
df2 "boston celtics", "los angeles clippers"), Championships = c(2, 3, 17, 0))
df1
## TeamID TeamName Championships
## 1 1 new york knicks 2
## 2 4 los angeles lakers 17
## 3 6 milwaukee bucks 1
## 4 11 boston celtics 17
df2
## TeamID TeamName Championships
## 1 1 new york knicks 2
## 2 2 philadelphia 76ers 3
## 3 11 boston celtics 17
## 4 8 los angeles clippers 0
%>% inner_join(df2) df1
## TeamID TeamName Championships
## 1 1 new york knicks 2
## 2 11 boston celtics 17
7.2.2 Outer join
Outer join
in R using simply returns all rows from both data frames. This is very well depicted in figure 7.1.
full_join(df1, df2)
## TeamID TeamName Championships
## 1 1 new york knicks 2
## 2 4 los angeles lakers 17
## 3 6 milwaukee bucks 1
## 4 11 boston celtics 17
## 5 2 philadelphia 76ers 3
## 6 8 los angeles clippers 0
7.2.3 Left / Right join
The left join
in R returns all records from the data frame on the left, as well as and the matched records from the one at the right.
left_join(df1, df2)
## TeamID TeamName Championships
## 1 1 new york knicks 2
## 2 4 los angeles lakers 17
## 3 6 milwaukee bucks 1
## 4 11 boston celtics 17
Similarly works also the right join
.
right_join(df1, df2)
## TeamID TeamName Championships
## 1 1 new york knicks 2
## 2 11 boston celtics 17
## 3 2 philadelphia 76ers 3
## 4 8 los angeles clippers 0
or piped into a new function↩︎