The basic data structure in R is the vector. A vector is a sequence of data elements of the same basic type: numeric (integer or double), character, or logical. Factors and dates are also stored as vectors, built on top of these basic types (there are two additional vector types which I will not discuss: complex and raw). This tutorial provides you with the basics of managing vectors.
There are four main ways to create a vector: :, c(), seq(), and rep(). The colon : operator can be used to create a vector of integers between two specified numbers, and the c() function can be used to create vectors of objects by concatenating elements together:
# integer vector
w <- 8:17
w
## [1] 8 9 10 11 12 13 14 15 16 17
# double precision floating point (number with decimals) vector
x <- c(0.5, 0.6, 0.2)
x
## [1] 0.5 0.6 0.2
# logical vector
y <- c(TRUE, FALSE, FALSE)
y
## [1] TRUE FALSE FALSE
# Character vector
z <- c("a", "b", "c")
z
## [1] "a" "b" "c"
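To confirm what kind of vector each approach produces, typeof() and length() can be used. The following is a minimal sketch; the variable names w_type, x_type, lit_type, and int_type are illustrative and not part of the original tutorial.

```r
# Rebuild two of the vectors from above
w <- 8:17
x <- c(0.5, 0.6, 0.2)

# typeof() reports the underlying storage type
w_type <- typeof(w)            # "integer": the colon operator yields integers here
x_type <- typeof(x)            # "double"

# Plain numeric literals are doubles; the L suffix requests integers
lit_type <- typeof(c(8, 9))    # "double"
int_type <- typeof(c(8L, 9L))  # "integer"

# length() reports the number of elements
length(w)                      # 10
```

This distinction matters mostly when comparing objects with identical(), which treats 8:17 and c(8, 9, ..., 17) as different because their storage types differ.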
The seq() function can be used to generate a vector of a specified sequence of numbers (or dates) with a specified arithmetic progression. And the rep() function allows us to conveniently repeat specified constants into long vectors in a collated or non-collated manner.
# generate a sequence of numbers from 1 to 21 by increments of 2
seq(from = 1, to = 21, by = 2)
## [1] 1 3 5 7 9 11 13 15 17 19 21
# generate a sequence of 15 equally spaced numbers from 0 to 21
seq(0, 21, length.out = 15)
## [1] 0.0 1.5 3.0 4.5 6.0 7.5 9.0 10.5 12.0 13.5 15.0 16.5 18.0 19.5
## [15] 21.0
# replicates the values 1:4 a specified number of times in a collated fashion
rep(1:4, times = 2)
## [1] 1 2 3 4 1 2 3 4
# replicates each value in 1:4 in an uncollated fashion
rep(1:4, each = 2)
## [1] 1 1 2 2 3 3 4 4
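A few related patterns may be useful alongside seq() and rep(). The sketch below is illustrative only: the variable names (idx, empty, months, r) and the example dates are assumptions chosen for the demonstration, not values from the original tutorial.

```r
# seq_len() builds 1..n and behaves sensibly when n is 0
idx <- seq_len(5)      # 1 2 3 4 5
empty <- seq_len(0)    # integer(0), whereas 1:0 would give 1 0

# seq() also works on dates: three month starts from an arbitrary example date
months <- seq(as.Date("2024-01-01"), by = "month", length.out = 3)

# each and times can be combined; each is applied before times
r <- rep(1:2, each = 2, times = 2)   # 1 1 2 2 1 1 2 2
```

Using seq_len() instead of 1:n is a common defensive habit in loops, since it avoids iterating over c(1, 0) when n happens to be zero.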
All elements of a vector must be the same type, so when you attempt to combine different types of elements (e.g. character and numeric) they will be coerced to the most flexible type possible:
# numerics are turned to characters
str(c("a", "b", "c", 1, 2, 3))
## chr [1:6] "a" "b" "c" "1" "2" "3"
# logicals are turned to numerics...
str(c(1, 2, 3, TRUE, FALSE))
## num [1:5] 1 2 3 1 0
# or character
str(c("A", "B", "C", TRUE, FALSE))
## chr [1:5] "A" "B" "C" "TRUE" "FALSE"
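The examples above follow an ordering from least to most flexible type: logical, integer, double, character. A short sketch that checks each step of that ordering with typeof() is shown here; the variable names are illustrative.

```r
# Each combination is coerced to the more flexible of the two types
t1 <- typeof(c(TRUE, 1L))    # logical + integer  -> "integer"
t2 <- typeof(c(1L, 2.5))     # integer + double   -> "double"
t3 <- typeof(c(2.5, "a"))    # double + character -> "character"

# With all four types present, character wins
t4 <- typeof(c(TRUE, 1L, 2.5, "a"))   # "character"
```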
Coercion often happens automatically. Most mathematical functions (+, log, abs, etc.) will coerce to a double or integer, and most logical operations (&, |, any, etc.) will coerce to a logical. You will usually get a warning message if the coercion might lose information. Often it is best to explicitly coerce with as.character(), as.double(), as.integer(), or as.logical(). For example, the built-in vector state.region is a factor vector. We can coerce this to a character vector with as.character().
class(state.region)
## [1] "factor"
state.region2 <- as.character(state.region)
class(state.region2)
## [1] "character"
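The sketch below illustrates both automatic and explicit coercion on small made-up vectors (flags, nums, and f are hypothetical names, not from the original). It also shows one pitfall worth knowing when coercing factors that hold number-like labels.

```r
# Automatic coercion: mathematical functions treat TRUE as 1 and FALSE as 0
flags <- c(TRUE, FALSE, TRUE, TRUE)
n_true <- sum(flags)      # 3
prop_true <- mean(flags)  # 0.75

# Explicit coercion that cannot succeed yields NA and a warning;
# the warning is suppressed here only to keep the example quiet
nums <- suppressWarnings(as.integer(c("1", "2", "x")))   # 1 2 NA

# Factors store integer codes, so coerce through character
# to recover number-like labels
f <- factor(c("10", "20", "10"))
codes <- as.integer(f)                  # 1 2 1 (the internal codes)
values <- as.numeric(as.character(f))   # 10 20 10
```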
To add elements to a pre-existing vector we can continue to leverage the c() function. Also, note that vectors are always flat, so nested c() functions will not add additional dimensions to the vector:
v1 <- 8:17
c(v1, 18:22)
## [1] 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
# same as
c(v1, c(18, c(19, c(20, c(21:22)))))
## [1] 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
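When new elements need to go somewhere other than the end, base R's append() function with its after argument is one option. This is a small sketch with illustrative variable names; note that, like c(), append() returns a new vector, so the result must be assigned for the change to persist.

```r
v <- 8:17

# Default: values are added at the end, like c(v, 18:20)
at_end <- append(v, 18:20)                # 8 9 ... 20

# after = 0 places the new values at the front
at_front <- append(v, 7L, after = 0)      # 7 8 ... 17

# after = 2 inserts following the second element
in_middle <- append(1:5, 99L, after = 2)  # 1 2 99 3 4 5

# v itself is unchanged because nothing was assigned back to it
length(v)                                 # 10
```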
The attributes that you can add to vectors include names and comments. If we continue with our vector v1 we can see that the vector currently has no attributes:
attributes(v1)
## NULL
We can add names to vectors using two approaches. The first uses names() to assign names to each element of the vector. The second approach is to assign names when creating the vector.
# assigning names to a pre-existing vector
names(v1) <- letters[1:length(v1)]
v1
## a b c d e f g h i j
## 8 9 10 11 12 13 14 15 16 17
attributes(v1)
## $names
## [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"
# adding names when creating vectors
v2 <- c(name1 = 1, name2 = 2, name3 = 3)
v2
## name1 name2 name3
## 1 2 3
attributes(v2)
## $names
## [1] "name1" "name2" "name3"
We can also add comments to vectors to act as a note to the user. This does not change how the vector behaves; rather, it simply acts as a form of metadata for the vector.
comment(v1) <- "This is a comment on a vector"
v1
## a b c d e f g h i j
## 8 9 10 11 12 13 14 15 16 17
attributes(v1)
## $names
## [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"
##
## $comment
## [1] "This is a comment on a vector"
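A few helper functions complement names() and comment(). The following sketch uses a hypothetical vector v3; it shows setNames() for naming in a single expression, unname() for removing names, and what happens to attributes after subsetting.

```r
# setNames() returns a named copy in one step
v3 <- setNames(1:3, c("a", "b", "c"))

# unname() returns a copy without names
bare <- unname(v3)             # 1 2 3

# The comment is stored as an ordinary attribute
comment(v3) <- "example note"
note <- attr(v3, "comment")    # "example note"

# Subsetting with [ ] keeps names but drops the comment attribute
sub_attrs <- attributes(v3[1:2])
names(sub_attrs)               # "names"
```

Because the comment does not survive subsetting, it is best treated as metadata on the full object rather than something that follows derived vectors.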
The four main ways to subset a vector combine square brackets [ ] with positive integers, negative integers, logical values, or names.
You can also subset with double brackets [[ ]] for simplifying subsets.
Subsetting with positive integers returns the elements at the specified positions:
v1
## a b c d e f g h i j
## 8 9 10 11 12 13 14 15 16 17
v1[2]
## b
## 9
v1[2:4]
## b c d
## 9 10 11
v1[c(2, 4, 6, 8)]
## b d f h
## 9 11 13 15
# note that you can duplicate index positions
v1[c(2, 2, 4)]
## b b d
## 9 9 11
Subsetting with negative integers will omit the elements at the specified positions:
v1[-1]
## b c d e f g h i j
## 9 10 11 12 13 14 15 16 17
v1[-c(2, 4, 6, 8)]
## a c e g i j
## 8 10 12 14 16 17
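Two details of negative indexing are easy to trip over: operator precedence with the colon, and mixing positive with negative positions. The sketch below uses an unnamed example vector v; the variable names are illustrative.

```r
v <- 8:17

# Drop the last element; head() with a negative n does the same
drop_last <- v[-length(v)]    # 8 ... 16
drop_last2 <- head(v, -1)     # 8 ... 16

# Parentheses matter: -(1:3) drops the first three elements,
# whereas -1:3 means (-1):3, which mixes negative and positive positions
drop_first3 <- v[-(1:3)]      # 11 ... 17

# Mixing positive and negative positions is an error; capture it here
mixed <- tryCatch(v[c(-1, 2)], error = function(e) "error raised")
```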
Subsetting with logical values will select the elements where the corresponding logical value is TRUE:
v1[c(TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, TRUE)]
## a c e f g j
## 8 10 12 13 14 17
v1[v1 < 12]
## a b c d
## 8 9 10 11
v1[v1 < 12 | v1 > 15]
## a b c d i j
## 8 9 10 11 16 17
# if logical vector is shorter than the length of the vector being
# subsetted, it will be recycled to be the same length
v1[c(TRUE, FALSE)]
## a c e g i
## 8 10 12 14 16
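Logical conditions can also be used to find positions, count matches, and replace values. This is a brief sketch on a made-up named vector called scores; none of these names come from the original tutorial.

```r
scores <- c(a = 8, b = 9, c = 10, d = 11)

# which() converts a logical vector into the positions that are TRUE
big_pos <- which(scores > 9)    # positions 3 and 4, named c and d

# sum() on a logical vector counts the TRUE values
n_big <- sum(scores > 9)        # 2

# A logical subset on the left of <- replaces only the matching elements
capped <- scores
capped[capped > 9] <- 9         # a = 8, b = 9, c = 9, d = 9
```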
Subsetting with names will return the elements with the matching names specified:
v1["b"]
## b
## 9
v1[c("a", "c", "h")]
## a c h
## 8 10 15
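Asking for a name or position that does not exist returns NA rather than raising an error, which can hide mistakes. The sketch below demonstrates this on a small hypothetical vector and shows one possible guard using intersect().

```r
v <- c(a = 8, b = 9, c = 10)

# A name that does not exist yields NA
missing_name <- v["z"]
is.na(missing_name)      # TRUE

# A position past the end also yields NA
missing_pos <- v[5]

# One possible guard: keep only the requested names that are present
wanted <- c("a", "z")
safe <- v[intersect(wanted, names(v))]   # a = 8
```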
It is also important to understand the difference between simplifying and preserving when subsetting. Simplifying subsets returns the simplest possible data structure that can represent the output. Preserving subsets keeps the structure of the output the same as the input (see note 1).
For vectors, subsetting with single brackets [ ] preserves, while subsetting with double brackets [[ ]] simplifies. The change you will notice when simplifying vectors is the removal of names.
v1[1]
## a
## 8
v1[[1]]
## [1] 8
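Besides dropping names, double brackets are stricter than single brackets: they select exactly one element and raise an error for a position that does not exist. A minimal sketch on a hypothetical two-element vector:

```r
v <- c(a = 8, b = 9)

# [[ ]] returns the bare value without its name
first_val <- v[["a"]]
names(first_val)         # NULL

# [ ] returns NA for a missing position
oob_single <- v[3]

# [[ ]] raises an error instead; capture it to show the difference
oob_double <- tryCatch(v[[3]], error = function(e) "error raised")
```

This strictness makes [[ ]] a reasonable choice when exactly one element is expected and a silent NA would be undesirable.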
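Exercise: using the built-in character vector state.name (see ?state.name), assign names to its elements with paste0("V", 1:50), then subset state.name for those elements with the following names: V35, V17, V14, V38.
Note 1: See Hadley Wickham’s section on Simplifying vs. Preserving Subsetting to learn more.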