In this section you will learn the basics of working with numbers in R. This includes understanding

- How to manage the numeric type (integer vs. double)
- The different ways of generating non-random numbers
- The different ways of generating random numbers
- How to set seed values for reproducible random number generation
- The different ways to compare numbers
- How to round numeric values

The two most common numeric classes used in R are integer and double (for double precision floating point numbers). R automatically converts between these two classes when needed for mathematical purposes. As a result, it’s feasible to use R and perform analyses for years without specifying these differences.

By default, when you create a numeric vector using the `c()`

function it will produce a vector of double precision numeric values. To create a vector of integers using `c()`

you must specify explicity by placing an `L`

directly after each number.

```
# create a string of double-precision values
dbl_var <- c(1, 2.5, 4.5)
dbl_var
## [1] 1.0 2.5 4.5
# placing an L after the values creates a string of integers
int_var <- c(1L, 6L, 10L)
int_var
## [1] 1 6 10
```

To check whether a vector is made up of integer or double values:

```
# identifies the vector type (double, integer, logical, or character)
typeof(dbl_var)
## [1] "double"
typeof(int_var)
## [1] "integer"
```

By default, if you read in data that has no decimal points or you create numeric values using the `x <- 1:10`

method the numeric values will be coded as integer. If you want to change a double to an integer or vice versa you can specify one of the following:

```
# converts integers to double-precision values
as.double(int_var)
## [1] 1 6 10
# identical to as.double()
as.numeric(int_var)
## [1] 1 6 10
# converts doubles to integers
as.integer(dbl_var)
## [1] 1 2 4
```

There are a few R operators and functions that are especially useful for creating vectors of non-random numbers. These functions provide multiple ways for generating sequences of numbers.

To explicitly specify numbers in a sequence you can use the colon `:`

operator to specify all integers between two specified numbers or the combine `c()`

function to explicity specify all numbers in the sequence.

```
# create a vector of integers between 1 and 10
1:10
## [1] 1 2 3 4 5 6 7 8 9 10
# create a vector consisting of 1, 5, and 10
c(1, 5, 10)
## [1] 1 5 10
# save the vector of integers between 1 and 10 as object x
x <- 1:10
x
## [1] 1 2 3 4 5 6 7 8 9 10
```

A generalization of `:`

is the `seq()`

function, which generates a sequence of numbers with a specified arithmetic progression.

```
# generate a sequence of numbers from 1 to 21 by increments of 2
seq(from = 1, to = 21, by = 2)
## [1] 1 3 5 7 9 11 13 15 17 19 21
# generate a sequence of numbers from 1 to 21 that has 15 equal incremented
# numbers
seq(0, 21, length.out = 15)
## [1] 0.0 1.5 3.0 4.5 6.0 7.5 9.0 10.5 12.0 13.5 15.0 16.5 18.0 19.5
## [15] 21.0
```

The `rep()`

function allows us to conveniently repeat specified constants into long vectors. This function allows for collated and non-collated repetitions.

```
# replicates the values in x a specified number of times in a collated fashion
rep(1:4, times = 2)
## [1] 1 2 3 4 1 2 3 4
# replicates the values in x in an uncollated fashion
rep(1:4, each = 2)
## [1] 1 1 2 2 3 3 4 4
```

Simulation is a common practice in data analysis. Sometimes your analysis requires the implementation of a statistical procedure that requires random number generation or sampling (i.e. Monte Carlo simulation, bootstrap sampling, etc). R comes with a set of pseudo-random number generators that allow you to simulate the most common probability distributions such as:

To generate random numbers from a uniform distribution you can use the `runif()`

function. Alternatively, you can use `sample()`

to take a random sample using with or without replacements.

```
# generate n random numbers between the default values of 0 and 1
runif(n)
# generate n random numbers between 0 and 25
runif(n, min = 0, max = 25)
# generate n random numbers between 0 and 25 (with replacement)
sample(0:25, n, replace = TRUE)
# generate n random numbers between 0 and 25 (without replacement)
sample(0:25, n, replace = FALSE)
```

For example, to generate 25 random numbers between the values 0 and 10:

```
runif(25, min = 0, max = 10)
## [1] 9.2494720 1.0276421 9.6061007 7.4582455 8.3666868 0.8090925 7.5638221
## [8] 4.2810155 2.5850736 9.7962788 6.1705894 0.7037997 9.5056240 4.7589622
## [15] 7.9750129 5.3932881 5.1624935 1.2704098 8.7064680 8.6649293 0.1049461
## [22] 1.4827342 2.7337917 7.5236131 3.9803653
```

For each non-uniform probability distribution there are four primary functions available to generate random numbers, density (aka probability mass function), cumulative density, and quantiles. The prefixes for these functions are:

`r`

: random number generation`d`

: density or probability mass function`p`

: cumulative distribution`q`

: quantiles

The normal (or Gaussian) distribution is the most common and well know distribution. Within R, the normal distribution functions are written as

```
# generate n random numbers from a normal distribution with given mean & st. dev.
rnorm(n, mean = 0, sd = 1)
# generate CDF probabilities for value(s) in vector q
pnorm(q, mean = 0, sd = 1)
# generate quantile for probabilities in vector p
qnorm(p, mean = 0, sd = 1)
# generate density function probabilites for value(s) in vector x
dnorm(x, mean = 0, sd = 1)
```

For example, to generate 25 random numbers from a normal distribution with `mean = 100`

and
`standard deviation = 15`

:

```
x <- rnorm(25, mean = 100, sd = 15)
x
## [1] 107.84214 101.10742 73.67151 113.94035 108.47938 77.48445 73.02016
## [8] 81.02323 101.64169 112.67715 105.28478 92.35393 85.96284 108.83169
## [15] 88.71057 115.13657 141.69830 99.91198 118.69664 110.61667 83.20282
## [22] 113.91008 109.10879 93.45276 109.01996
summary(x)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 73.02 88.71 105.30 101.10 110.60 141.70
```

You can also pass a vector of values. For instance, say you want to know the CDF probabilities for each value in the vector `x`

created above:

```
pnorm(x, mean = 100, sd = 15)
## [1] 0.69944664 0.52942643 0.03960976 0.82364789 0.71406244 0.06667308
## [7] 0.03603657 0.10291447 0.54357552 0.80098468 0.63770038 0.30511760
## [13] 0.17468526 0.72199534 0.22583658 0.84353778 0.99728111 0.49765904
## [19] 0.89369904 0.76045844 0.13139693 0.82312464 0.72815841 0.33124331
## [25] 0.72619004
```

This is conventionally interpreted as the number of successes in `size = x`

trials and with `prob = p`

probability of success:

```
# generate a vector of length n displaying the number of successes from a trial
# size = 100 with a probabilty of success = 0.5
rbinom(n, size = 100, prob = 0.5)
# generate CDF probabilities for value(s) in vector q
pbinom(q, size = 100, prob = 0.5)
# generate quantile for probabilities in vector p
qbinom(p, size = 100, prob = 0.5)
# generate density function probabilites for value(s) in vector x
dbinom(x, size = 100, prob = 0.5)
```

The Poisson distribution is a discrete probability distribution that expresses the probability of a given number of events occuring in a fixed interval of time and/or space if these events occur with a known average rate and independently of the time since the last event.

```
# generate a vector of length n displaying the random number of events occuring
# when lambda (mean rate) equals 4.
rpois(n, lambda = 4)
# generate CDF probabilities for value(s) in vector q when lambda (mean rate)
# equals 4.
ppois(q, lambda = 4)
# generate quantile for probabilities in vector p when lambda (mean rate)
# equals 4.
qpois(p, lambda = 4)
# generate density function probabilites for value(s) in vector x when lambda
# (mean rate) equals 4.
dpois(x, lambda = 4)
```

The Exponential probability distribution describes the time between events in a Poisson process.

```
# generate a vector of length n with rate = 1
rexp(n, rate = 1)
# generate CDF probabilities for value(s) in vector q when rate = 4.
pexp(q, rate = 1)
# generate quantile for probabilities in vector p when rate = 4.
qexp(p, rate = 1)
# generate density function probabilites for value(s) in vector x when rate = 4.
dexp(x, rate = 1)
```

The Gamma probability distribution is related to the Beta distribution and arises naturally in processes for which the waiting times between Poisson distributed events are relevant.

```
# generate a vector of length n with shape parameter = 1
rgamma(n, shape = 1)
# generate CDF probabilities for value(s) in vector q when shape parameter = 1.
pgamma(q, shape = 1)
# generate quantile for probabilities in vector p when shape parameter = 1.
qgamma(p, shape = 1)
# generate density function probabilites for value(s) in vector x when shape
# parameter = 1.
dgamma(x, shape = 1)
```

If you want to generate a sequence of random numbers and then be able to reproduce that same sequence of random numbers later you can set the random number seed generator with `set.seed()`

. This is a critical aspect of reproducible research.

For example, we can reproduce a random generation of 10 values from a normal distribution:

```
set.seed(197)
rnorm(n = 10, mean = 0, sd = 1)
## [1] 0.6091700 -1.4391423 2.0703326 0.7089004 0.6455311 0.7290563
## [7] -0.4658103 0.5971364 -0.5135480 -0.1866703
set.seed(197)
rnorm(n = 10, mean = 0, sd = 1)
## [1] 0.6091700 -1.4391423 2.0703326 0.7089004 0.6455311 0.7290563
## [7] -0.4658103 0.5971364 -0.5135480 -0.1866703
```

There are multiple ways to compare numeric values and vectors. This includes logical operators along with testing for exact equality and also near equality.

The normal binary operators allow you to compare numeric values and provides the answer in logical form:

```
x < y # is x less than y
x > y # is x greater than y
x <= y # is x less than or equal to y
x >= y # is x greater than or equal to y
x == y # is x equal to y
x != y # is x not equal to y
```

These operations can be used for single number comparison:

```
x <- 9
y <- 10
x == y
## [1] FALSE
```

and also for comparison of numbers within vectors:

```
x <- c(1, 4, 9, 12)
y <- c(4, 4, 9, 13)
x == y
## [1] FALSE TRUE TRUE FALSE
```

Note that logical values `TRUE`

and `FALSE`

equate to 1 and 0 respectively. So if you want to identify the number of equal values in two vectors you can wrap the operation in the `sum()`

function:

```
# How many pairwise equal values are in vectors x and y
sum(x == y)
## [1] 2
```

If you need to identify the location of pairwise equalities in two vectors you can wrap the operation in the `which()`

function:

```
# Where are the pairwise equal values located in vectors x and y
which(x == y)
## [1] 2 3
```

To test if two objects are exactly equal:

```
x <- c(4, 4, 9, 12)
y <- c(4, 4, 9, 13)
identical(x, y)
## [1] FALSE
```

```
x <- c(4, 4, 9, 12)
y <- c(4, 4, 9, 12)
identical(x, y)
## [1] TRUE
```

Sometimes you wish to test for ‘near equality’. The `all.equal()`

function allows you to test for equality with a difference tolerance of 1.5e-8.

```
x <- c(4.00000005, 4.00000008)
y <- c(4.00000002, 4.00000006)
all.equal(x, y)
## [1] TRUE
```

If the difference is greater than the tolerance level the function will return the mean relative difference:

```
x <- c(4.005, 4.0008)
y <- c(4.002, 4.0006)
all.equal(x, y)
## [1] "Mean relative difference: 0.0003997102"
```

There are many ways of rounding to the nearest integer, up, down, or toward a specified decimal place. Assuming we have the following vector `x`

:

```
x <- (1, 1.35, 1.7, 2.05, 2.4, 2.75, 3.1, 3.45, 3.8, 4.15, 4.5, 4.85, 5.2, 5.55, 5.9)
```

The following illustrates the common ways to round `x`

:

```
# Round to the nearest integer
round(x)
## [1] 1 1 2 2 2 3 3 3 4 4 4 5 5 6 6
# Round up
ceiling(x)
## [1] 1 2 2 3 3 3 4 4 4 5 5 5 6 6 6
# Round down
floor(x)
## [1] 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5
# Round to a specified decimal
round(x, digits = 1)
## [1] 1.0 1.4 1.7 2.0 2.4 2.8 3.1 3.4 3.8 4.2 4.5 4.8 5.2 5.5 5.9
```