As with vectors, we can create and manipulate matrices in R as two-dimensional rectangular arrays of numbers. As data sets are typically in this form, it is worth becoming familiar with how matrices work in R (although most of the time we will use data frames instead).
A matrix is created using the matrix function, which takes three arguments: the values of the matrix, the number of rows (nrow), and the number of columns (ncol):
M <- matrix(c(3:14), nrow = 4, ncol=3)
M
## [,1] [,2] [,3]
## [1,] 3 7 11
## [2,] 4 8 12
## [3,] 5 9 13
## [4,] 6 10 14
By default, R uses the given values to first fill down the first column, and then the second, and so on. If the values are arranged in row order, we can add the optional argument byrow=TRUE:
N <- matrix(c(3:14), nrow = 4, ncol=3, byrow=TRUE)
N
## [,1] [,2] [,3]
## [1,] 3 4 5
## [2,] 6 7 8
## [3,] 9 10 11
## [4,] 12 13 14
R Help: matrix
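The two fill orders can be compared directly. The following is a minimal sketch using only base R; the name N_alt is introduced here for illustration. It also relies on the fact that matrix can infer the number of columns when only nrow is given.

```r
# Column-wise fill (the default) versus row-wise fill
M <- matrix(3:14, nrow = 4)                # ncol = 3 is inferred from the 12 values
N <- matrix(3:14, nrow = 4, byrow = TRUE)

# Filling a 3 x 4 matrix by column and then transposing it
# gives the same result as filling a 4 x 3 matrix by row
N_alt <- t(matrix(3:14, nrow = 3))
identical(N, N_alt)   # TRUE
```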
Alternatively, we can construct a matrix by ‘binding’ a number of vectors of equal length together. If we treat each vector as a column then we use the cbind function, and if each vector is a row then we use rbind:
cbind(1:4, 5:8, -2:-5)
## [,1] [,2] [,3]
## [1,] 1 5 -2
## [2,] 2 6 -3
## [3,] 3 7 -4
## [4,] 4 8 -5
rbind(1:4, 5:8, -2:-5)
## [,1] [,2] [,3] [,4]
## [1,] 1 2 3 4
## [2,] 5 6 7 8
## [3,] -2 -3 -4 -5
R Help: cbind, rbind
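If the vectors are passed as named arguments, cbind and rbind use those names as column or row labels. A small sketch follows; the vectors x and y and the labels first and second are hypothetical names chosen for this example.

```r
x <- 1:4
y <- 5:8

by_col <- cbind(first = x, second = y)   # 4 x 2 matrix, columns labelled
by_row <- rbind(first = x, second = y)   # 2 x 4 matrix, rows labelled

colnames(by_col)   # "first" "second"
dim(by_row)        # 2 4

# Binding as rows gives the transpose of binding as columns
all(t(by_col) == by_row)   # TRUE
```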
We can find the dimensions of a particular matrix by passing it to the dim function:
dim(M)
## [1] 4 3
R Help: dim
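The related base R functions nrow, ncol and length return the individual pieces of this information. A brief sketch on the matrix M defined earlier:

```r
M <- matrix(3:14, nrow = 4, ncol = 3)

nrow(M)     # 4, the number of rows
ncol(M)     # 3, the number of columns
length(M)   # 12, the total number of elements

# The number of elements is the product of the two dimensions
prod(dim(M)) == length(M)   # TRUE
```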
We can also assign labels to the columns or rows of a matrix via the colnames and rownames functions, assigning the labels as a vector of strings:
colnames(N) <- c('A','B','C')
N
## A B C
## [1,] 3 4 5
## [2,] 6 7 8
## [3,] 9 10 11
## [4,] 12 13 14
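Row labels work the same way, and once labels are set they can be used in place of numeric positions when indexing. A minimal sketch, where the row labels r1 to r4 are hypothetical:

```r
N <- matrix(3:14, nrow = 4, ncol = 3, byrow = TRUE)
colnames(N) <- c("A", "B", "C")
rownames(N) <- c("r1", "r2", "r3", "r4")   # hypothetical row labels

N["r2", "B"]   # 7: the element in row "r2", column "B"
N[, "C"]       # column "C" as a named vector: 5 8 11 14
```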
We access particular elements of a matrix in a similar way to vectors; however, we must now specify both the rows and the columns of the elements of interest, separated by a comma:
M
## [,1] [,2] [,3]
## [1,] 3 7 11
## [2,] 4 8 12
## [3,] 5 9 13
## [4,] 6 10 14
M[3,2] ## get the element in the 3rd row, 2nd column
## [1] 9
If we want to extract all the rows in a particular column, we omit the row index; to extract all the columns in a particular row, we omit the column index:
M[1,] ## get the entire first row
## [1] 3 7 11
M[,2:3] ## get the second and third columns
## [,1] [,2]
## [1,] 7 11
## [2,] 8 12
## [3,] 9 13
## [4,] 10 14
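Note that extracting a single row or column returns a plain vector rather than a matrix. The sketch below shows the base R drop = FALSE argument, which keeps the result as a matrix, along with negative indices and index vectors; the names r1 and r1_mat are introduced for illustration.

```r
M <- matrix(3:14, nrow = 4, ncol = 3)

r1 <- M[1, ]                     # a plain vector: the matrix structure is dropped
is.matrix(r1)                    # FALSE

r1_mat <- M[1, , drop = FALSE]   # keep the result as a 1 x 3 matrix
dim(r1_mat)                      # 1 3

M[-1, ]                          # all rows except the first (a 3 x 3 matrix)
M[c(1, 4), c(1, 3)]              # rows 1 and 4, columns 1 and 3
```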
We can also use logical comparisons to select rows (or columns) depending on their values:
M[M[,1]<=5,] ## take all rows where the entry in the first column is <= 5
## [,1] [,2] [,3]
## [1,] 3 7 11
## [2,] 4 8 12
## [3,] 5 9 13
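It can help to look at the intermediate logical vector that does the selecting. The following sketch stores it under the hypothetical name keep, and then shows the same idea applied to columns and to a combination of two conditions.

```r
M <- matrix(3:14, nrow = 4, ncol = 3)

keep <- M[, 1] <= 5   # one TRUE/FALSE per row: TRUE TRUE TRUE FALSE
which(keep)           # positions of the selected rows: 1 2 3
M[keep, ]             # same result as M[M[, 1] <= 5, ]

# Select columns instead: those whose first-row entry is greater than 5
M[, M[1, ] > 5]

# Combine conditions with & (and) or | (or)
M[M[, 1] <= 5 & M[, 3] > 11, ]
```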
Matrices behave in the same way as vectors when it comes to basic arithmetic with constants and other matrices:
A <- matrix(c(3, 9, -1, 4),nrow=2)
A
## [,1] [,2]
## [1,] 3 -1
## [2,] 9 4
B <- matrix(c(5, 2, 0, 9), nrow=2)
B
## [,1] [,2]
## [1,] 5 0
## [2,] 2 9
A*2
## [,1] [,2]
## [1,] 6 -2
## [2,] 18 8
B+1
## [,1] [,2]
## [1,] 6 1
## [2,] 3 10
A-B
## [,1] [,2]
## [1,] -2 -1
## [2,] 7 -5
However, multiplying two matrices using * does not perform matrix multiplication: it multiplies the matrices element by element:
A*B
## [,1] [,2]
## [1,] 15 0
## [2,] 18 36
To perform matrix multiplication of two matrices and not element-wise multiplication, we must use the special operator %*%:
A%*%B
## [,1] [,2]
## [1,] 13 -9
## [2,] 53 36
Note the difference from A*B computed above.
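To see where the entries of the matrix product come from, we can compute one of them by hand: each entry is the sum of products of a row of A with a column of B. The sketch below also illustrates that matrix multiplication, unlike element-wise multiplication, depends on the order of the two matrices. The names AB and BA are introduced for illustration.

```r
A <- matrix(c(3, 9, -1, 4), nrow = 2)
B <- matrix(c(5, 2, 0, 9), nrow = 2)

AB <- A %*% B

# Entry [1,1] of the product: row 1 of A times column 1 of B
sum(A[1, ] * B[, 1])   # 3*5 + (-1)*2 = 13
AB[1, 1]               # 13

# Element-wise multiplication is symmetric, matrix multiplication is not
BA <- B %*% A
identical(A * B, B * A)   # TRUE
identical(AB, BA)         # FALSE
```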
R also provides a range of functions to support linear algebra with matrices:
t finds the transpose of a matrix
solve finds the inverse of a matrix
eigen finds the eigen-decomposition of a matrix
diag, upper.tri, lower.tri help to build or extract diagonal, upper triangular and lower triangular matrices
While matrices are useful, they require all elements of the matrix to be of the same type. While appropriate for linear algebra, this is less appropriate when considering a set of data where we may have numerical (continuous) data for some variables, and discrete data for others. To deal with this, we shall use a data frame, which is a two-dimensional table of data (like a matrix) where each column contains the values of one variable, and each row contains the set of values for one observation. Each column is labelled with a variable name, and the values within each column must be of the same type, but the types of data held in different columns can differ according to the type of variable each represents.
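Before turning to data frames, the linear algebra functions listed above and the same-type restriction on matrices can be illustrated with a short sketch. It reuses the matrix A from earlier; the symmetric matrix S and the matrix mixed are made up for this example.

```r
A <- matrix(c(3, 9, -1, 4), nrow = 2)

t(A)                               # transpose: rows and columns swapped
A_inv <- solve(A)                  # inverse of A
all.equal(A_inv %*% A, diag(2))    # TRUE: the product is the 2 x 2 identity,
                                   # up to floating-point rounding

diag(A)                            # diagonal entries of A: 3 4
A[upper.tri(A)]                    # entries strictly above the diagonal: -1

S <- matrix(c(2, 1, 1, 2), nrow = 2)   # hypothetical symmetric matrix
eigen(S)$values                        # eigenvalues: 3 1

# Mixing numbers and strings forces every element to become a string
mixed <- matrix(c(1, "a", 2, "b"), nrow = 2)
typeof(mixed)                      # "character"
```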
We will rarely need to construct a data frame by hand, as we will usually load our data from a package using data. However, we can construct a data frame using the data.frame function. The arguments to data.frame are vectors of data for each column, and the names of the arguments are the names of the variables. Thus a small data frame containing data on the gender, height and weight of 3 people would be created by:
meas <- data.frame(gender = c("M", "M", "F"),
                   height = c(172, 186.5, 165),
                   weight = c(91, 99, 74))
meas
## gender height weight
## 1 M 172.0 91
## 2 M 186.5 99
## 3 F 165.0 74
R Help: data.frame
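Once a data frame exists, a few base R functions give a quick overview of its shape and of the type of each column. A minimal sketch on the meas data frame:

```r
meas <- data.frame(gender = c("M", "M", "F"),
                   height = c(172, 186.5, 165),
                   weight = c(91, 99, 74))

dim(meas)             # 3 3: three observations, three variables
names(meas)           # "gender" "height" "weight"
sapply(meas, class)   # the class of each column
str(meas)             # compact summary of the structure
```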
Like a matrix, we can access specific elements using []:
meas[1,2]
## [1] 172
meas[2,]
## gender height weight
## 2 M 186.5 99
However, we can also extract individual columns by using the variable name and the dollar sign $ (this is usually much easier than trying to remember the number of the column you are interested in):
meas$weight
## [1] 91 99 74
meas[meas$weight>80,]
## gender height weight
## 1 M 172.0 91
## 2 M 186.5 99
meas$gender
## [1] "M" "M" "F"
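Notice that gender is stored here as a vector of character strings. Since R 4.0.0, data.frame no longer converts strings to factors automatically, so for R to treat gender as a discrete categorical variable with the two values F and M, we must convert it with the factor function (or pass stringsAsFactors = TRUE to data.frame).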
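The categorical nature of gender can be made explicit with the base R factor function. This is a small sketch of one way to do it; it overwrites the gender column in place.

```r
meas <- data.frame(gender = c("M", "M", "F"),
                   height = c(172, 186.5, 165),
                   weight = c(91, 99, 74))

meas$gender <- factor(meas$gender)   # store gender as a categorical variable

levels(meas$gender)   # the distinct values, sorted: "F" "M"
table(meas$gender)    # counts per level: F 1, M 2
```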
We can add additional columns to a data frame by assigning them to a new column using a combination of $ and <-:
meas$age <- c(28, 55, 43)
meas
## gender height weight age
## 1 M 172.0 91 28
## 2 M 186.5 99 55
## 3 F 165.0 74 43
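The same mechanism can create a column computed from existing ones, and assigning NULL removes a column. A minimal sketch, in which height_m is a hypothetical derived column holding the height in metres (assuming the heights are recorded in centimetres):

```r
meas <- data.frame(gender = c("M", "M", "F"),
                   height = c(172, 186.5, 165),
                   weight = c(91, 99, 74))
meas$age <- c(28, 55, 43)

meas$height_m <- meas$height / 100   # hypothetical derived column
meas$age <- NULL                     # remove the age column again

names(meas)   # "gender" "height" "weight" "height_m"
```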
Some functions, such as summary, can be applied to entire data frames:
summary(meas)
##     gender              height          weight          age
##  Length:3           Min.   :165.0   Min.   :74.0   Min.   :28.0
##  Class :character   1st Qu.:168.5   1st Qu.:82.5   1st Qu.:35.5
##  Mode  :character   Median :172.0   Median :91.0   Median :43.0
##                     Mean   :174.5   Mean   :88.0   Mean   :42.0
##                     3rd Qu.:179.2   3rd Qu.:95.0   3rd Qu.:49.0
##                     Max.   :186.5   Max.   :99.0   Max.   :55.0
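Other functions expect numeric input, so we first pick out the numeric columns. The sketch below uses the base R functions sapply, is.numeric and colMeans; the name num_cols is introduced for illustration.

```r
meas <- data.frame(gender = c("M", "M", "F"),
                   height = c(172, 186.5, 165),
                   weight = c(91, 99, 74),
                   age    = c(28, 55, 43))

summary(meas$height)                  # summary of a single column

num_cols <- sapply(meas, is.numeric)  # TRUE for height, weight and age
colMeans(meas[, num_cols])            # height 174.5, weight 88, age 42
```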