1 Matrices

As with vectors, we can create and manipulate matrices in R as two-dimensional rectangular arrays of numbers. As data sets are typically in this form, it is worth becoming familiar with how matrices work in R (although most of the time we will use data frames instead).

A matrix is created using the matrix function, which we typically call with three arguments: the values of the matrix, the number of rows (nrow), and the number of columns (ncol):

M <- matrix(c(3:14), nrow = 4, ncol=3)
M
##      [,1] [,2] [,3]
## [1,]    3    7   11
## [2,]    4    8   12
## [3,]    5    9   13
## [4,]    6   10   14

By default, R uses the given values to fill down the first column, then the second, and so on. If the values are instead arranged in row order, we can add the optional argument byrow=TRUE:

N <- matrix(c(3:14), nrow = 4, ncol=3, byrow=TRUE)
N
##      [,1] [,2] [,3]
## [1,]    3    4    5
## [2,]    6    7    8
## [3,]    9   10   11
## [4,]   12   13   14

R Help: matrix

Alternatively, we can construct a matrix by ‘binding’ a number of vectors of equal length together. If we treat each vector as a column then we use the cbind function, and if each vector is a row then we use rbind:

cbind(1:4, 5:8, -2:-5)
##      [,1] [,2] [,3]
## [1,]    1    5   -2
## [2,]    2    6   -3
## [3,]    3    7   -4
## [4,]    4    8   -5
rbind(1:4, 5:8, -2:-5)
##      [,1] [,2] [,3] [,4]
## [1,]    1    2    3    4
## [2,]    5    6    7    8
## [3,]   -2   -3   -4   -5

R Help: cbind, rbind

We can find the dimensions of a particular matrix by passing it to the dim function:

dim(M)
## [1] 4 3

R Help: dim

We can also assign labels to the columns or rows of a matrix via the colnames and rownames functions, assigning the labels as a vector of strings:

colnames(N) <- c('A','B','C')
N
##       A  B  C
## [1,]  3  4  5
## [2,]  6  7  8
## [3,]  9 10 11
## [4,] 12 13 14
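
Row labels work the same way. The following is a minimal sketch that rebuilds N so that it runs on its own; the labels r1 to r4 are hypothetical names chosen only for illustration. Once a matrix has labels, they can be used in place of numeric positions when indexing.

```r
## Rebuild N as in the text so that this block is self-contained
N <- matrix(3:14, nrow = 4, ncol = 3, byrow = TRUE)
colnames(N) <- c('A', 'B', 'C')

## Hypothetical row labels, chosen only for illustration
rownames(N) <- c('r1', 'r2', 'r3', 'r4')

rownames(N)    ## retrieve the row labels
colnames(N)    ## retrieve the column labels
N['r2', 'B']   ## the element in the row labelled r2 and column labelled B
## [1] 7
```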

1.1 Extracting parts of a matrix

We access particular elements of a matrix in a similar way to vectors, but we must now specify both the rows and the columns of the elements of interest, separated by a comma:

M
##      [,1] [,2] [,3]
## [1,]    3    7   11
## [2,]    4    8   12
## [3,]    5    9   13
## [4,]    6   10   14
M[3,2] ## get the element in the 3rd row, 2nd column
## [1] 9

To extract an entire row we omit the column number, and to extract entire columns we omit the row number:

M[1,] ## get the entire first row
## [1]  3  7 11
M[,2:3] ## get the second and third columns
##      [,1] [,2]
## [1,]    7   11
## [2,]    8   12
## [3,]    9   13
## [4,]   10   14

We can also use logical comparisons to select rows (or columns) depending on their values:

M[M[,1]<=5,] ## take all rows where the entry in the first column is <= 5
##      [,1] [,2] [,3]
## [1,]    3    7   11
## [2,]    4    8   12
## [3,]    5    9   13

1.2 Simple matrix computations

Matrices behave in the same way as vectors when it comes to basic arithmetic with constants and other matrices:

A <- matrix(c(3, 9, -1, 4),nrow=2)
A
##      [,1] [,2]
## [1,]    3   -1
## [2,]    9    4
B <- matrix(c(5, 2, 0, 9), nrow=2)
B
##      [,1] [,2]
## [1,]    5    0
## [2,]    2    9
A*2
##      [,1] [,2]
## [1,]    6   -2
## [2,]   18    8
B+1
##      [,1] [,2]
## [1,]    6    1
## [2,]    3   10
A-B
##      [,1] [,2]
## [1,]   -2   -1
## [2,]    7   -5

However, multiplying two matrices using * does not perform matrix multiplication: it multiplies the corresponding elements of the two matrices

A*B
##      [,1] [,2]
## [1,]   15    0
## [2,]   18   36

1.3 A little linear algebra

To perform matrix multiplication of two matrices and not element-wise multiplication, we must use the special operator %*%:

A%*%B
##      [,1] [,2]
## [1,]   13   -9
## [2,]   53   36

Note the difference to A*B computed above.
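
Unlike *, the %*% operator follows the usual rule for matrix products: the number of columns of the left-hand matrix must equal the number of rows of the right-hand one. A small sketch using the 4 x 3 matrix M from earlier; the vector v and the name P are hypothetical, introduced only for this illustration.

```r
## Rebuild M as in the text so that this block is self-contained
M <- matrix(3:14, nrow = 4, ncol = 3)   ## 4 rows, 3 columns

v <- c(1, 0, -1)   ## hypothetical vector of length 3
P <- M %*% v       ## v is treated as a 3 x 1 column, so the result is 4 x 1
dim(P)
## [1] 4 1

dim(t(M) %*% M)    ## (3 x 4) times (4 x 3) gives a 3 x 3 matrix
## [1] 3 3
```

Each entry of P is the first column of M minus the third, which is -8 in every row.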

R also provides a range of functions to support linear algebra with matrices:

  • t finds the transpose of a matrix
  • solve finds the inverse of a matrix
  • eigen finds the eigen-decomposition of a matrix
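  • diag constructs a diagonal matrix (or extracts the diagonal of an existing matrix), while upper.tri and lower.tri return logical matrices marking the entries above or below the diagonal, which can be used to select the triangular parts of a matrix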

R Help: %*%, t, solve, eigen, diag, upper.tri, lower.tri
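
As a brief illustration of these functions, the sketch below applies them to the matrices A and B from the previous section. The names A_inv and x, and the right-hand side c(1, 2), are hypothetical choices made only for this example.

```r
## Rebuild A and B as in the text so that this block is self-contained
A <- matrix(c(3, 9, -1, 4), nrow = 2)
B <- matrix(c(5, 2, 0, 9), nrow = 2)

t(A)                     ## transpose: rows and columns are swapped

A_inv <- solve(A)        ## inverse of A (it exists because det(A) is 21, not 0)
A %*% A_inv              ## the 2 x 2 identity, up to rounding error

x <- solve(A, c(1, 2))   ## solve the linear system A x = (1, 2) directly

diag(2)                  ## the 2 x 2 identity matrix
diag(A)                  ## the main diagonal of A
## [1] 3 4
A[upper.tri(A)]          ## the entries of A above the diagonal
## [1] -1

eigen(B)$values          ## B is lower triangular, so these are 9 and 5
```

Calling solve with a second argument is generally preferable to forming the inverse first when only the solution of a linear system is needed.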

1.4 Data frames

While matrices are useful, they require all elements of the matrix to be of the same type. This is appropriate for linear algebra, but less so for a set of data where we may have numerical (continuous) data for some variables and discrete data for others. To deal with this, we shall use a data frame, which is a two-dimensional table of data (like a matrix) where each column contains the values of one variable and each row contains the values for one observation. Each column is labelled with a variable name. The values within a column must all be of the same type, but the types held in different columns can differ according to the variables they represent.

We will rarely need to construct a data frame by hand, as we will usually load our data from a package using data. However, we can construct a data frame using the data.frame function. The arguments to data.frame are vectors of data for each column, and the names of the arguments are the names of the variables. Thus a small data frame containing data on the gender, height and weight of 3 people would be created by:

meas <- data.frame(gender = c("M", "M","F"), 
                   height = c(172, 186.5, 165), 
                   weight = c(91,  99, 74))
meas
##   gender height weight
## 1      M  172.0     91
## 2      M  186.5     99
## 3      F  165.0     74

R Help: data.frame

As with a matrix, we can access specific elements using []:

meas[1,2]
## [1] 172
meas[2,]
##   gender height weight
## 2      M  186.5     99

However, we can also extract individual columns by using the variable name and the dollar-sign $ (this is usually much easier than trying to remember which number is the column you were interested in):

meas$weight
## [1] 91 99 74
meas[meas$weight>80,]
##   gender height weight
## 1      M  172.0     91
## 2      M  186.5     99
meas$gender
## [1] "M" "M" "F"

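Notice that gender is stored as a vector of character strings. Since R 4.0.0, data.frame no longer converts strings to categorical factors by default, so if we want R to treat gender as a discrete categorical variable with the two values F and M, we can convert it using the factor function.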
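
For categorical summaries it can help to store such a column explicitly as a factor. The following is a minimal sketch using the base R functions factor, levels and table; it works on a copy (the name g is only illustrative) so that meas itself is left unchanged.

```r
## Rebuild meas as in the text so that this block is self-contained
meas <- data.frame(gender = c("M", "M", "F"),
                   height = c(172, 186.5, 165),
                   weight = c(91, 99, 74))

g <- factor(meas$gender)   ## a factor copy of the column; meas is not modified
levels(g)                  ## the distinct values, sorted alphabetically
## [1] "F" "M"
table(g)                   ## number of observations at each level
```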

We can add additional columns to a data frame by assigning them to a new column using a combination of $ and <-:

meas$age <- c(28, 55, 43)
meas
##   gender height weight age
## 1      M  172.0     91  28
## 2      M  186.5     99  55
## 3      F  165.0     74  43

Some functions, such as summary, can be applied to entire data frames:

summary(meas)
##     gender              height          weight          age      
##  Length:3           Min.   :165.0   Min.   :74.0   Min.   :28.0  
##  Class :character   1st Qu.:168.5   1st Qu.:82.5   1st Qu.:35.5  
##  Mode  :character   Median :172.0   Median :91.0   Median :43.0  
##                     Mean   :174.5   Mean   :88.0   Mean   :42.0  
##                     3rd Qu.:179.2   3rd Qu.:95.0   3rd Qu.:49.0  
##                     Max.   :186.5   Max.   :99.0   Max.   :55.0