Chapter 8 R Language Characteristics

8.1 Help

Use ? to see the documentation of a function. One can even use ? to see the documentation of ? itself: try ?"?". The documentation, however, does not always help much. R documentation is usually not written for beginners, so it may contain examples or wording that are hard to understand. For third-party packages outside core R, the quality of documentation varies: some authors provide good documentation while others don’t. That said, documentation is only one source of help for R learners. Searching Google and Stack Overflow is often a faster way to find an answer.
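Besides ?, base R offers a few other ways to look things up from the console. The following is a minimal sketch using help(), apropos(), and args(); the topics mean and sapply are chosen only for illustration.

```r
# help("mean") looks up the same page as ?mean.
# Assigning the result keeps the page from being displayed right away.
h <- help("mean")

# apropos() lists objects on the search path whose names match a regular expression.
hits <- apropos("^mean")

# args() shows a function's argument list without opening the full help page.
sig <- args(sapply)
print(sig)
```

apropos() is handy when only part of a function name is remembered, and args() is often enough when the question is just "what are the argument names?".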

8.2 Vectorization

R’s basic data structures are all built on vectors; even a single number is a vector of length one. This means that many of the built-in functions that operate on these data structures actually operate on whole vectors. Consider the following minimal example:

1:10 + 1
##  [1]  2  3  4  5  6  7  8  9 10 11

Yes, the addition (+) function is vectorized. An experienced R user won’t write code like this:

res <- integer(10)
for ( i in 1:10 )
  res[i] <- i + 1
res
##  [1]  2  3  4  5  6  7  8  9 10 11

The performance difference between vectorized and non-vectorized code can be huge:

sillyAdd <- function(n) {
  res <- integer(n)
  for ( i in 1:n ) res[i] <- i + 1
  res
}
system.time(res1 <- 1:1e6 + 1)
##    user  system elapsed 
##   0.044   0.005   0.051
system.time(res2 <- sillyAdd(1e6))
##    user  system elapsed 
##   0.897   0.013   0.919
identical(res1, res2)
## [1] TRUE

Because of the performance gain (in most cases), one should use vectorization whenever possible.
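Vectorization is not limited to arithmetic operators. As a small sketch (the vector x and the clamping task are made up for illustration), the loop below and the single call to pmax() produce the same result:

```r
x <- c(-2, 5, 0, 7, -1)

# Loop version: visit each element and replace negatives with zero.
clamped_loop <- numeric(length(x))
for (i in seq_along(x)) {
  clamped_loop[i] <- if (x[i] < 0) 0 else x[i]
}

# Vectorized version: pmax() compares whole vectors element by element.
clamped_vec <- pmax(x, 0)

identical(clamped_loop, clamped_vec)
## [1] TRUE
```

When a loop only transforms each element independently, there is usually a vectorized function that does the same job in one call.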

One thing beginners often misunderstand: is the apply family vectorized? The answer is, disappointingly, NO. The apply family is a functional way (without side effects) to write code for loop-like tasks. It is NOT vectorization of any kind: the supplied function is still called once for every element. This also means it does NOT gain the performance advantage of truly vectorized code.
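One way to see the difference is to count function calls. In the sketch below, add_one and the counter calls are hypothetical helpers introduced only for this illustration (the counter is a deliberate side effect, used here just to observe what happens):

```r
calls <- 0
add_one <- function(x) {
  calls <<- calls + 1   # record that the function was called once more
  x + 1
}

res_sapply <- sapply(1:10, add_one)  # the function is called once per element
calls_sapply <- calls

calls <- 0
res_vec <- add_one(1:10)             # a single call handles the whole vector
calls_vec <- calls

identical(res_sapply, res_vec)
## [1] TRUE
c(calls_sapply, calls_vec)
## [1] 10  1
```

Both approaches return the same values, but sapply reaches them through ten separate function calls, much like an explicit loop would.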

8.3 Recycling

Recycling usually happens when an operation involves at least two vectors and one is shorter than the other. Take extraction by a logical vector, for example:

vec <- 1:10
vec[c(T, F)]
## [1] 1 3 5 7 9

The logical vector c(T, F) is recycled until it is as long as the vector being extracted from. Recycling often happens without the user noticing: when the longer length is a multiple of the shorter one, there is no warning and no message. Recycling is everywhere:

1:10 + 1
##  [1]  2  3  4  5  6  7  8  9 10 11

Here the shorter vector (which is 1) is recycled to length 10. To see this more clearly:

identical(1:10 + 1:2, 1:10 + rep(1:2, 5))
## [1] TRUE
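Recycling in arithmetic is only silent when the longer length is a multiple of the shorter one. The sketch below contrasts the two cases; tryCatch() is used only to capture the warning text as a string so it can be inspected.

```r
# 3 divides 6, so 1:3 is repeated twice with no warning.
1:6 + 1:3
## [1] 2 4 6 5 7 9

# 4 does not divide 6: R still recycles, but raises a warning.
msg <- tryCatch(1:6 + 1:4, warning = function(w) conditionMessage(w))
msg
## [1] "longer object length is not a multiple of shorter object length"
```

Seeing this warning in real code usually means two vectors that were expected to have matching lengths do not.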

8.4 The apply Family

The apply family is an outstanding feature of the core R language. Among the family, the two most useful are apply and lapply. Sometimes sapply also comes in handy.

8.4.1 apply

apply will call the given function column-by-column or row-by-row, according to the MARGIN argument supplied.

(mm <- matrix(1:12, 4, 3))
##      [,1] [,2] [,3]
## [1,]    1    5    9
## [2,]    2    6   10
## [3,]    3    7   11
## [4,]    4    8   12
apply(mm, 1, sum) # by the 1st dimension: row
## [1] 15 18 21 24
apply(mm, 2, sum) # by the 2nd dimension: column
## [1] 10 26 42
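When the function returns more than one value per row or column, apply collects the results as the columns of a matrix. A short sketch with range (which returns the minimum and the maximum) shows this, including the somewhat surprising shape of the row-wise result:

```r
mm <- matrix(1:12, 4, 3)

# Column-wise: one result column per column of mm.
apply(mm, 2, range)
##      [,1] [,2] [,3]
## [1,]    1    5    9
## [2,]    4    8   12

# Row-wise: still one result column per row of mm, so the output is 2 x 4.
apply(mm, 1, range)
##      [,1] [,2] [,3] [,4]
## [1,]    1    2    3    4
## [2,]    9   10   11   12
```

If a row-wise result is wanted with one row per input row, wrapping the call in t() transposes it back.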

apply passes extra arguments through the ellipsis (...), so it is very easy to apply functions with additional arguments:

mm <- matrix(1:12, 4, 3)
apply(mm, 2, mean, trim=.2) # apply the trimmed mean over each column
## [1]  2.5  6.5 10.5
apply(mm, 2, paste, collapse=':') # concatenate all numbers for each column
## [1] "1:2:3:4"    "5:6:7:8"    "9:10:11:12"

or anonymous functions (lambdas):

mm <- matrix(1:12, 4, 3)
apply(mm, 2, function(x) sd(x) / mean(x))
## [1] 0.5163978 0.1986145 0.1229519

apply can also be used on a data.frame in the same way (the data.frame is first converted to a matrix):

str(cars)
## 'data.frame':    50 obs. of  2 variables:
##  $ speed: num  4 4 7 7 8 9 10 10 10 11 ...
##  $ dist : num  2 10 4 22 16 10 18 26 34 17 ...
apply(cars, 2, max)
## speed  dist 
##    25   120
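One caveat follows from the conversion to a matrix: a matrix holds a single type, so a data.frame with mixed column types is coerced before the function ever sees it. The sketch below uses a made-up data.frame df to show the effect, with sapply (which works column by column without converting) for comparison:

```r
# A small data.frame with one integer column and one character column.
df <- data.frame(id = 1:3, label = c("a", "b", "c"), stringsAsFactors = FALSE)

# apply() converts df to a character matrix first,
# so every column is seen as character.
via_apply <- apply(df, 2, class)

# sapply() visits the columns of the data.frame directly and keeps their types.
via_sapply <- sapply(df, class)

unname(via_apply)
## [1] "character" "character"
unname(via_sapply)
## [1] "integer"   "character"
```

For column-wise work on a data.frame with mixed types, lapply or sapply is therefore usually the safer choice.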

8.4.2 lapply and sapply

lapply is a list apply: it operates on a list (or a vector) and calls the function on each element.

ll <- list(1:10, letters)
lapply(ll, length)
## [[1]]
## [1] 10
## 
## [[2]]
## [1] 26

lapply always returns a list of the same length as its input. Sometimes an atomic vector is all that is needed. That’s where sapply comes in handy:

sapply(ll, length)
## [1] 10 26

Most of the time a call to sapply has the same effect as unlist(lapply(...)).
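The "most of the time" matters, because what sapply returns depends on the shape of the individual results. The sketch below (the input lists are made up for illustration) covers the three common outcomes:

```r
ll <- list(1:10, letters)

# One value per element: same atomic vector as unlist(lapply(...)).
identical(sapply(ll, length), unlist(lapply(ll, length)))
## [1] TRUE

# Two values per element: the results are bound into a matrix,
# one column per input element.
m <- sapply(list(1:3, 4:6), range)
dim(m)
## [1] 2 2

# Results of different lengths cannot be simplified, so a list is returned.
uneven <- sapply(list(1:3, 1:5), function(x) x[x > 2])
is.list(uneven)
## [1] TRUE
```

Because the return type can change with the data, lapply is the more predictable choice inside functions, while sapply is convenient for interactive use.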