Chapter 8 R Language Characteristics
8.1 Help
Use ? to see the documentation of a function. One can even use ? to see the documentation of ? itself: try ?"?". The documentation, however, does not always help much. R documentation is usually not written for beginners, so it may contain examples or wording that are hard to understand. For third-party packages outside core R, the quality of the documentation varies: some authors provide good documentation while others do not. That said, documentation is only one source of help for R learners. Often a quicker way is to search Google or Stack Overflow.
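Besides ?, base R has a few other entry points to the help system. The sketch below is illustrative rather than exhaustive. The topics used (mean, paste, "trimmed") are arbitrary choices, and the calls that open a help viewer are guarded so that the snippet also runs in a non-interactive script.

```r
# Calls that open a help page or viewer; only run them in an interactive session.
if (interactive()) {
  help("mean")             # same as ?mean
  help("+")                # operators must be quoted, just like ?"?"
  help.search("trimmed")   # same as ??trimmed: search the installed documentation
  example("mean")          # run the examples from the help page of mean
}

# Calls that return a value instead of opening a viewer.
hits <- apropos("^mean")   # names of visible objects matching a regular expression
print(hits)
print(args(paste))         # the argument list of a function
```

apropos() is handy when only part of a function name is remembered, and args() is a quick reminder of a signature without leaving the console.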
8.2 Vectorization
Every built-in data structure in R is vector-based; even a single number is a vector of length one. This means that many of the built-in functions that operate on these data structures actually operate on whole vectors. Consider the following minimal example:
1:10 + 1
## [1] 2 3 4 5 6 7 8 9 10 11
Yes, the addition function (+) is vectorized. An experienced R user won't write code like this:
res <- integer(10)
for ( i in 1:10 )
  res[i] <- i + 1
res
## [1] 2 3 4 5 6 7 8 9 10 11
The performance difference between vectorized and non-vectorized code can be huge:
sillyAdd <- function(n) {
  res <- integer(n)
  for ( i in 1:n ) res[i] <- i + 1
  res
}
system.time(res1 <- 1:1e6 + 1)
##    user  system elapsed
##   0.044   0.005   0.051
system.time(res2 <- sillyAdd(1e6))
##    user  system elapsed
##   0.897   0.013   0.919
identical(res1, res2)
## [1] TRUE
Because of the performance gain (in most cases), one should consider vectorization whenever possible.
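Vectorization is not limited to arithmetic. As a minimal sketch (the input vector x and the variable names are made up for illustration), a loop with a condition in its body can often be replaced by ifelse() or pmax(), both of which work on whole vectors:

```r
x <- c(-2, 5, 0, -7, 3)   # hypothetical input

# Loop version: replace negative values with zero, one element at a time.
clamped_loop <- numeric(length(x))
for (i in seq_along(x)) {
  clamped_loop[i] <- if (x[i] < 0) 0 else x[i]
}

# Vectorized versions: the comparison and the replacement act on the whole vector.
clamped_ifelse <- ifelse(x < 0, 0, x)
clamped_pmax   <- pmax(x, 0)

print(clamped_loop)
print(identical(clamped_loop, clamped_ifelse))
print(identical(clamped_loop, clamped_pmax))
```

All three give the same values; the vectorized forms are shorter and leave the looping to R's internals.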
One thing beginners often misunderstand: are the functions of the apply family vectorized? The answer is, disappointingly, no. The apply family is a functional way (without side effects) to write code for loop-like tasks. These functions are not vectorization of any kind, which also means they do not gain the performance advantage that vectorization brings.
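One rough way to check this on your own machine is to time the same task three ways. The helper names loop_add and sapply_add are hypothetical, and the timings depend on the machine and R version, so no numbers are shown here; the expectation is only that the sapply version lands much closer to the explicit loop than to the vectorized expression.

```r
n <- 1e5
x <- seq_len(n)

# Explicit loop (hypothetical helper).
loop_add <- function(x) {
  res <- numeric(length(x))
  for (i in seq_along(x)) res[i] <- x[i] + 1
  res
}

# Same task through sapply: still one R function call per element.
sapply_add <- function(x) sapply(x, function(v) v + 1)

t_loop   <- system.time(r_loop   <- loop_add(x))
t_sapply <- system.time(r_sapply <- sapply_add(x))
t_vec    <- system.time(r_vec    <- x + 1)

# Elapsed seconds for each approach.
print(c(loop       = t_loop[["elapsed"]],
        sapply     = t_sapply[["elapsed"]],
        vectorized = t_vec[["elapsed"]]))
```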
8.3 Recycling
Recycling usually happens when an operation involves at least two vectors and one is shorter than the other. Take extraction by a logical vector, for example:
vec <- 1:10
vec[c(T, F)]
## [1] 1 3 5 7 9
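The logical vector c(T, F) is recycled until it is as long as the vector being extracted from. Recycling often happens without the user being aware of it: usually there is no warning and no message (arithmetic warns only when the longer length is not a multiple of the shorter one). Recycling is everywhere: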
1:10 + 1
## [1] 2 3 4 5 6 7 8 9 10 11
Here the shorter vector (which is 1) is recycled to length 10. To see this more clearly:
identical(1:10 + 1:2, 1:10 + rep(1:2, 5))
## [1] TRUE
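The silent behaviour has one exception worth knowing. As a small sketch (variable names are made up for illustration), arithmetic signals a warning when the longer length is not a multiple of the shorter one, while logical indexing recycles silently even then:

```r
# Lengths 6 and 3: 3 divides 6, so recycling is silent.
silent <- 1:6 + c(10, 20, 30)
print(silent)

# Lengths 6 and 4: recycling still happens, but R signals a warning.
# tryCatch() captures the warning message instead of printing it.
warned <- tryCatch(
  1:6 + 1:4,
  warning = function(w) conditionMessage(w)
)
print(warned)

# Logical indexing: lengths 5 and 2, recycled to TRUE FALSE TRUE FALSE TRUE, no warning.
odd_positions <- (1:5)[c(TRUE, FALSE)]
print(odd_positions)
```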
8.4 The apply Family
The apply family is one of the outstanding features of the core R language. Among the family, the two most useful are apply and lapply. Sometimes sapply also comes in handy.
8.4.1 apply
apply calls the given function column-by-column or row-by-row, according to the MARGIN argument supplied.
(mm <- matrix(1:12, 4, 3))
##      [,1] [,2] [,3]
## [1,]    1    5    9
## [2,]    2    6   10
## [3,]    3    7   11
## [4,]    4    8   12
apply(mm, 1, sum) # by the 1st dimension: row
## [1] 15 18 21 24
apply(mm, 2, sum) # by the 2nd dimension: column
## [1] 10 26 42
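To make the role of MARGIN concrete, the following sketch spells out roughly what a row-wise apply amounts to when the function returns one value per row. row_apply is a hypothetical helper written for illustration; it is not part of R and it ignores the other result shapes that apply handles.

```r
mm <- matrix(1:12, 4, 3)

# Hypothetical helper: call f once per row and collect the results.
row_apply <- function(m, f) {
  out <- vector("list", nrow(m))
  for (i in seq_len(nrow(m))) {
    out[[i]] <- f(m[i, ])   # m[i, ] is the i-th row as a plain vector
  }
  unlist(out)
}

by_row <- row_apply(mm, sum)
print(by_row)
print(identical(by_row, apply(mm, 1, sum)))

# For plain sums the built-in rowSums() gives the same values (as doubles).
print(all(by_row == rowSums(mm)))
```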
apply forwards extra arguments to the function through its ellipsis (...) argument, so it is very easy to apply functions with additional arguments:
mm <- matrix(1:12, 4, 3)
apply(mm, 2, mean, trim=.2) # apply the trimmed mean over each column
## [1] 2.5 6.5 10.5
apply(mm, 2, paste, collapse=':') # concatenate all numbers for each column
## [1] "1:2:3:4" "5:6:7:8" "9:10:11:12"
or anonymous functions (lambdas):
mm <- matrix(1:12, 4, 3)
apply(mm, 2, function(x) sd(x) / mean(x))
## [1] 0.5163978 0.1986145 0.1229519
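The two styles are interchangeable. As a quick check (variable names are made up for illustration), passing trim through the ellipsis and wrapping the call in an anonymous function give identical results:

```r
mm <- matrix(1:12, 4, 3)

# Extra argument forwarded through the ellipsis of apply.
via_dots <- apply(mm, 2, mean, trim = .2)

# The same computation written as an anonymous function.
via_lambda <- apply(mm, 2, function(x) mean(x, trim = .2))

print(via_dots)
print(identical(via_dots, via_lambda))
```

The ellipsis form is shorter; the anonymous function is needed once the computation is more than a single call.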
apply can also be used on a data.frame in much the same way (the data frame is first converted to a matrix):
str(cars)
## 'data.frame': 50 obs. of 2 variables:
## $ speed: num 4 4 7 7 8 9 10 10 10 11 ...
## $ dist : num 2 10 4 22 16 10 18 26 34 17 ...
apply(cars, 2, max)
## speed  dist
##    25   120
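One caveat when using apply on a data frame: the data frame is converted to a matrix first, and a matrix holds a single type. The sketch below uses a small made-up data frame (mixed) to show the effect; for column-wise work, sapply treats each column separately and keeps its type.

```r
# Hypothetical data frame with one numeric and one character column.
mixed <- data.frame(n = c(1.5, 10, 3), s = c("a", "b", "c"))

# apply() sees a character matrix, so every column arrives as character.
seen_by_apply <- apply(mixed, 2, class)
print(seen_by_apply)

# sapply() iterates over the columns themselves, so n stays numeric.
seen_by_sapply <- sapply(mixed, class)
print(seen_by_sapply)

# With all-numeric data such as cars, both approaches agree.
print(sapply(cars, max))
```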
8.4.2 lapply and sapply
lapply is a list apply: it applies a function to each element of a list.
ll <- list(1:10, letters)
lapply(ll, length)
## [[1]]
## [1] 10
##
## [[2]]
## [1] 26
lapply always returns a list of the same length as its input. Sometimes an atomic vector is all that is needed. That's where sapply comes in handy:
sapply(ll, length)
## [1] 10 26
Most of the time a call to sapply has the same effect as unlist(lapply(...)).
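The "most of the time" matters. As a small sketch (the second list is made up for illustration), the two agree when every result has length one, but when every result has the same length greater than one, sapply arranges the results as the columns of a matrix, whereas unlist(lapply(...)) returns one flat vector:

```r
ll <- list(1:10, letters)

# One value per element: sapply and unlist(lapply(...)) agree.
same <- identical(sapply(ll, length), unlist(lapply(ll, length)))
print(same)

# Two values per element (min and max): sapply returns a 2 x 2 matrix.
pairs <- list(1:10, 5:7)            # hypothetical input
as_matrix <- sapply(pairs, range)
as_vector <- unlist(lapply(pairs, range))
print(dim(as_matrix))
print(as_vector)
```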