Chapter 6 Control Flow

6.1 Conditioning

6.1.1 if

The if statement (in fact a function) handles conditional branching of code. It can be followed by optional else if and else branches.

if ( TRUE ) {
  print("Yes it is true.")
} else if ( TRUE ) {
  print("This is never printed.")
} else {
  print("When nothing above holds, this is printed.")
}
## [1] "Yes it is true."

Notice that when else if exists, conditions are checked in order. The first matching branch is executed and all the others are discarded. This is why the second branch is never triggered in the above example, even though its condition is TRUE as well.
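
The ordering rule matters most when several conditions overlap. The following is a minimal sketch; classify is a hypothetical helper introduced only for illustration:

```r
# Hypothetical helper: label a single number by the first matching branch.
classify <- function(x) {
  if ( x > 100 ) {
    "huge"
  } else if ( x > 10 ) {
    "large"
  } else if ( x > 0 ) {
    "positive"
  } else {
    "non-positive"
  }
}

classify(500)  # x > 10 and x > 0 also hold, but only the first match runs
classify(50)
classify(-3)
```

Reordering the branches (for example, testing x > 0 first) would change the result for large inputs, so the most specific condition should come first.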

The condition in if cannot handle a logical vector of length greater than one. In the following example, run under a version of R earlier than 4.2.0, a warning is issued that clearly explains the result. Since R 4.2.0 the same code raises an error instead.

if ( c(TRUE, FALSE) ) {
  1 + 1
}
## Warning in if (c(TRUE, FALSE)) {: the condition has length > 1 and only the
## first element will be used
## [1] 2

This is why people sometimes think that scalar variables exist in R. In truth, everything is a vector; a single value is simply a vector of length one.
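
When the condition at hand is a logical vector, one common approach is to reduce it to a single value with any or all before passing it to if. A small sketch, with variable names chosen only for illustration:

```r
flags <- c(TRUE, FALSE)

# Reduce the vector to a single TRUE/FALSE before handing it to if.
any_true <- if ( any(flags) ) "at least one TRUE" else "no TRUE"
all_true <- if ( all(flags) ) "all TRUE" else "not all TRUE"
any_true
all_true

# Even a "scalar" is a vector of length one.
x <- 42
is.vector(x)
length(x)
```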

There is a more functional way of using if. Consider this example:

# set.seed(777)
cond <- sample(1:10, 1)
res <- if ( cond > 5 ) "foo" else "bar"
res
## [1] "bar"

The above code is stochastic because sample is called without a fixed seed. Here one directly assigns the result of an if to a variable. Such syntax is typical of functional languages. It works because if is a function, and a function always returns something: here, the value of the last expression evaluated in the chosen branch is the returned value.
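
To see that if really behaves like a function, one can call it with backticks. The sketch below also shows what is returned when the condition is FALSE and no else branch is given; the fixed value of x is an assumption made to keep the example deterministic:

```r
x <- 3

# The usual syntax and the explicit function-call form give the same value.
a <- if ( x > 5 ) "foo" else "bar"
b <- `if`(x > 5, "foo", "bar")
identical(a, b)

# Without an else branch, a FALSE condition yields NULL (invisibly).
d <- if ( x > 5 ) "foo"
is.null(d)
```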

When the conditioning is more complicated, a recommended indent style may look like:

# set.seed(777)
cond <- sample(1:10, 1)
res <- 
  if ( cond > 5 ) {
    message("cond is ", cond, ": go if")
    "foo" 
  } else {
    message("cond is ", cond, ": go else")
    "bar"
  }
## cond is 1: go else
res
## [1] "bar"

6.1.2 ifelse

There is a vectorized version of if: the ifelse function. It accepts a condition vector of any length:

vec_cond <- sample(1:10, 10, replace=TRUE) > 5
ifelse(vec_cond, 1:10, -(1:10))
##  [1]   1  -2  -3  -4  -5   6  -7  -8   9 -10

Notice that the condition is a logical vector of length 10. The second argument supplies the values used where the condition holds, and the third supplies the values used where it does not. What happens if the vector lengths are different? Recycling will occur. See section 8.3 for more details about recycling.
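
A small sketch of recycling inside ifelse, using made-up inputs. The result always takes the length of the condition, so shorter arguments are recycled and longer ones are cut off:

```r
cond <- c(TRUE, TRUE, FALSE, TRUE)

# The second argument (length 2) is recycled to a, b, a, b;
# the third argument (length 1) is recycled to z, z, z, z.
res <- ifelse(cond, c("a", "b"), "z")
res

# A condition of length one gives a result of length one,
# even though the second argument has three elements.
short <- ifelse(TRUE, 1:3, 0)
length(short)
```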

6.1.3 Logical Operations

There are in general two types of logical operators in R: the single form (like & and |) and the double form (like && and ||). The former is vectorized, i.e., an element-wise operator, while the latter is scalar-like: in the version of R used here, only the first element of each operand is processed (since R 4.3.0, an operand of length greater than one is an error). To make this clear:

c(TRUE, FALSE, FALSE) & -1:1 
## [1]  TRUE FALSE FALSE
c(TRUE, FALSE, FALSE) && -1:1 
## [1] TRUE

Users who are familiar with other programming languages may feel confused, because in most general-purpose languages only the double form is used as a logical operator and the single form performs bit-wise operations. In the R language this is simply not true. The single form is the vectorized logical operator; it acts bit-wise only on raw vectors. Use xor for the vectorized logical XOR (which is likewise bit-wise on raw vectors), and use bitwAnd, bitwOr and bitwXor for bit-wise operations on integers.
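
The following sketch puts the element-wise and bit-wise operators of base R side by side. The inputs are arbitrary values picked for illustration:

```r
# Element-wise logical XOR on logical vectors.
lx <- xor(c(TRUE, FALSE), c(TRUE, TRUE))
lx

# Bit-wise operations on integers: 12 is 1100 and 10 is 1010 in binary.
band <- bitwAnd(12L, 10L)   # 1000 in binary
bxor <- bitwXor(12L, 10L)   # 0110 in binary
band
bxor

# On raw vectors the single form works bit-wise.
r <- as.raw(12) & as.raw(10)
r
```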

One should also note that there is implicit casting (or coercion) happening in the above example. Numeric vectors are coerced into logical ones before a logical operation actually takes place.
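
The coercion can be made explicit with as.logical. In this sketch, zero becomes FALSE and any non-zero number becomes TRUE, which reproduces the element-wise result shown above:

```r
coerced <- as.logical(-1:1)
coerced                      # TRUE FALSE TRUE

lhs <- c(TRUE, FALSE, FALSE)
implicit <- lhs & -1:1       # coercion happens behind the scenes
explicit <- lhs & coerced    # coercion written out by hand
identical(implicit, explicit)
```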

6.2 Loop

6.2.1 repeat

To repeat a code chunk, use repeat. Since repeat has no condition of its own, users must usually also specify a conditional break with if to avoid an infinite loop.

cnt <- 0
repeat {
  print("Hello World!")
  cnt <- cnt + 1
  if ( cnt >= 5 )
    break
}
## [1] "Hello World!"
## [1] "Hello World!"
## [1] "Hello World!"
## [1] "Hello World!"
## [1] "Hello World!"

6.2.2 while

The while loop is another common control-flow construct available in most programming languages. It accepts a condition, and as long as the condition holds the code chunk is repeated. The condition is checked at the very beginning of each run.

cnt <- 0
while ( cnt < 5 ) {
  print("Hello while!")
  cnt <- cnt + 1
}
## [1] "Hello while!"
## [1] "Hello while!"
## [1] "Hello while!"
## [1] "Hello while!"
## [1] "Hello while!"

There is no “until” loop in R. One can use repeat to implement one. Also, a while ( TRUE ) loop can be readily replaced with repeat.
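
One possible way to emulate an “until” loop is to place the break at the end of a repeat body, so that the condition is checked after each run rather than before it. A minimal sketch, with counters named only for illustration:

```r
cnt <- 0
repeat {
  cnt <- cnt + 1            # the body always runs at least once
  if ( cnt >= 5 ) break     # the "until" condition, checked at the end
}
cnt

# The difference from while shows when the stopping condition already holds.
start <- 10
ran <- 0
repeat {
  ran <- ran + 1
  if ( start >= 5 ) break   # body has run once before the check
}
skipped <- 0
while ( start < 5 ) skipped <- skipped + 1   # body never runs
ran
skipped
```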

6.2.3 for

Another must-have loop structure is the for loop. It iterates over the elements of a vector. The basic syntax of for looks like:

for ( i in 1:5 )
  print(i)
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5

Remember that one can use if functionally. That is, we can assign the returned value of an if to a variable. Is that also true for for? Consider the following code:

res <- for ( i in 1:10 ) i
res
## NULL
i
## [1] 10

Oops! The variable res has a value of NULL, not the value of the last evaluated i (which would be 10). Does this mean that for is not a function? No. Every operation in R is a function call. if is a function. for is a function. But for is a function that always returns NULL, and that cannot be altered. More interestingly, for returns NULL invisibly, so one won’t get a NULL printed to the console every time a for finishes.
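
Since for itself returns NULL, results have to be stored from inside the loop body. The sketch below pre-allocates a vector for that purpose, and also calls for in its backtick form to show that it behaves like any other function; the variable names are chosen only for illustration:

```r
# Pre-allocate the result and fill it inside the loop.
squares <- numeric(5)
for ( i in seq_along(squares) )
  squares[i] <- i^2
squares

# The function-call form does the same job and also yields NULL.
out <- `for`(j, 1:3, j)
is.null(out)
j   # the loop variable keeps its last value
```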

Iterating over numbers is not all that for can do. A for can iterate over any vector, atomic or recursive. This is powerful and is often overlooked by beginners.

a <- rnorm(100)
b <- runif(100)
for ( l in list(a,b) )
  print(mean(l))
## [1] -0.08564948
## [1] 0.4978582
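
When the list has names, iterating over the names keeps track of which element each result belongs to. A minimal sketch with fixed, made-up data so that the output is deterministic:

```r
stats <- list(a = c(1, 2, 3), b = c(10, 20))

# Pre-allocate a named result vector, then fill it by name.
means <- numeric(length(stats))
names(means) <- names(stats)
for ( nm in names(stats) )
  means[nm] <- mean(stats[[nm]])
means
```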

All vectors can be iterated over. What about a matrix? Remember that a matrix is simply an atomic vector with a dim attribute, so it can certainly be iterated over (shown only for illustration, because such a scenario is rarely practical):

for ( i in matrix(1:4,2,2) ) print(i)
## [1] 1
## [1] 2
## [1] 3
## [1] 4
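
Because a matrix is stored column by column, for visits its elements in column-major order. To work row by row, one option is to loop over the row indices instead. A sketch with a small made-up matrix:

```r
m <- matrix(1:6, nrow = 2)   # filled column by column

# Iterating the matrix directly walks down each column in turn.
visited <- integer(0)
for ( v in m ) visited <- c(visited, v)
visited

# Iterating over row indices gives access to whole rows.
row_totals <- integer(nrow(m))
for ( r in seq_len(nrow(m)) )
  row_totals[r] <- sum(m[r, ])
row_totals
```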

for is useful, but avoid it for large iterations. The overhead of for is usually quite high, and since most basic operations in R are vectorized, one should consider vectorizing in the first place when developing scripts. The following somewhat silly code illustrates the basic difference in performance when vectorization is available:

tt <- vv <- kk <- integer(1e5)
system.time(for ( i in 1:length(tt) ) vv[i] <- tt[i] + 1)
##    user  system elapsed 
##   0.146   0.006   0.155
system.time(kk <- tt + 1)
##    user  system elapsed 
##       0       0       0
identical(vv, kk)
## [1] TRUE
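
When a loop over indices cannot be avoided, seq_along is generally a safer way to build the index than 1:length(x), because the colon form counts downward when the vector is empty. A small sketch:

```r
empty <- integer(0)

1:length(empty)     # 1 0: a loop over this runs twice
seq_along(empty)    # integer(0): a loop over this never runs

runs_colon <- 0
for ( i in 1:length(empty) ) runs_colon <- runs_colon + 1

runs_seq <- 0
for ( i in seq_along(empty) ) runs_seq <- runs_seq + 1

runs_colon
runs_seq
```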

To understand more, see section 8.2.