Chapter 7 Function

Functions are very important in the R language because R is mostly a functional language. Indeed, every operation ever happens is a function call. This may not be easily seen in the first place. But while one types something like:

1 + 1

## [1] 2

what really happens is a function call to the add function (see ?"+"). The expression is a sugar: this is the usual way human beings write down a mathematical expression. The functional way of such expression would be more like this one:

`+`(1, 1)

## [1] 2

This fact holds not only for +, but also other operators like -, *, /, >, <, and =. Indeed these so-called “operators” are just functions. Control flows are also functions (see ?"if"). Indeed, even { and [ are functions.

Functions are objects in R. So they can be manipulated in the way objects can be. Let’s do some impractical but potentially inspiring coding:

for ( op in c(`+`, `-`, `*`, `/`) )
  print(op(10, 2))

## [1] 12
## [1] 8
## [1] 20
## [1] 5

Functions are objects. And objects can be asigned. Here one asigns a function to another variable and use that variable to call the function. Another question in hand: is c(`+`, `-`, `*`, `/`) an atomic vector or recursive one? There is no function-typed aotomic vector. So even if the elements are all function (indeed, builtin), the resulting concatenation is a list.

7.1 User-Define Function

A function can be created using the function function.

f <- function() {}

In the aboce one liner a function f that does nothing is defined. A function can be called when suffixed with parantheiss ():

f()

## NULL

Notice that a NULL is return even if the function definition didn’t do it explicitly. Functions should always return something. NULL is that something, for acutally nothing. To explicitly return a value in a function definition, one can use the return function:

f <- function() {
  return(1)
}
f()

## [1] 1

It is important to understand that a return is an exit point for a function. It signals the function to end its task. So consider the following function definition:

f <- function() {
  print(0)
  return(1)
  print(2)
}
f()

## [1] 0

## [1] 1

The number 2 will never be printed because there is a return before it. One don’t need a return to return values in a function. When there is no explicit return, a function will return its last-evaluated value:

f <- function() {
  1 + 1
  2 + 2
}
f()

## [1] 4

So why bothering use the return? It can be handy if one expects a function to exit in an early point by some conditioning:

f <- function() {
  if ( sample(c(TRUE, FALSE), 1) ) {
    print("No Luck.")
    return(NULL)    
  }
  print("Do things...")
  1 + 1
  2 + 2
}
for ( i in 1:5 ) { f() }

## [1] "No Luck."
## [1] "No Luck."
## [1] "Do things..."
## [1] "Do things..."
## [1] "Do things..."

The above function may randomly exit.

7.2 Arguemnts

A function can have arguemnts. Returned values are output of a function, arguments are input. Both are optional in coding but jusy the former will be implicitly set. To use arguments in a function, just name it in the definition:

f <- function(x, y) x # { is ignored if the body is one-lined
identical(f(1), f(2, x=1))

## [1] TRUE

Several things worth noting:

There can be many arguments
Arguments are not typed: just as all other variables in R
Arguments can bve supplied by position or by name
Arguments that are not used are not checked at all (so can be missing)

To access the full list of arguments within a function, one can utilize the function match.call:

f <- function(x, y) {
  args <- as.list(match.call())
  str(args)
}
f(1, 2)

## List of 3
##  $  : symbol f
##  $ x: num 1
##  $ y: num 2

7.2.1 Default Value

Arguments can have default valuea given in the definition. When actually supplied in the function call, the default value will be overwritten.

f <- function(x, y=1) {
  x + y
}
for ( i in 1:5 ) print(f(i))

## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6

for ( i in 1:5 ) print(f(10, i))

## [1] 11
## [1] 12
## [1] 13
## [1] 14
## [1] 15

7.2.2 Lazy Evaluation

Function arguments are lazily evaluated. This means that arguments are only needed when they are actually used. Consider the following example:

f <- function(x) x
g <- function(x) 1
f()

## Error in f(): argument "x" is missing, with no default

g()

## [1] 1

Both function g and f have arguemtn x. But when calling without supplying x, only f issues an error. This is because in the g function the argument x actually takes no place. Consequently, x is not even evaluated in calling g, so there will be no error for not finding x simply due to no evaluation for x at all. This is not the case for function f.

7.2.3 Ellipsis (`...`)

The triple-dot expression is a special argument types that can come in handy in certain circumstances. It absorbs all arguments supplied not in the pre-defined named arguments.

f <- function(x, ...) {
  list(...)
}
f(a=1, b=2)

## $a
## [1] 1
## 
## $b
## [1] 2

In the above example, the argument a and b are both not pre-specified, and they are absorbed into the ellipsis varaible that can be casted to a list for further reference. Why is this helpful? Now consider this rather sophisticated example:

f <- function(g, ...) {
  message("Now I am going to apply the function given in g...")
  g(...)
}

g <- function(x, y) x + y
f(g, x=1, y=2)

## Now I am going to apply the function given in g...

## [1] 3

Here the function f is a wrapper for function g. Many functions in R work in this way to bring in convenient programming application: notably the apply family, just name a few.

7.3 Anonymous Function (Lambda)

In all the above examples, functions are asigned to variables. Or naturally speaking, functions are named. This is, however, not necessary. Consider the following example:

f <- function(g) {
  typeof(g)
}
f(function() 1 + 1)

## [1] "closure"

What’s going on here? The function f is a function that print the result of typeof for its argument. Yes it is completely redundant and is purely created for educational purpose. The argument g can be virtaully anything. So why not a function? When one types things such as f(3) one doesn’t asign a variable the number 3 (i.e., x <- 3; f(x)) so why bothering assigning a function? That’s where the anonymous function comes in place.

Sometimes a function is a one-timer so it is not worth typing and assigning a variable for it. This is especailly true when using the apply family, which is explored in section 8.4. Anonymous functions are sometimes called lambda in other programming language and is a common utility in modern languages, even for languages that is not functional (e.g., Python).

There is another use case when one won’t explicitly assign variable to a function, because that will be done in another function! That is, one can write a function that returns a function:

f <- function(x) {
  if ( x == "add" ) {
    function(y, z)
      y + z
  } else if ( x == "mul" ) {
    function(y, z)
      y * z
  } else {
    function()
      NULL
  }
}
g <- f("add") # the result should be 3
identical(g(1, 2), f("add")(1, 2))

## [1] TRUE

Now the world is more functional. The f function is a factory to create different functions by its arguemnt. The return value of f is a newly created function–whatever name it is–and since the return value is a function, of course it can be called to work. Even the returned function doesn’t need to have be named or assigned, one can directly call it with () suffix.

7.4 Lexical Scoping

Can a function see (or use) variables defined out sied of it? Consider the following example:

y <- 10
f <- function(x) x + y
f(1)

## [1] 11

The anwser seems to be yes. Here the variable y does not appear within the function but it does exist in the global environment. To see what variable exist in the global environment, use the ls function without arguments:

ls() # list variables in the current environment

##  [1] "a"        "A"        "a_vec"    "alist"    "b"        "B"       
##  [7] "baz"      "bazf"     "bazf1"    "bazf2"    "bazf3"    "bol_vec" 
## [13] "cnt"      "cond"     "DF"       "f"        "g"        "i"       
## [19] "kk"       "l"        "matched"  "mm"       "myobject" "num_vec" 
## [25] "op"       "res"      "s"        "ss"       "str_vec"  "tt"      
## [31] "vec_cond" "vv"       "xf"       "y"

Use the ls function within a function can reveal the scope of that function:

y <- 10
f <- function(x) {
  print(ls())
  x + y
}
f(1)

## [1] "x"

## [1] 11

Clearly the function only has variable x defined because it is assigned in the argument. When a function can not find a variable within its own scope, it continue to search it on the upper scope: its parent environment. This is called lexical scoping. In the above case the function’s parent environment is the global environment. A function’s parent environment is the upper scope to its definition block. Since function definitions can be nested, functions’ scope is dynamic by nature. In the following first example, the scope of g function does not has variable y so it search upper environment, which is the scope of function f, and find a y then use it. While in the second example, even in the environment of f does not have variable y so g continues to search upper and hit the global environment (where the function f is defined) and find a y then use it.

y <- 10
f <- function(x) {
  y <- 5 # this is searched
  g <- function(x) x + y
  g(x)
}
f(1) # should be 6

## [1] 6

y <- 10 # this is searched
f <- function(x) {
  # y <- 5
  g <- function(x) x + y
  g(x)
}
f(1) # should be 11

## [1] 11

7.5 Extraction Function

Everything that happens is a function call. So things like the following should be no exception.

vv <- 1:10
vv[1:3]

## [1] 1 2 3

The <- is the assignment function, i.e., "<-"("vv", 1:10) is what really happens in the functional style. And the second line, which is more interesting, is an extraction function of the form [<- taking place. (Type ?"[<-" to see the docuemnt to confirm it realy is it.) So the second line is equivalent to:

`[`(vv, 1:3)

## [1] 1 2 3

That is not cool. Nobody writes in that (somewhat ugly?) way. So the R language provides the syntactic sugar to let users write codes in more human-readble fasion. Exactly the same logic applies to the [[ function for list.

7.6 Replacement Function

Even more interesting is the replacement function:

vv <- 1:10
vv[1:3] <- 0
vv

##  [1]  0  0  0  4  5  6  7  8  9 10

The second line really has stories. Now one knows that [ is indeed a function. An extraction function. So how come the second line works? Because one is now assigning a value to the return value of a function. Is that okay? Consider the following example:

f <- function(x) x
f(1) <- 2

## Error in f(1) <- 2: target of assignment expands to non-language object

No. One CAN’T assign a value to a function call. The reason why vv[1:3] <- 0 will work is because it is completely another function call: the call to the replacement function [<-:

vv <- 1:10
`[<-`(vv, 1:3, 0)

##  [1]  0  0  0  4  5  6  7  8  9 10

Replacement functions are everywhere. names<- is one example, dim<- is another.