Functional Programming with Functions
How we structure and design the flow of our code is up to us as programmers. However, most programming languages have evolved to support one paradigm more than others. For R, this is functional programming. Functional programming is a paradigm that treats functions as first-class citizens and avoids changing state and mutable data. It is currently a popular paradigm. Two other popular paradigms are object-oriented programming and procedural programming, which focus on the creation of objects and procedures, respectively.
For demonstration purposes for this lesson, let's imagine we want to build the next Netflix. A crucial system that Netflix needs to have in place is a mechanism to consolidate all information related to a movie, such as its title and the year it was released, into a single bundle. To illustrate the strength and versatility of functional programming, we will create a few functions to create, bundle and update this information together. Let's review our definition of a function once more:
A function is a self-contained block of code that encapsulates a set of instructions. It provides an abstraction over these instructions, promoting modularity and reusability. Typically, a function accepts input parameters, processes them, and returns an output without modifying any external state or the input itself.
Think of a function as a way to transform something, much like the recipe analogy we made in the Functions lesson. To write a function in R, we use the keyword function, then specify its parameters, and finally define its body. Good naming conventions suggest using meaningful names for functions and variables, usually written in snake_case. Before we go further into the details, let's recall what a function looks like.
Imagine we have a template for a paper notebook called movie_data, and a special printer that can create a notebook precisely following the template it is given. The line movie_data <- function() is analogous to saying, "Here is our template named movie_data, which outlines the content that can be included in a notebook." Within this template, the line movie <- list(title = "Digi Cafe: The Movie") specifies that every notebook created should bear the title "Digi Cafe: The Movie". We save the named element title into the list movie; for a review of lists, please check out the Data Structures lesson. Finally, we return the variable movie with the return keyword.
Then, in the line movie = movie_data(), it's as if we are initiating the actual printing process and creating the physical notebook from our movie_data template. We add the parentheses after the function name to indicate that we are constructing the notebook, not merely referencing the template, and we assign the result to the variable movie.
Finally, we can access the title named element by placing a dollar sign between the list object returned by the function and the element name, as we did in the movie$title line.
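The walkthrough above refers to specific lines of code. The following is a minimal sketch assembled from those quoted lines; the comments and the exact layout are assumptions, not necessarily the lesson's original block.

```r
# The template: a function with no parameters that builds and returns a list.
movie_data <- function() {
  movie <- list(title = "Digi Cafe: The Movie")
  return(movie)
}

# "Print" a notebook from the template by calling the function.
# (The walkthrough quotes `=` here; `<-` is the more common assignment operator in R.)
movie = movie_data()

# Access the named element with the dollar sign.
movie$title
```

Note that movie_data without parentheses refers to the function itself, while movie_data() calls it and produces the list.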
Let's write a second function that takes as input the output of the movie_data function. We'll name this function print_title. It will insert the title into some text using paste0 and print the result.
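A possible shape for print_title is sketched below. The function names and the use of paste0 come from the description; the wording of the printed sentence and the local variable name text are assumptions chosen for illustration.

```r
# Same template as before, repeated so this block runs on its own.
movie_data <- function() {
  movie <- list(title = "Digi Cafe: The Movie")
  return(movie)
}

# Takes the list returned by movie_data() and prints its title inside a sentence.
print_title <- function(movie) {
  text <- paste0("The title of this movie is: ", movie$title)  # wording is hypothetical
  print(text)
}

movie <- movie_data()
print_title(movie)
# [1] "The title of this movie is: Digi Cafe: The Movie"
```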
Adding a method to our notebook gives it functionality of its own. To picture this, imagine that we are no longer making a paper notebook but an electronic one, such as a tablet or iPad. This electronic notebook can display the title on its screen, and any method we add to it is like adding a button to the screen that we can press to make something happen.
Function Parameterization
The function that we created above is rather simple in structure. In reality, functions can be much more complex. Let's start exploring how we can increase the generalizability of our function by adding parameters into it.
We may notice that in the function the title is hard-coded as "Digi Cafe: The Movie". What if we want to catalogue any other movie? We can do so by adding a parameter to our function. We will call this parameter title and assign it to the title named element in our list. We also add a default value for the title parameter by writing an equal sign and the default value after the parameter name. This way, if we do not pass in a value for title, it defaults to "My Catalogue". Allowing variability in the inputs facilitates code reuse and simplifies our work.
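As a minimal sketch of the parameterized version described above (the parameter name and its default come from the text; the two calls at the end are illustrative):

```r
# title is now a parameter with a default value.
movie_data <- function(title = "My Catalogue") {
  movie <- list(title = title)
  return(movie)
}

# No argument supplied: the default is used.
default_movie <- movie_data()
default_movie$title

# Argument supplied: it overrides the default.
named_movie <- movie_data(title = "Digi Cafe: The Movie")
named_movie$title
```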
In the next code block, we will update our initial function to take in two parameters, title and release_year. We will then assign these parameters to the named elements title and release_year respectively in the list.
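One way this two-parameter version could look is sketched here. The values "Harry Potter" and 2001 are placeholder inputs chosen for illustration, and leaving release_year without a default is an assumption.

```r
# Two parameters, each stored under a named element of the same name.
# Assumption: release_year has no default, so callers must supply it.
movie_data <- function(title = "My Catalogue", release_year) {
  movie <- list(title = title, release_year = release_year)
  return(movie)
}

movie <- movie_data(title = "Harry Potter", release_year = 2001)
movie$title
movie$release_year
```

Passing arguments by name, as in the call above, keeps the call readable once a function has more than one parameter.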
We printed out the title with the print_title method, but what if we want to print out both the title and the release year? We can do so by replacing this function with a more general one that takes the movie object as input, collects the title and release year into one string, and prints that string with print. Within this function, which we will call movie_data_info, we will cast the release_year value to a string (check out the Datatypes lesson for a review of casting). The cast makes the conversion explicit when collecting both values into one string; paste0 and paste would also convert the number to a string automatically.
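A hedged sketch of movie_data_info follows. The function name, the variable s, and the explicit cast come from the lesson text; the output format "title (year)" and the sample values are assumptions.

```r
movie_data <- function(title = "My Catalogue", release_year) {
  movie <- list(title = title, release_year = release_year)
  return(movie)
}

# Collects title and release year into one string and prints it.
movie_data_info <- function(movie) {
  year_string <- as.character(movie$release_year)  # explicit cast to a string
  s <- paste0(movie$title, " (", year_string, ")")  # format is hypothetical
  print(s)
}

movie <- movie_data(title = "Harry Potter", release_year = 2001)
movie_data_info(movie)
# [1] "Harry Potter (2001)"
```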
Oh no, the title is not quite right. Which movie in the Harry Potter series do we mean? If we want to update the title value, one way to do so would be to create another list by running the function again. This would be akin to creating a completely new electronic notebook with the new title. However, for complex objects, rebuilding everything from scratch invites errors. A safer approach is to create a new notebook with the updated title while copying everything else from the original. This way, we adhere to the functional programming principle of immutability, which means not changing the state of the original object.
In the update_title function, we take in a parameter new_title and create a new list with the updated title and the original release year. This way, we create a new notebook with the updated title without modifying the original notebook.
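The idea can be sketched as follows. The name update_title and the parameter new_title come from the text; taking the original movie as the first parameter, the name new_movie, and the sample titles are assumptions made for illustration.

```r
movie_data <- function(title = "My Catalogue", release_year) {
  movie <- list(title = title, release_year = release_year)
  return(movie)
}

# Builds a NEW list: updated title, release year copied from the original.
update_title <- function(movie, new_title) {
  new_movie <- list(title = new_title, release_year = movie$release_year)
  return(new_movie)
}

movie <- movie_data(title = "Harry Potter", release_year = 2001)
updated_movie <- update_title(movie, "Harry Potter and the Philosopher's Stone")

movie$title           # the original list still holds the old title
updated_movie$title   # the new list holds the updated title
```

Because update_title returns a separate list, both versions remain available, which makes it easy to compare them or to fall back to the original.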
Great! We have created a way to collect information about our movies. In this lesson we demonstrated two aspects of functional programming: immutability and not changing state. Immutability means an object is not modified after it is created; the update_title function demonstrated this by building a new list instead of altering the original. Not changing state means that functions have no side effects: they use only their inputs to compute the output and do not modify any external variables. The movie_data_info function demonstrated this aspect by using only its input, movie, to compute the output string, s, without modifying movie or any external variables.
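To check this property concretely, one option is to take a snapshot of the input and compare it after the call. The sketch below uses a simplified movie_data_info that returns s instead of printing it; that variation, the name snapshot, and the sample values are assumptions for illustration.

```r
# Simplified variant: computes the string from its input and returns it.
movie_data_info <- function(movie) {
  s <- paste0(movie$title, " (", as.character(movie$release_year), ")")
  return(s)
}

movie <- list(title = "Harry Potter", release_year = 2001)
snapshot <- movie  # copy of the input taken before the call

s <- movie_data_info(movie)

# The input list is the same before and after the call.
identical(movie, snapshot)
# [1] TRUE
```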
This covers most of the utilities we will use with functions in data science. There are further advanced functional programming techniques, such as function composition, higher-order functions, closures, currying and recursion. We will cover the first of these, function composition, shortly in the Pipe Operator lesson.