Datatypes

Datatypes determine which operations can be applied to a value and how, as well as how much memory is allocated for it. We will cover the integer, float, string, and boolean datatypes in this lesson. There are more types than those shown here, but we will not cover them because they are used less often in data science. If interested, please check out our other course, Let's Learn Python for Software Development!, to learn more about them.

We can check the datatype of a variable with the built-in function class.
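As a minimal sketch of how class can be used, the snippet below assigns three variables and checks each one. The names x, y, and z follow the lesson text, but the values assigned to them here are assumptions chosen for illustration.

```r
# Hypothetical example values; only the names x, y, and z come from the lesson.
x <- 1        # a whole number typed without a suffix
y <- 2.5      # a number with a decimal point
z <- "hello"  # text enclosed in quotes

class(x)  # "numeric"
class(y)  # "numeric"
class(z)  # "character"
```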

We will go into what each of these datatypes means shortly; for now, we can see that there are a variety of types. We see that x is a numeric, y is a numeric, and z is a character.

Numerics: integers and floats

Two of the most used numeric datatypes are integers and floats. In R, both integers and floats are categorized under the broader umbrella of "numerics" because they are both used to represent numerical values. A number typed without a suffix, such as 2, is stored as a float; appending L, as in 2L, stores it as an integer. R's approach is rooted in its design for statistical computing and data analysis, where the distinction between whole numbers and decimals is less crucial than in some other programming paradigms. By treating them collectively as numerics, R simplifies operations and functions that work on numerical data.

An integer is a whole number, such as: -1, 2, or 3.

A float is a number with a decimal point, such as 1.0, 0.4 (the result of 2/5), or 3.14159.
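The following sketch shows how R reports integers and floats. The variable names are hypothetical and chosen only for illustration. It relies on the L suffix, which asks R to store a value as an integer, and on typeof, which reports the underlying storage type.

```r
# Hypothetical variable names, used only for this illustration.
int_value   <- 3L     # the L suffix stores a true integer
float_value <- 3      # without the suffix, R stores a float (a "double")
fraction    <- 2 / 5  # division produces a float

class(int_value)      # "integer"
class(float_value)    # "numeric"
typeof(float_value)   # "double"

# Both count as numeric, so they can be mixed freely in arithmetic.
is.numeric(int_value)           # TRUE
is.numeric(float_value)         # TRUE
class(int_value + float_value)  # "numeric"
fraction                        # 0.4
```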

Characters: strings

A string is a sequence of characters, such as letters, digits, spaces, and punctuation. It can be empty or contain one or more characters, including numbers. Strings can be enclosed in either single or double quotes, as long as the opening and closing quotes match. The name comes from the fact that any kind of text is a sequence of characters. "String" has the same meaning as "sequence", so early programmers referred to text as a "string of characters". This phrase was later shortened to just "string".

While string is the more common term for text, R again has its own terminology and calls them characters.
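A short sketch of the points above, using hypothetical variable names: single and double quotes produce the same string, a string may be empty, and digits inside quotes are still text.

```r
# Hypothetical variable names, used only for this illustration.
double_quoted <- "data science"  # double quotes
single_quoted <- 'data science'  # single quotes work too
empty_string  <- ""              # a string with no characters
digit_string  <- "123"           # digits in quotes are text, not a number

class(double_quoted)                     # "character"
identical(double_quoted, single_quoted)  # TRUE
nchar(empty_string)                      # 0
class(digit_string)                      # "character"
```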

Logicals: booleans

A boolean is a variable that takes only one of two possible values. This datatype is also called a logical, which is the term R prefers. These two possible values are:

  • TRUE
  • FALSE

Booleans are used to represent the truth value of an expression. We will return to them later when we learn conditional expressions in the Filter lesson.
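As a small illustration with hypothetical variable names, a logical can be assigned directly or produced by a comparison, which is how it represents the truth value of an expression.

```r
# Hypothetical variable names, used only for this illustration.
flag <- TRUE
class(flag)  # "logical"

# Comparisons evaluate to a logical value.
is_greater <- 5 > 3      # TRUE
is_same    <- "a" == "b" # FALSE

is_greater
is_same
class(is_greater)  # "logical"
```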

Why datatypes matter

Knowing the datatype of an object is important because every function is defined to work on certain types of input, such as only integers or only strings. For example, addition only makes sense when adding two numbers together.

However, if we try to add a numeric and a string, R returns an error.
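The sketch below contrasts the two cases. The values are assumptions chosen for illustration, and tryCatch is used only so the script keeps running and the error message can be inspected; typing 1 + "2" directly at the console would simply stop with the error.

```r
# Adding two numbers works as expected.
total <- 1 + 2
total  # 3

# Adding a numeric and a string fails. tryCatch captures the error
# so that we can look at its message instead of halting the script.
error_message <- tryCatch(
  1 + "2",
  error = function(e) conditionMessage(e)
)
error_message  # in an English locale: "non-numeric argument to binary operator"
```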

One way to fix this type of issue is to convert a variable from one type to another, a process called casting. With casting we can, for example, convert a variable of type integer to a string or, conversely, a string to a numeric. We can do so with the built-in functions as.character and as.numeric, respectively. We can then combine the two elements using the paste0 function, which concatenates strings.
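A minimal sketch of casting in both directions, with hypothetical variable names and values. Converting the string to a numeric allows arithmetic, while converting the integer to a character allows concatenation with paste0.

```r
# Hypothetical variable names and values, used only for this illustration.
count <- 2L   # an integer
label <- "3"  # a string that contains a digit

count_as_text   <- as.character(count)  # "2"
label_as_number <- as.numeric(label)    # 3

# With matching types, each operation now works.
count + label_as_number        # 5
paste0(count_as_text, label)   # "23"
```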

Practice exercise

Why can't we convert the string "two" to a numeric? What kind of string can we convert to a numeric? And how can we update the content of x to fix this error?
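As a possible starting point, the sketch below assumes that x holds the string "two"; this setup is a reconstruction for illustration and may differ from the original exercise. It shows what as.numeric produces for that input, with suppressWarnings used only to keep the output quiet.

```r
# Assumed setup: the exercise text mentions x and the string "two".
x <- "two"

# as.numeric cannot interpret this text, so it yields NA.
# Without suppressWarnings, R would also print a coercion warning.
converted <- suppressWarnings(as.numeric(x))
converted         # NA
is.na(converted)  # TRUE
```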