Digi Cafe - Beginner Data Science in R

Beginner Data Science with R!

Select

In this lesson we will learn how to subset the data by selecting columns in a dataframe.

The select function subsets a dataframe by column. It is relatively straightforward to learn and use compared to the summarise and mutate functions covered in the previous two lessons.

select takes the names of the columns we want to retain and subsets the dataframe to only include those columns. For instance, in the following code block we retain only Sepal.Length and store it in a new dataframe called df.
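
A minimal sketch of this step, assuming the dplyr package is installed and that the lesson works with R's built-in iris dataset and the %>% pipe (both are assumptions here):

```r
library(dplyr)  # provides select() and the %>% pipe

# Keep only the Sepal.Length column of the built-in iris dataset
df <- iris %>% select(Sepal.Length)

# Show the first few rows of the new one-column dataframe
head(df)
```

Note that the result is still a dataframe, not a bare vector, even though it has a single column.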

Multiple columns

If we want to keep multiple columns, such as both Sepal.Length and Species, we can write:
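
As a sketch under the same assumptions (dplyr loaded, built-in iris dataset), the column names can simply be listed one after another, separated by commas:

```r
library(dplyr)

# Keep two columns by listing their names
df <- iris %>% select(Sepal.Length, Species)

# Inspect which columns remain
names(df)
```

The columns appear in the result in the order they are listed in select.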

Alternatively, we can collect the column names into a vector using the c function. For example, we can store the same two column names in a vector named vars2keep and pass it to select inside another function: all_of. This extra step is required in newer versions of the Tidyverse to avoid ambiguity about whether we mean dataframe columns or an external object.
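
A possible version of this approach, again assuming dplyr is loaded (dplyr makes all_of available alongside select):

```r
library(dplyr)

# Store the column names as strings in an ordinary vector
vars2keep <- c("Sepal.Length", "Species")

# all_of() tells select() that vars2keep is an external vector of
# column names, not a column of iris that happens to be called vars2keep
df <- iris %>% select(all_of(vars2keep))

names(df)
```

This form is handy when the list of columns is built elsewhere in a script and reused in several places.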

Drop

Now suppose we have a dataframe with a hundred columns. This is not an unreasonable number of columns to have in research or industry-level data science applications. Suppose also that we want to keep all but one column. It does not make sense to write out the names of the 99 columns we want to keep.

Instead, we can use the minus sign - to drop a column. So if we want to omit only Sepal.Length we can write:
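
A short sketch of dropping a single column, assuming dplyr is loaded and using the built-in iris dataset:

```r
library(dplyr)

# A minus sign in front of a column name drops that column
# and keeps all the others
df <- iris %>% select(-Sepal.Length)

names(df)
```

Since iris has five columns, the result here keeps the remaining four.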

If we want to omit both Sepal.Length and Species, we can similarly collect the two in a vector and negate it, as in the following code editor.
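
One way this might look, assuming dplyr is loaded. The columns are wrapped in c and the minus sign is applied to the whole vector:

```r
library(dplyr)

# Negate a vector of column names to drop all of them at once
df <- iris %>% select(-c(Sepal.Length, Species))

names(df)
```

If the names are stored as strings in an external vector, such as a hypothetical vars2drop, the same idea can be written as select(-all_of(vars2drop)).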

Practice exercise

Select both Petal.Length and Petal.Width from the iris dataset.
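
One possible solution, sketched under the same assumptions as the examples above (dplyr loaded; the name petal_df is only illustrative):

```r
library(dplyr)

# Keep only the two petal measurements
petal_df <- iris %>% select(Petal.Length, Petal.Width)

head(petal_df)
```

The all_of approach with a vector of the two names would give the same result.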
