Digi Cafe - Beginner Data Science in R

Beginner Data Science with R!

Summarise

Let's take a look at our first function in the Tidyverse. The summarise function calculates summary statistics for a dataframe. To use it, start with the name of the dataframe, follow it with the pipe operator, then call summarise. Inside summarise, pass an argument of the form y = stat_function(column_name), where y is the name of the column to create in the returned dataframe, column_name is the column in the dataframe to the left of the pipe operator that we want to summarise, and stat_function is a statistical function such as one of those below:

mean: calculate the mean
sd: calculate the standard deviation
n: return the number of rows in the dataframe
max: calculate the maximum
min: calculate the minimum
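
As a minimal sketch of the pattern described above, assuming the built-in iris dataset (suggested by the Petal.Length column in the practice exercise) and the dplyr package from the Tidyverse. The output column name sepal_mean is purely illustrative:

```r
library(dplyr)  # provides summarise and the %>% pipe operator

# Dataframe on the left of the pipe, summarise on the right.
# sepal_mean is the new column; Sepal.Length is the column being summarised.
result <- iris %>%
  summarise(sepal_mean = mean(Sepal.Length))

print(result)
```

The result is a dataframe with one row and a single column named sepal_mean.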

So far, summarise has been used to calculate the mean of a column. Another way to do this is to call the mean function directly on the column.
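
A sketch of the direct call, assuming the built-in iris dataset and its Sepal.Length column (the dataset is an assumption; the variable name sepal_mean_2 follows the lesson text):

```r
# mean takes a numeric vector, so the column is extracted with $ first.
sepal_mean_2 <- mean(iris$Sepal.Length)

print(sepal_mean_2)
```

No pipe or package is needed here, because mean is part of base R.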

The two methods differ in the datatype of the object they return.

If the summarise result is stored in sepal_mean_1 and the result of calling mean directly is stored in sepal_mean_2, then the datatype of sepal_mean_1 is data.frame while sepal_mean_2 is numeric.
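
The difference can be checked with the class function. This sketch assumes the iris dataset and that sepal_mean_1 holds the summarise result while sepal_mean_2 holds the direct mean call; the column name sepal_mean is illustrative:

```r
library(dplyr)

# Method 1: summarise returns a dataframe.
sepal_mean_1 <- iris %>%
  summarise(sepal_mean = mean(Sepal.Length))

# Method 2: mean returns a plain number.
sepal_mean_2 <- mean(iris$Sepal.Length)

class(sepal_mean_1)  # "data.frame"
class(sepal_mean_2)  # "numeric"

# To get the plain number out of the summarised dataframe, extract the column.
sepal_mean_1$sepal_mean
```

Both objects hold the same value; only the container differs.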

If we want the standard deviation, we can use sd.
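
A sketch using sd inside summarise, assuming the iris dataset and an illustrative output column name sepal_sd:

```r
library(dplyr)

# Same pattern as before, with sd swapped in for the statistical function.
sepal_sd_result <- iris %>%
  summarise(sepal_sd = sd(Sepal.Length))

print(sepal_sd_result)
```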

The number of rows in the dataframe can be found with n. An interesting point here is that, unlike the other functions, n does not take a column name. This is because n counts rows rather than operating on a single column.

Another way to get this is with the dim function, which returns two values: the first is the number of rows and the second is the number of columns.
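
Both approaches can be sketched side by side, assuming the iris dataset; the column name count and the variable names are illustrative:

```r
library(dplyr)

# n is called with empty parentheses: it takes no column name.
row_count <- iris %>%
  summarise(count = n())

print(row_count)

# dim is a base R function and returns c(rows, columns).
dims <- dim(iris)

print(dims)
dims[1]  # the number of rows only
```

Note that n only works inside functions such as summarise, while dim can be called on the dataframe directly.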

If we want the maximum or minimum, we can use max or min.
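
A sketch of max and min, assuming the iris dataset. summarise accepts several name = function(column) arguments separated by commas, so both statistics can be computed in one call; the column names sepal_max and sepal_min are illustrative:

```r
library(dplyr)

# Each argument becomes its own column in the returned dataframe.
extremes <- iris %>%
  summarise(
    sepal_max = max(Sepal.Length),
    sepal_min = min(Sepal.Length)
  )

print(extremes)
```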

Practice exercise

Use the pipe operator and the summarise function to find the mean of the Petal.Length column.
