Summarise
Let's take a look at our first function in the Tidyverse. The summarise
function is useful for finding summary statistics about a dataframe. To use
it, start with the name of the dataframe, follow it with the pipe operator,
then call the summarise function. The summarise function takes an argument
of the form y = stat_function(column_name), where y is the name of the
column to be created in the returned dataframe; column_name is the column in
the dataframe to the left of the pipe operator that we want to summarise;
and stat_function is a statistical function such as one of the following:
mean: calculate the mean
sd: calculate the standard deviation
n: return the length of the dataframe
max: calculate the maximum
min: calculate the minimum
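The pattern described above can be sketched as follows. This is a minimal example that assumes the dplyr package is installed (it provides summarise and the %>% pipe) and uses the iris dataset that ships with R. The output column name sepal_mean is only an illustrative choice.

```r
library(dplyr)

# dataframe, then the pipe, then summarise(y = stat_function(column_name))
result <- iris %>%
  summarise(sepal_mean = mean(Sepal.Length))

# result is a dataframe with one row and one column named sepal_mean
print(result)
```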
The summarise function can be used to calculate the mean of a column.
Another way to do this would be to simply call the mean function on the
column directly.
The two methods differ in their output: the datatypes of the returned objects are not the same.
If the summarise result is stored in sepal_mean_1 and the result of mean in
sepal_mean_2, the datatype of sepal_mean_1 is data.frame while sepal_mean_2
is numeric.
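The comparison can be reproduced with a short sketch. It assumes the column being averaged is Sepal.Length from the built-in iris dataset, which the variable names sepal_mean_1 and sepal_mean_2 suggest but the text does not state.

```r
library(dplyr)

# Method 1: summarise returns a dataframe with a single cell
sepal_mean_1 <- iris %>% summarise(sepal_mean = mean(Sepal.Length))

# Method 2: calling mean on the column returns a bare number
sepal_mean_2 <- mean(iris$Sepal.Length)

class(sepal_mean_1)  # "data.frame"
class(sepal_mean_2)  # "numeric"
```

Both objects hold the same value. The number inside the dataframe can be pulled out with sepal_mean_1$sepal_mean.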
If we want the standard deviation, we can use sd.
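A minimal sketch, again assuming the iris dataset and its Sepal.Length column. The output name sepal_sd is an illustrative choice.

```r
library(dplyr)

# Standard deviation of one column, returned as a one-cell dataframe
sepal_sd <- iris %>% summarise(sepal_sd = sd(Sepal.Length))
print(sepal_sd)
```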
The length of the dataframe, that is, its number of rows, can be found with
n. An interesting point here is that we do not need to pass a column name
to this function as we do with the rest. This is because n operates on the
entire dataframe rather than on a single column.
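For example, assuming the iris dataset (the output name row_count is illustrative):

```r
library(dplyr)

# n() takes no arguments; it counts the rows reaching summarise
row_count <- iris %>% summarise(row_count = n())
print(row_count)  # a one-cell dataframe containing 150
```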
Another way to do this is with the dim function, which returns two values.
The first value is the number of rows and the second is the number of
columns.
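A short sketch with the iris dataset (assumed here as the example data). dim is part of base R, so no package is needed:

```r
# dim returns an integer vector: rows first, then columns
iris_dim <- dim(iris)
print(iris_dim)  # 150 5

# The first element alone gives the number of rows
iris_rows <- iris_dim[1]
```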
If we want the maximum or minimum, we can use max or min.
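As a sketch, both can be computed in a single summarise call by passing two arguments of the same y = stat_function(column_name) form. The column Sepal.Length and the output names sepal_max and sepal_min are assumptions for illustration.

```r
library(dplyr)

# Each argument becomes one column of the returned dataframe
sepal_range <- iris %>%
  summarise(
    sepal_max = max(Sepal.Length),
    sepal_min = min(Sepal.Length)
  )
print(sepal_range)
```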
Practice exercise
Use the pipe operator and the summarise function to find the mean of
Petal.Length.