Selecting Rows
In the Selecting Columns lesson we covered how to select columns using loc. In this lesson, we return to this function, but now we will cover how to select rows.
The loc method can be viewed as an extension of the head and tail methods, capable of behaving like either of them based on the inputs it receives. Essentially, it subsets a dataframe to include only the specified rows and columns.
Let's review how we used loc previously. In our previous experience with this method, we used it to select columns, such as selecting Sepal.Length and Species from the iris dataframe, as reproduced in the code block on the right.
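As a reminder of what that column selection can look like, here is a minimal sketch. It assumes a small hand-typed stand-in for the iris dataframe (the lesson itself presumably loads the full dataset), with column names following the ones used in this lesson.

```python
import pandas as pd

# Small stand-in for the iris dataframe (assumption: the lesson uses the full dataset).
iris = pd.DataFrame({
    "Sepal.Length": [5.1, 4.9, 4.7, 7.0, 6.4, 6.9, 6.3, 5.8, 7.1],
    "Sepal.Width": [3.5, 3.0, 3.2, 3.2, 3.2, 3.1, 3.3, 2.7, 3.0],
    "Species": ["setosa"] * 3 + ["versicolor"] * 3 + ["virginica"] * 3,
})

# Keep all rows (:) and only the two named columns.
selected = iris.loc[:, ["Sepal.Length", "Species"]]
print(selected)
```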
In this example, we select the first row by passing in the row number, in this case 0, since Python uses a zero-based counting system. And we select all columns with the colon symbol :.
But wait, looking at the output of this code editor, it looks different than usual. This is because when we select only one row, pandas returns a pandas.Series object, and a pandas.Series object prints out differently than a dataframe.
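The single-row selection described here might look like the sketch below. The iris dataframe is again a small hand-typed stand-in, which is an assumption made so the example runs on its own.

```python
import pandas as pd

# Small stand-in for the iris dataframe (assumption: the lesson uses the full dataset).
iris = pd.DataFrame({
    "Sepal.Length": [5.1, 4.9, 4.7, 7.0, 6.4, 6.9, 6.3, 5.8, 7.1],
    "Sepal.Width": [3.5, 3.0, 3.2, 3.2, 3.2, 3.1, 3.3, 2.7, 3.0],
    "Species": ["setosa"] * 3 + ["versicolor"] * 3 + ["virginica"] * 3,
})

# Row labelled 0 (the first row under the default index), all columns.
first_row = iris.loc[0, :]

# The result is a Series whose index holds the column names,
# so it prints vertically rather than as a table.
print(first_row)
print(type(first_row))
```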
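Let's now select the first through third rows. We can do this by specifying a range of values in the first position before the comma. In this case, we can specify 0:2. Unlike an ordinary Python slice, a loc slice works on labels and includes its end point, so 0:2 returns the rows labelled 0, 1, and 2.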
At the same time, let's print out the datatype of the output and compare it to the datatype of the previous output. We see that selecting a single row gives a pandas.Series, while selecting more than one row gives a pandas.DataFrame.
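A short sketch of this comparison follows. It assumes the same small hand-typed stand-in for iris with the default integer index, so that the labels 0 to 2 match the first three rows.

```python
import pandas as pd

# Small stand-in for the iris dataframe (assumption: the lesson uses the full dataset).
iris = pd.DataFrame({
    "Sepal.Length": [5.1, 4.9, 4.7, 7.0, 6.4, 6.9, 6.3, 5.8, 7.1],
    "Sepal.Width": [3.5, 3.0, 3.2, 3.2, 3.2, 3.1, 3.3, 2.7, 3.0],
    "Species": ["setosa"] * 3 + ["versicolor"] * 3 + ["virginica"] * 3,
})

# A single label returns a Series; a label slice returns a DataFrame.
one_row = iris.loc[0, :]
three_rows = iris.loc[0:2, :]  # labels 0, 1 and 2: the end point is included

print(three_rows)
print(type(one_row))
print(type(three_rows))
```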
After a groupby
Pretty simple so far, right? In this next section we want to show how to subset data following a group-by operation. In this example, we will retain the first two observations for each Species. Note that we actually return to using the head method to do this; the loc method can also be used, however it is a tad more complex to implement.
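One way this could be written is sketched below, again on a small hand-typed stand-in for iris (an assumption, with three rows per species). Calling head on a grouped dataframe keeps the first n rows of every group and returns them in their original order with their original index.

```python
import pandas as pd

# Small stand-in for the iris dataframe (assumption: the lesson uses the full dataset).
iris = pd.DataFrame({
    "Sepal.Length": [5.1, 4.9, 4.7, 7.0, 6.4, 6.9, 6.3, 5.8, 7.1],
    "Sepal.Width": [3.5, 3.0, 3.2, 3.2, 3.2, 3.1, 3.3, 2.7, 3.0],
    "Species": ["setosa"] * 3 + ["versicolor"] * 3 + ["virginica"] * 3,
})

# Keep the first two observations within each Species.
first_two = iris.groupby("Species").head(2)
print(first_two)
```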
In our last example, let's employ the tail method to grab the final observation within each Species.
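A minimal sketch of this, using the same assumed stand-in for iris with three rows per species:

```python
import pandas as pd

# Small stand-in for the iris dataframe (assumption: the lesson uses the full dataset).
iris = pd.DataFrame({
    "Sepal.Length": [5.1, 4.9, 4.7, 7.0, 6.4, 6.9, 6.3, 5.8, 7.1],
    "Sepal.Width": [3.5, 3.0, 3.2, 3.2, 3.2, 3.1, 3.3, 2.7, 3.0],
    "Species": ["setosa"] * 3 + ["versicolor"] * 3 + ["virginica"] * 3,
})

# Keep only the last observation within each Species.
last_one = iris.groupby("Species").tail(1)
print(last_one)
```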
Practice exercise
Use groupby and tail to keep the last two rows of each Species.