Selecting Rows

In the Selecting Columns lesson, we covered how to select columns using loc. In this lesson, we return to loc, but now we will cover how to select rows.

The loc method can be viewed as an extension of the head and tail functions, capable of behaving like either of them based on the inputs it receives. Essentially, it subsets a dataframe to include only the specified rows and columns. Note that loc selects by label; with the default index, the row labels are simply the row numbers.

Let's review how we used loc previously. In our previous experience with this method, we used it to select columns, such as selecting Sepal.Length and Species from the iris dataframe, as reproduced in the accompanying code block.
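As a refresher, the column selection might look like the minimal sketch below. The small iris dataframe built here is a hypothetical stand-in with made-up rows, assumed only so the example runs on its own; the lesson's actual dataset has more rows and columns.

```python
import pandas as pd

# Hypothetical stand-in for the iris dataframe (illustrative values only)
iris = pd.DataFrame({
    "Sepal.Length": [5.1, 4.9, 4.7, 7.0, 6.4, 6.9, 6.3, 5.8, 7.1],
    "Sepal.Width": [3.5, 3.0, 3.2, 3.2, 3.2, 3.1, 3.3, 2.7, 3.0],
    "Species": ["setosa"] * 3 + ["versicolor"] * 3 + ["virginica"] * 3,
})

# Keep all rows (:) and only the two named columns
subset = iris.loc[:, ["Sepal.Length", "Species"]]
print(subset)
```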

To select rows, we specify them in the first position, before the comma. In this example, we select the first row by passing in its row number, in this case 0, since Python uses a zero-based counting system, and we select all columns with the colon symbol :.
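A minimal sketch of this single-row selection follows. It assumes a small hypothetical stand-in for the iris dataframe with the default integer index, so the label 0 refers to the first row.

```python
import pandas as pd

# Hypothetical stand-in for the iris dataframe (illustrative values only)
iris = pd.DataFrame({
    "Sepal.Length": [5.1, 4.9, 4.7, 7.0, 6.4, 6.9],
    "Species": ["setosa"] * 3 + ["versicolor"] * 3,
})

# Row label 0 before the comma, all columns (:) after it
first_row = iris.loc[0, :]
print(first_row)
```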

But wait, the output in the code editor looks different than usual. This is because when we select only one row, pandas returns a pandas.Series object, and a pandas.Series object prints out differently than a dataframe.

Let's now select the first through third rows. We can do this by specifying a range of values in the first position, before the comma. In this case, we can specify 0:2. Unlike ordinary Python slices, a loc slice includes its endpoint, so 0:2 returns rows 0, 1, and 2.
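The range selection can be sketched as follows, again on a small hypothetical stand-in for the iris dataframe with the default integer index.

```python
import pandas as pd

# Hypothetical stand-in for the iris dataframe (illustrative values only)
iris = pd.DataFrame({
    "Sepal.Length": [5.1, 4.9, 4.7, 7.0, 6.4, 6.9],
    "Species": ["setosa"] * 3 + ["versicolor"] * 3,
})

# Label slice 0:2 before the comma; loc includes both endpoints
first_three = iris.loc[0:2, :]
print(first_three)
```

Because loc slices by label and includes the end label, this returns three rows, whereas the positional slice iris.iloc[0:2] would return only two.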

At the same time, let's print out the datatype of the output and compare it to the datatype of the previous output. When we select a single row, the datatype is pandas.Series; when we select more than one row, it is pandas.DataFrame.
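One way to check this comparison is sketched below. The dataframe is a hypothetical stand-in for iris, and the variable names are chosen only for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the iris dataframe (illustrative values only)
iris = pd.DataFrame({
    "Sepal.Length": [5.1, 4.9, 4.7, 7.0, 6.4, 6.9],
    "Species": ["setosa"] * 3 + ["versicolor"] * 3,
})

single_row = iris.loc[0, :]       # one row label
several_rows = iris.loc[0:2, :]   # a range of row labels

print(type(single_row))    # <class 'pandas.core.series.Series'>
print(type(several_rows))  # <class 'pandas.core.frame.DataFrame'>
```

If a one-row dataframe is preferred over a Series, passing the label inside a list, as in iris.loc[[0], :], keeps the result a dataframe.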

After a groupby

Pretty simple so far, right? In this next section, we want to show how to subset data following a groupby operation. In this example, we will retain the first two observations for each Species. Note that we actually return to using the head method to do this. The loc method can also be used; however, it is a tad more complex to implement.
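A minimal sketch of this grouped selection follows, assuming a small hypothetical stand-in for the iris dataframe with three rows per species.

```python
import pandas as pd

# Hypothetical stand-in for the iris dataframe (illustrative values only)
iris = pd.DataFrame({
    "Sepal.Length": [5.1, 4.9, 4.7, 7.0, 6.4, 6.9, 6.3, 5.8, 7.1],
    "Species": ["setosa"] * 3 + ["versicolor"] * 3 + ["virginica"] * 3,
})

# Keep the first two rows within each Species group
first_two = iris.groupby("Species").head(2)
print(first_two)
```

The result keeps the original row labels and row order, so it reads as a filtered copy of the dataframe rather than an aggregated summary.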

In our last example, let's use the tail method to grab the final observation within each Species.
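The tail version can be sketched the same way, again on a hypothetical stand-in for the iris dataframe.

```python
import pandas as pd

# Hypothetical stand-in for the iris dataframe (illustrative values only)
iris = pd.DataFrame({
    "Sepal.Length": [5.1, 4.9, 4.7, 7.0, 6.4, 6.9, 6.3, 5.8, 7.1],
    "Species": ["setosa"] * 3 + ["versicolor"] * 3 + ["virginica"] * 3,
})

# Keep only the last row within each Species group
last_one = iris.groupby("Species").tail(1)
print(last_one)
```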

Practice exercise

Use groupby and tail to keep the last two rows of each Species.