Conditional Expressions with Query

In the previous lesson we saw the loc method used to subset data across columns. The counterpart to this is the query method, which is used to subset data across rows. In addition, bracket notation as we learned in the previous lesson may be used when we index the row instead of the column.

query operates on conditional expressions. These conditional expressions in turn evaluate to booleans. We briefly mentioned booleans in the Datatypes lesson. To refresh, booleans are a datatype that can only be one of two values: True or False. To get these booleans from data we need to use something called conditional expressions.

Conditional Expressions

A conditional expression is a logical statement that evaluates to either True or False; in the context of filtering dataframes, this essentially creates a mask of True and False values which is used to filter the dataframe, selecting only the rows where the condition is True.

The first conditional expression we will look at is the equal expression. This is denoted by ==. And it checks whether two values are equal. For example, if we have a variable one with a value of 1, the expression one == 1 checks if one is equal to 1, which is True in this case. If the two values being compared are not the same, the expression will evaluate to False.

We'll next give examples of this and other conditional expressions in context of filter. Other expressions include:

  • greater than >
  • greater than or equal to >=
  • less than <
  • less than or equal to <=
  • not equal !=
  • included in isin

Equality ==

The following subsets the data to only rows where Petal.Width is equal to 0.2. Note that we are adding on a head function here so we are not printing out too much in the output.

Note: query accepts a string conditional expression. It will know column names in the dataframe, however some columns that have special characters (such as a period) need to be accessed using a backtick.

We can also subset the data to only rows where Species is versicolor.

We can achieve the same filtering using bracket notation. Inside the brackets, we are creating a boolean list that will filter the dataframe by rows where the condition is True.

Greater >

The greater expression subsets the data to only rows where the column is greater than the value on the right hand side. In this example we subset the data to rows where Petal.Width is greater than 1.0. So for example, 1.1, 1.01, and even the very close value 1.0000001 would all be included, but 1.0 would not be.

Greater than or equal >=

The greater than or equal to expression works the same as >, however it also includes the value on the right hand side as within the valid range. So in this case, 1.0 would be included.

Less than and Less than or equal < and <=

These work the same as > and >= but in the opposite direction. These are included as a practice exercise.

Not equal to !=

The not equal expression is the opposite of ==. So since Species == "versicolor" subsets the data to only rows where species is equal to versicolor, Species != "versicolor" will subset the data to Species that are not equal to versicolor.

Included in isin

The isin method evaluates to true whenever the content on the left hand side is included in the list of items passed as an argument to the method. As such it is quite a versatile expression. In the following example we subset the data to rows where Species is either setosa or versicolor. To do so we first make the list of strings to check for, then we use the isin method.

Of course, since there are only three species in the iris dataset an easier way to have done this would have been to exclude virginica. This method can be a life saver though if there were many more than three unique items!

Practice exercise

Use query to subset the iris dataframe to rows where Petal.Length is less than or equal to 3.0.