Assign
We have learned enough to start applying multiple concepts
together such as method chaining, the methods in the previous lesson,
and in this lesson, the assign
method. We will also learn
about white space and how to use it.
Say we want to add another column to the iris dataframe, which we will name
Sepal_Length_2, that is equal to twice the Sepal.Length.
We can do so with the following assign
method.
Note, one limitation of the assign
method
is that we cannot create the new column name as a string, as such, we need
to use a naming convention that is a valid variable name. So with
new column names from here on, we will use underscores instead of periods.
Just as the mean
method, the assign
method
comes after the dataframe name, and
within the method argument, the ordering goes:
"new variable name" on the left of the equal sign and whatever
operation on an existing column in the dataframe on the
right of the equal sign.
assign(new_column = df['existing_column'] + c)
We can make multiple variables at the same time, we need only separate the two with a comma.
White space
Unlike other programming languages such as R, Python uses white space to associate code with classes and functions. However, we can make code more readable by collecting code within parenthesis, pushing text to a new line and indenting so similar code is stacked on top of each other.
The design of not evaluating white space allows code to be presented in easer to read formats and avoids horizontal scrolling. There is no hard limit to what can be put on one line, but we have found keeping Python code to a maximum of 80 characters per line easier to read; the exact number that works for you depends on your screen size, resolution, and personal preference.
In the following code editor we have the same content as the previous code editor, we have just added white space to make it easier to read. We have added white space by pressing the enter (aka return) key to make new lines, using the tab key to indent, and lining up the closing parentheses with the tab before the indentation.
Method chaining multiple methods
Now, suppose we want to double the Sepal.Length and then also compute the mean of the modified variable. Naturally, we may accomplish this in two separate steps as illustrated below:
But there is a simpler way by chaining these two methods using method chaining. We added additional parenthesis around the method chain so we may freely add white space to make the code easier to read.
Here, we can finally see the efficiency of method chaining! By employing two method chains in a single operation, we've reduced the necessity for saving multiple objects and have streamlined the code overall.
The iris dataframe at the start of the chain, is channeled to the
assign
method. The assign
method then appends a new column, Sepal_Length_2,
to the dataframe. This modified dataframe is subsequently conveyed
to the mean
method, and we returned the mean of only
the Sepal_Length_2 column.
Practice exercise
Fill in the rest of the code within the assign
method
to find the standard deviation of Petal.Length + 1
by calling
the std
method on a column you will create
called Petal_Length_2
.