An Introduction to Pandas
One package in particular is tremendously useful, and nearly all Python data scientists will need it at some point. This is the Pandas package. Pandas builds upon another package called NumPy, and other popular packages such as Seaborn and scikit-learn in turn depend on it or work closely with it.
Pandas is already installed in our coding environment, so let's import it in the code editor below.
In Python, importing a package prints nothing. To double check that the import succeeded, we can check the package version number by running the command pd.__version__. Doing so, we see that the version number is 1.5.3. These three numbers are the package version numbers that we learned about in the Packages lesson.
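The import and version check described above can be sketched as follows. This assumes Pandas is imported under the conventional alias pd, which is what the pd.__version__ command implies; the variable name version is only for illustration, and the number printed depends on the environment.

```python
import pandas as pd  # a successful import prints nothing

# Reading the version string is a quick way to confirm the import worked.
version = pd.__version__
print(version)  # the lesson's environment reports 1.5.3; yours may differ
```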
The Pandas DataFrame
We first made a dataframe back in the What is Data lesson, where we gave a short description of what was going on. Let's revisit that code and go through it in a little more detail, now that we have a better understanding of classes.
In the first line, data = { 'a': [1, 2], 'b': [3, 4] }, we make a dictionary and save it into a variable called data. The dictionary has two keys, a and b, and each key has a list of two numbers as its value.
In the second line, df = pd.DataFrame(data), we pass the dictionary we just created to pd.DataFrame and save the result into a variable which we name df. The pd.DataFrame part looks a lot like a function, but it is actually a class. The pd. part indicates that we are calling from within the Pandas package, and the DataFrame part is the class constructor for the class DataFrame. Thus, the returned object df is an instance of the DataFrame class.
We can see this on the third line, where we print out the type of the object. The output, <class 'pandas.core.frame.DataFrame'>, confirms that the object is an instance of the DataFrame class.
The DataFrame class was designed to accept a dictionary as one of its possible inputs. When given a dictionary, it sets the keys of the dictionary as the column names of the dataframe, and the values of the dictionary as the contents of those columns.
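As a small illustration of how the keys and values map onto the dataframe, the sketch below builds the same dataframe and reads its column names and column contents back out. The variable names column_names, column_a, and column_b are only for illustration and do not come from the lesson.

```python
import pandas as pd

data = {'a': [1, 2], 'b': [3, 4]}
df = pd.DataFrame(data)

# The dictionary keys become the column names.
column_names = list(df.columns)
print(column_names)  # ['a', 'b']

# Each dictionary value becomes the contents of the matching column.
column_a = df['a'].tolist()
column_b = df['b'].tolist()
print(column_a)  # [1, 2]
print(column_b)  # [3, 4]
```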
Finally, in the fourth line, we print out the dataframe using the print function.
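For reference, the four lines walked through above can be put together as follows. This is a reconstruction from the description rather than the lesson's exact code, and it assumes Pandas was imported as pd beforehand, so the import is included here only to keep the sketch runnable on its own.

```python
import pandas as pd  # assumed to have been run earlier in the lesson

data = {'a': [1, 2], 'b': [3, 4]}  # first line: build the dictionary
df = pd.DataFrame(data)            # second line: construct a DataFrame instance
print(type(df))                    # third line: <class 'pandas.core.frame.DataFrame'>
print(df)                          # fourth line: display the dataframe
```

The printed dataframe has two columns, a and b, and two rows labelled 0 and 1.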