An Introduction to Pandas

One package in particular is tremendously useful, and nearly all Python data scientists will need it at some point: Pandas. Pandas builds upon another package called NumPy, and other popular packages in turn build on or work with Pandas. Seaborn depends on it, and scikit-learn accepts Pandas dataframes as input.

We have already pre-installed the Pandas package in our coding environment, so let's import it in the code editor below. In Python, importing a package prints nothing. To double-check that the import succeeded, we can look at the package version number by running the command pd.__version__, where pd is the alias given to Pandas on import. Doing so, we see that the version number is 1.5.3. The three numbers here are the package version numbers that we learned about in the Packages lesson.
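As a minimal sketch of the import-and-check step described above, assuming Pandas is installed and imported under the conventional alias pd:

```python
import pandas as pd  # importing prints nothing on its own

# Printing the version string confirms the import succeeded.
# The exact value depends on the environment (1.5.3 in this lesson).
print(pd.__version__)
```

If the import had failed, Python would have raised an error on the first line instead of reaching the print call.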

The Pandas DataFrame

We first made a dataframe back in the What is Data lesson, where we gave only a short description of what was going on. Let's revisit that code and go through it in a little more detail, now that we have a better understanding of classes.

In the first line, data = { 'a': [1, 2], 'b': [3, 4] }, we are making a dictionary and saving it into a variable called data. The dictionary has two keys, a and b, and each key has a list of two numbers as its value.

In the second line, df = pd.DataFrame(data), we are passing the dictionary we just created to pd.DataFrame and saving the result into a variable which we name df. The pd.DataFrame part looks a lot like a function; however, it is actually a class. The pd. part indicates that we are reaching into the Pandas package, and the DataFrame part is the constructor for the class DataFrame. Thus, the returned object df is an instance of the DataFrame class. We can see this on the third line, where we print out the type of the object. The output, <class 'pandas.core.frame.DataFrame'>, confirms that the object is an instance of the DataFrame class.

The DataFrame class was designed to accept a dictionary as one of its possible inputs. When given a dictionary, it sets the keys of the dictionary as the column names of the dataframe and the values of the dictionary as the values in those columns.
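One way to check this mapping is to inspect the dataframe's columns directly. The sketch below reuses the data dictionary from this lesson. The inspection calls (df.columns and Series.tolist) are not part of the original code and are added here only for illustration:

```python
import pandas as pd

data = {'a': [1, 2], 'b': [3, 4]}
df = pd.DataFrame(data)

# The dictionary keys become the column names.
print(list(df.columns))   # ['a', 'b']

# Each dictionary value becomes the contents of the matching column.
print(df['a'].tolist())   # [1, 2]
print(df['b'].tolist())   # [3, 4]
```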

Finally, in the fourth line, we print out the dataframe using the print function.
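For reference, the four lines walked through above can be reconstructed from their description roughly as follows. The import line is an addition so that the block runs on its own; in the lesson environment Pandas has already been imported as pd.

```python
import pandas as pd  # added so the example is self-contained

data = {'a': [1, 2], 'b': [3, 4]}   # line 1: build the dictionary
df = pd.DataFrame(data)             # line 2: construct a DataFrame instance
print(type(df))                     # line 3: <class 'pandas.core.frame.DataFrame'>
print(df)                           # line 4: print the dataframe itself
```

The last line prints a small table with columns a and b and two rows labeled 0 and 1, which are the default row labels Pandas assigns when none are given.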