Libraries and Packages
Let's inspect and break down the first four lines of code we saw in the previous lesson.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
iris = pd.read_csv("iris.csv")
These four lines load things that other people have created into our Python environment. The first three lines load packages called pandas, seaborn, and matplotlib, and give each one a shortened name, called an alias: pd, sns, and plt, respectively. Loading a package gives us access to all of its contents. The fourth line loads the iris dataset.
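Aliasing works the same way for any package. Because pandas may not be installed everywhere, this minimal sketch uses Python's built-in statistics module as a stand-in; the alias st and the numbers are our own illustrative choices. It also shows that an alias is simply a second name for the same loaded package, not a copy of it.

```python
import statistics
import statistics as st  # "st" is an alias we chose; any valid name would work

heights = [150, 160, 170]  # made-up example values

# Calling a function through the alias works exactly like using the full name
print(st.mean(heights))
print(st.median(heights))

# Both names point to the very same package object
print(st is statistics)
```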
Let's look more closely at the import lines. The import statement loads a package from a library. To understand what that means, we need to know what a library, a package, and a function are, so here are the definitions of each:
A library is the location where packages are stored.
A package is a collection of functions, data and documentation that extends the functionality of Python.
A function takes one or more inputs, follows a set of instructions to transform them, and produces an output.
Let's look at the diagram below to help see the relation between these three things.
Let's start on the inside and work our way out. The diagram shows three functions; each may serve a different purpose, take different inputs, and produce different outputs. We will cover functions in more detail in the next lesson, so for now we will just note that a function is often designed to do only one thing. As a result, we may need several functions to accomplish a task. For this reason, related functions are collected together into a package, which is why the package is the middle oval in the diagram.
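To see several functions grouped inside one package, here is a minimal sketch using json, a package that ships with Python (chosen only because it needs no installation; the record's contents are made up). Each function does one job, and we use two of them together to complete a round trip.

```python
import json  # json is a package in Python's standard library

record = {"species": "setosa", "petal_length": 1.4}  # made-up example data

# One function turns a Python dictionary into text...
text = json.dumps(record)
# ...and another turns that text back into a dictionary.
restored = json.loads(text)

print(text)
print(restored == record)

# List the public names (functions, classes, and so on) the package provides
print([name for name in dir(json) if not name.startswith("_")])
```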
Finally, the outer oval is the library: the location where a package is stored.
To make an analogy, a package is like a book, and a library is the shelf where the book is stored. We might "buy" a new book (install a package) and put it on our shelf (in a library). When we want to read it (use the package's functions), we need to take it off the shelf (by using the import statement to load it).
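To peek at the "shelf" itself, this sketch asks Python where it found a package and which folders it searches when it runs an import. It uses the standard json package only as an example; the exact paths printed will differ from computer to computer.

```python
import json
import sys

# The file the json package was loaded from: its spot "on the shelf"
print(json.__file__)

# The folders ("shelves") Python searches, in order, when it runs an import
for folder in sys.path:
    print(folder)
```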
Python has a central software repository called the Python Package Index (PyPI). This is where pandas, seaborn, matplotlib, and many other packages are stored.
So there is actually one more hidden line of code that is run before these four: pip install pandas. This command is usually run in a terminal (the command line) before starting Python, rather than inside Python itself. When it runs, the pip package management system searches the online PyPI repository for a package called pandas. Once it finds it, pip downloads it and saves it into a library on our own computer, where Python can find it and load it. The same goes for the other two packages; all three can be installed with a single command: pip install pandas seaborn matplotlib.
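After installing, one way to check whether Python can find a package in a library is importlib.util.find_spec, which returns None when no matching package is found. This is a minimal sketch; the helper name missing_packages is our own hypothetical choice. Suggesting python -m pip (through sys.executable) is a common way to make sure the install lands in the library used by the Python you are actually running.

```python
import importlib.util
import sys


def missing_packages(names):
    """Hypothetical helper: return the names Python cannot find in any library it searches."""
    return [name for name in names if importlib.util.find_spec(name) is None]


wanted = ["pandas", "seaborn", "matplotlib"]
missing = missing_packages(wanted)

if missing:
    # Equivalent to typing `pip install ...` in a terminal, but uses the pip
    # that belongs to this exact Python interpreter.
    print("Not found. Install with:")
    print(" ".join([sys.executable, "-m", "pip", "install", *missing]))
else:
    print("pandas, seaborn, and matplotlib were all found.")
```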
The fourth line uses the pandas function read_csv (called as pd.read_csv) to load the iris dataset we have been looking at from the file iris.csv into Python's environment as a pandas DataFrame. This is also how we have been loading the dataset in the code editors in previous lessons. Throughout the courses here at Digi Cafe, we will load the packages and data for you so you can focus on learning the concepts and skills. However, knowing what environment the code runs in provides context that helps you understand the code. You can always find out which environment the code is running in by checking the Code environment dropdown menu at the top of any lesson with a code editor.
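Because read_csv accepts either a file path or a file-like object, this sketch reproduces the fourth line without needing the iris.csv file: it reads a small CSV held in memory. The column names are an assumption about what iris.csv contains, and only the first two rows of the classic iris data are included.

```python
import io

import pandas as pd

# Hypothetical stand-in for iris.csv: a header row (assumed column names)
# plus the first two rows of the classic iris data.
csv_text = """sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
"""

# read_csv accepts a path such as "iris.csv" or any file-like object
iris = pd.read_csv(io.StringIO(csv_text))

print(iris.shape)  # (2, 5): 2 rows, 5 columns
print(iris.head())
```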
Now that we understand what packages and libraries are, let's finish by learning about the third item, functions, in our next lesson!