Packages
In our Libraries and Packages lesson, we first introduced the concept of the package. In this lesson, let's dive deeper into this topic.
There is a vast universe of functions and tools created by other developers that are available for public use. These tools and functions are bundled together in what is called a package.
Essentially, a package is a collection of code, data, and
accompanying documentation. For Python, these are hosted on repositories
such as:
The Python Package Index (PyPI) and
The Conda Repository.
Packages on these repositories are
made accessible through package managers such as
Pip,
Conda, or
Poetry.
Packages on these repositories are free to download and install. For instance, when we made a dataframe, we used the pandas package. Using a package generally involves two steps: installing it with a command such as conda install pandas, and then loading it into our environment with import.
conda install pandas # run in a terminal before starting Python
import pandas as pd # run in Python
Package management in Python is handled outside of Python itself, and the installation
method varies across the three main operating systems: macOS, Windows, and Linux.
Owing to its intricate nature, we've positioned the lesson on package
management at the end of this course. Further, in this course's
code editors, we've pre-installed Pandas and Seaborn for you using the
Conda package manager, as could be done with the first line of
the previous code block. When a package is installed, it's stored in a
specific location on our computer. To use it in our current Python
session, we use the import statement, which grants access to the
package's data and functionality. It's important to note that every
time we start a new Python session, we need to import again any package
we wish to use, since Python's environment resets at each start.
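To make the last two points concrete, here is a minimal sketch. It assumes json, a package that ships with Python, as a stand-in for pandas so that it runs without installing anything. It looks up where the package is stored and then checks that the import has registered it for the current session.

```python
import importlib.util
import sys

# Look up where a package is stored on disk without importing it yet.
# "json" ships with Python and is used here as a stand-in for pandas.
spec = importlib.util.find_spec("json")
package_location = spec.origin  # typically the path to the package's __init__.py

import json  # load the package into the current session

# Imported packages are recorded in sys.modules for this session only.
# A new Python session has to run the import again before using the package.
is_loaded = "json" in sys.modules

print(package_location)
print(is_loaded)
```

The same lookup should work for pandas once it is installed; the printed path will differ from one computer to another.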
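PyPI and the Conda repositories host a very large number of packages; PyPI alone lists hundreds of thousands of projects. The level of review varies: some Conda channels are curated by their maintainers, whereas anyone can upload a package to PyPI without prior review. It is therefore worth favoring well-maintained, widely used packages such as pandas, where we can be reasonably confident that a function does what it says it does.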
Let's explore the import statement further. When we imported the
Pandas package, we assigned it the alias pd using the as keyword.
An alias is essentially a shorthand or nickname for the package.
By using the alias pd for pandas, we can invoke functions from the
Pandas library without typing the full package name every time.
It's common practice to use succinct aliases, often just two or
three letters long. While we can choose any alias, there are
conventions that the Python community tends to adopt. For instance,
pd is conventionally used for Pandas and sns for Seaborn.
To utilize a class or function from a package, we start with the package name or its alias, followed by a period. This period signifies the distinction between the package name or alias and the specific class or function we intend to invoke from it.
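The alias and dot notation can be sketched as follows. This example assumes statistics, a module from Python's standard library, in place of pandas so that it runs without any installation; the alias st is an arbitrary choice made for illustration, not an established convention.

```python
import statistics as st  # "st" is an arbitrary alias chosen for this example

values = [2, 4, 6, 8]

# Pattern: alias, then a period, then the function we want from the package.
average = st.mean(values)
middle = st.median(values)

print(average)
print(middle)
```

With pandas the pattern is the same: after import pandas as pd, a dataframe is created with pd.DataFrame.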
Package Versions
When package maintainers want to update the code in their package, they will push a new version of their package to a repository with the new code. Users then have the choice to update to this newer version at their convenience. Such updates are crucial to ensure the package is up-to-date with the latest changes, bug fixes, and new features.
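Before deciding whether to update, it helps to know which version is currently installed. The following is a minimal sketch using importlib.metadata from the standard library (available from Python 3.8 onward); the helper name installed_version is hypothetical and introduced here only for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        # The package is not installed in the current environment.
        return None


# Prints a version string if pandas is installed, otherwise None.
print(installed_version("pandas"))
```

The update itself is run from the terminal, for example with conda update pandas.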
As with most of programming, conventions have been established for how
packages are versioned.
Package versions are identified by three numbers separated by
periods, such as 1.0.2.
Here is what each of these numbers means:
1._._: The first number is the major version number. A change here implies significant, potentially breaking alterations to the package.
_.0._: The middle number is the minor version number. An increment here introduces new features without breaking existing functions.
_._.2: The last number is the patch number. An increment typically means minor fixes or enhancements to the package's functions.
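The three-part scheme above can be sketched in code. The helpers parse_version and describe_update are hypothetical names written for this example, and the sketch assumes plain version strings of the form major.minor.patch; real packages sometimes add suffixes such as rc1, which this simple parser does not handle.

```python
def parse_version(version_string):
    """Split a version such as '1.0.2' into (major, minor, patch) integers."""
    major, minor, patch = (int(part) for part in version_string.split("."))
    return major, minor, patch


def describe_update(old, new):
    """Report which part of the version changed first: major, minor, patch, or none."""
    old_parts = parse_version(old)
    new_parts = parse_version(new)
    for label, old_part, new_part in zip(("major", "minor", "patch"), old_parts, new_parts):
        if old_part != new_part:
            return label
    return "none"


print(parse_version("1.0.2"))
print(describe_update("1.0.2", "1.1.0"))
print(describe_update("1.0.2", "2.0.0"))
```

Comparing the parsed tuples also works for ordering, since Python compares tuples element by element: parse_version("1.10.0") is greater than parse_version("1.9.3"), whereas comparing the raw strings would get this wrong.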