Packages

In our Libraries and Packages lesson, we first introduced the concept of the package. In this lesson, let's dive deeper into this topic.

There is a vast universe of functions and tools created by other developers that are available for public use. These tools and functions are bundled together in what is aptly called a package.

Essentially, a package is a collection of code, data, and accompanying documentation. Python packages are hosted on repositories such as the Python Package Index (PyPI) and the Conda repository, and are made accessible through package managers such as pip, Conda, or Poetry. Packages on these repositories can be downloaded free of charge. For instance, when we made a dataframe, we used the pandas package. Using a package generally involves two steps: installing it with a command such as conda install pandas, and then loading it into our environment with import.
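The two steps described above can be sketched as follows. This is a minimal illustration rather than a required pattern: the try/except wrapper is only there so the snippet reports a missing installation instead of stopping with an error, and the column names and values are made up.

```python
# Step 1 happens outside Python, in a terminal:
#     conda install pandas
# Step 2 happens inside Python: load the installed package.
try:
    import pandas as pd
except ImportError:
    # Raised when step 1 has not been done in this environment.
    pd = None
    print("pandas is not installed; run `conda install pandas` first")
else:
    # Illustrative data only.
    df = pd.DataFrame({"name": ["Ada", "Grace"], "age": [36, 45]})
    print(df)
```

In the course's code editors pandas is already installed, so only the import line is needed there.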

Package management in Python is handled by tools outside the language itself, and the installation method varies across the three main operating systems: macOS, Windows, and Linux. Because of this complexity, we've positioned the lesson on package management at the end of this course. Throughout this course's code editors, we've pre-installed pandas and seaborn for you using the Conda package manager, as could be done using the first-line command of the previous code block.

When a package is installed, it's stored in a specific location on our computer. To use it in our current Python session, we use the import statement, which gives us access to the package's data and functionality. Note that every time we start a new Python session, we need to import again any package we wish to use, since each session starts with a fresh environment. The package itself stays installed; only the import has to be repeated.
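To see where an installed package is stored, one option is the standard library's importlib.util.find_spec. The sketch below is an assumption-laden illustration: the helper name locate is hypothetical, and it looks up the built-in json module first so that it runs even where pandas is not installed.

```python
import importlib.util


def locate(package_name):
    """Return the file path Python would load for a package, or None if it is not installed."""
    spec = importlib.util.find_spec(package_name)
    return None if spec is None else spec.origin


# json ships with Python, so this always prints a path.
print(locate("json"))

# Prints a path inside the environment's site-packages folder if pandas is installed,
# otherwise None.
print(locate("pandas"))
```

The printed paths differ between machines and environments, which is one reason the installation location is rarely typed by hand.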

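Together, PyPI and the Conda repository host hundreds of thousands of packages. Unlike some curated repositories, PyPI does not review a package's contents for correctness before publishing it: anyone can upload a package. So we cannot assume that a function does what it says simply because it is available. Widely used, actively maintained packages such as pandas and seaborn are a safer choice, since their code is tested and scrutinized by a large community.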

Let's explore the import statement further. When we imported the pandas package, we assigned it the alias pd using the as keyword. An alias is essentially a shorthand or nickname for the package. By using the alias pd for pandas, we can call functions from the pandas library without typing the full package name every time. It's common practice to use succinct aliases, often just two or three letters long. While we can choose any alias, there are conventions that the Python community tends to adopt. For instance, pd is conventionally used for pandas and sns for seaborn.

To use a class or function from a package, we write the package name or its alias, followed by a period and then the name of the class or function, as in pd.DataFrame. The period separates the package name or alias from the specific class or function we want to access.
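The alias and dot notation can be tried out without installing anything. The sketch below uses the standard library's statistics module in place of pandas; the alias st is an arbitrary choice made for this example, not a community convention.

```python
import statistics as st  # "st" is an arbitrary alias chosen for this sketch

scores = [2, 4, 4, 5]

# alias.function(...): the period separates the alias from the function name.
average = st.mean(scores)
print(average)

# The same dot notation reaches a class defined in the package.
dist = st.NormalDist(mu=0, sigma=1)
print(dist.mean)

# With pandas installed, the pattern is identical:
#     import pandas as pd
#     df = pd.DataFrame({"score": scores})
```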

Package Versions

When package maintainers want to update the code in their package, they publish a new version of the package to a repository. Users can then choose to update to this newer version at their convenience. Updating gives us access to the latest bug fixes and new features.
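Before deciding whether to update, it helps to know which version is currently installed. A minimal sketch, assuming Python 3.8 or later, uses the standard library's importlib.metadata; the helper name installed_version is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Report the versions of the two packages used in this course.
for name in ("pandas", "seaborn"):
    print(name, installed_version(name))
```

The values printed depend entirely on the environment the code runs in.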

As with much of programming, there is an established convention for how packages are versioned. A package version is identified by three numbers separated by periods, such as 1.0.2. Here is what each of these numbers means:

  • 1._._: The first number is the major version number. A change here implies significant, potentially breaking alterations to the package.
  • _.0._: The middle number is the minor version number. An increment here introduces new features without breaking existing functions.
  • _._.2: The last number is the patch number. An increment typically means bug fixes that do not change how the package's functions are used.
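The three-part scheme above can be worked with directly in code. The sketch below assumes version strings made of exactly three numbers; real packages sometimes add suffixes such as rc1, which it does not handle. Both helper names are hypothetical.

```python
def parse_version(text):
    """Split a 'major.minor.patch' string into a tuple of three integers."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def describe_update(old, new):
    """Name the most significant version component that differs between two versions."""
    labels = ("major", "minor", "patch")
    for label, before, after in zip(labels, parse_version(old), parse_version(new)):
        if before != after:
            return label
    return "none"


print(describe_update("1.0.2", "2.0.0"))  # major
print(describe_update("1.0.2", "1.1.0"))  # minor
print(describe_update("1.0.2", "1.0.3"))  # patch

# Each part is a whole number, not a single digit, so compare tuples rather than strings.
print(parse_version("1.10.0") > parse_version("1.9.0"))  # True
print("1.10.0" > "1.9.0")                                # False
```

Comparing the parsed tuples rather than the raw strings matters because string comparison goes character by character and would rank 1.9.0 above 1.10.0.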