Packages

In our Libraries and Packages lesson, we first introduced the concept of the package. In this lesson, let's dive deeper into this topic.

Up until now we have been using built-in functions such as print, c, and data.frame, which are available by default in R. However, other developers have created a vast number of additional functions and tools that are available for public use. These tools and functions are bundled together in what is aptly called a package.

Essentially, a package is a collection of code, data, and accompanying documentation. R packages are primarily hosted on the Comprehensive R Archive Network (CRAN), and every package on CRAN is free to use. For instance, the iris dataset we accessed earlier comes from the datasets package. Using a package generally involves two steps: installing it with install.packages and then loading it into our session with library.

For example, the command install.packages('ggplot2') downloads the ggplot2 package from CRAN and installs it on the computer, a step that is usually needed only once. Afterwards, library(ggplot2) loads the package into our R session, giving us access to its functions and data. Note that every new R session starts with only the default packages loaded, so each time we start R we need to call library again for any other package we want to use.
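The two steps can be sketched in R as follows. This is a minimal sketch: use_package is a hypothetical helper, not part of R, and the example uses stats4, a package that ships with R but is not attached by default, so the code runs without an internet connection. For a CRAN package that is not yet installed, the same helper would call install.packages the first time.

```r
# Hypothetical helper (not part of R): install a package only if it is
# missing, then attach it to the current session.
use_package <- function(pkg) {
  # requireNamespace() returns FALSE when the package is not installed
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)  # one-time step: download and install from CRAN
  }
  # Per-session step: attach the package so its functions are on the search path
  library(pkg, character.only = TRUE)
}

# stats4 ships with R, so no download happens here
print("package:stats4" %in% search())  # FALSE in a fresh session
use_package("stats4")
print("package:stats4" %in% search())  # TRUE
```

Separating the check (requireNamespace) from the attach (library) mirrors the distinction above: installing happens once per computer, while loading happens once per session.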

The datasets package is a special case: it ships with R itself and is loaded automatically in every R session, so there is no need to install it from CRAN. Other foundational packages are also loaded by default, such as base, which provides functions like c and data.frame.
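We can confirm this from the console. search() lists the packages attached to the current session, and find() (from the utils package, which is also attached by default) reports which attached package provides a given object. A minimal check:

```r
# Packages attached to the current session, in search order
print(search())

# datasets is attached by default, so iris is usable without library()
print("package:datasets" %in% search())  # TRUE
print(head(iris, 3))

# c() and data.frame() both come from the base package
print(find("c"))           # "package:base"
print(find("data.frame"))  # "package:base"
```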

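CRAN hosts tens of thousands of packages. Before a package is published, CRAN runs a standard set of checks (R CMD check) that confirm, among other things, that the package installs cleanly, that its documented examples and tests run without errors, and that its documentation matches its code. These checks catch many problems, but they cannot prove that every function returns correct results, so it is still worth reading a package's documentation and trying it out on data we understand.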

Package Versions

When package maintainers want to update their code, they publish a new version of the package to a repository such as CRAN. Users can then update to the newer version at their convenience. Keeping packages up to date gives us the latest bug fixes and new features.
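To see which version of a package is installed, packageVersion() from the utils package returns it as a version object that can be compared against other versions. The sketch below uses stats, which ships with R, so it runs offline. update.packages() is the utils function that checks the repository for newer versions and installs them; because it needs an internet connection, it is left as a comment here.

```r
# Installed version of a package (base packages share R's own version number)
v <- packageVersion("stats")
print(v)

# Version objects compare numerically, component by component
print(v >= "3.0.0")  # TRUE on any modern R installation

# To check CRAN for newer versions and upgrade them (requires internet access):
# update.packages(ask = FALSE)
```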

As in most of programming, there is an established convention for versioning packages. A version is typically written as three numbers separated by periods, such as 1.0.2. Here is what each of these numbers means:

  • 1._._: The first number is the major version. A change here signals significant, potentially breaking changes to the package.
  • _.0._: The middle number is the minor version. An increment here introduces new features without breaking existing functionality.
  • _._.2: The last number is the patch version. An increment typically means bug fixes or small improvements to the package's existing functions.
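Because each part of a version is a separate integer, versions should be compared number by number rather than as text. The sketch below shows this with base R's package_version(), and adds bump_type(), a hypothetical helper (not part of R) that reports which part changed between two three-part versions.

```r
# Version objects compare component by component: 10 > 9 in the minor slot
print(package_version("1.10.0") > package_version("1.9.3"))  # TRUE

# Plain string comparison goes character by character, so "1.1..." sorts before "1.9..."
print("1.10.0" > "1.9.3")  # FALSE

# Hypothetical helper: name the kind of change between two "major.minor.patch" strings
bump_type <- function(old, new) {
  o <- as.integer(strsplit(old, ".", fixed = TRUE)[[1]])
  n <- as.integer(strsplit(new, ".", fixed = TRUE)[[1]])
  if (n[1] != o[1]) {
    "major"
  } else if (n[2] != o[2]) {
    "minor"
  } else if (n[3] != o[3]) {
    "patch"
  } else {
    "none"
  }
}

print(bump_type("1.0.2", "1.0.3"))  # "patch"
print(bump_type("1.0.2", "1.1.0"))  # "minor"
print(bump_type("1.0.2", "2.0.0"))  # "major"
```

The helper checks the major number first, because by the convention above a major change is the most significant regardless of what happened to the other two numbers.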