Data Structures

So far we have been working with dataframes, which are the data structure we will use most in this course. There are two more that we will also use: lists and dictionaries.

Lists

Have you ever created a grocery list or a to-do list? In Python, such collections are aptly represented by a list.

A list is an ordered sequence of items with index starting at 0.

True to Python's flexible nature, the elements in a list can be of any datatype. This means you can mix and match different datatypes, even including other lists within a list. Lists are fundamental tools in Python and are defined using square brackets [ ]. Below are examples showcasing lists that contain varied datatypes. We'll delve deeper into these datatypes in the datatypes lesson:

  • list_1: is an empty list.
  • list_2: is a list with two numbers.
  • list_3: is a list with both numbers and text.
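As a minimal sketch of the three lists described above: the specific values here are assumptions chosen for illustration, and only the names list_1, list_2, and list_3 and their general contents come from the text.

```python
# Illustrative values only; the lesson's original values are not shown here.
list_1 = []                        # an empty list
list_2 = [3, 7]                    # a list with two numbers
list_3 = [1, 2, "apple", "pear"]   # numbers and text mixed together (4 items)

# A list can also contain another list (hypothetical extra example).
nested = [1, "milk", [2, 3]]

print(list_1)
print(list_2)
print(list_3)
print(nested)
```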

We can determine the number of elements in a list using the built-in function len, which stands for "length". An empty list will have a length of zero, a list with one element will have a length of one, and so on and so forth.
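A short sketch of len in action, using hypothetical grocery lists as the data:

```python
# Hypothetical example lists of increasing size.
empty = []
one_item = ["milk"]
groceries = ["milk", "eggs", "bread"]

print(len(empty))      # 0
print(len(one_item))   # 1
print(len(groceries))  # 3
```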

Think of a book's table of contents. Each chapter or section has a specific page number associated with it. If we want to read a particular chapter, we look up its page number in the table of contents and then turn to that page. In this analogy, the page number is like an "index" that tells us where to find the chapter within the book.

Similarly, in programming, a list can be thought of as a book, and each item in the list is like a chapter. The "index" tells us the position of an item in the list, helping us to quickly access or modify it. Due to historical programming reasons, Python counts the first item in a list starting from 0. The second item is at index 1, and so on up until the last item in the list at index len(list) - 1. For example, in list_3 in the first code editor, there are 4 items and thus the length of it is 4. Therefore, the last index in the list is 4 - 1 = 3. This index counting can be confusing at first, but it is prevalent throughout Python and many other programming languages so it is worth trying to become comfortable with.
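The index arithmetic above can be sketched as follows. The contents of list_3 are assumed for illustration; what matters is that it holds 4 items.

```python
# Assumed contents; the point is that the list has 4 items.
list_3 = [1, 2, "apple", "pear"]

first = list_3[0]               # the first item is at index 0
second = list_3[1]              # the second item is at index 1

last_index = len(list_3) - 1    # 4 - 1 = 3
last = list_3[last_index]       # the last item is at index 3

list_3[1] = 20                  # an index can also be used to modify an item

print(first, second, last)      # 1 2 pear
print(list_3)                   # [1, 20, 'apple', 'pear']
```

Asking for an index past the end, such as list_3[4] here, raises an IndexError.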

Lists are dynamic structures, allowing modifications even after their creation. To append an item at the end, use the append method. To remove an item, utilize the pop method with the desired index; if unspecified, it defaults to the last element. Notably, the pop method also returns the removed item.
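A small sketch of append and pop, using a hypothetical to-do list:

```python
# Hypothetical to-do list used only for illustration.
todo = ["laundry", "dishes", "email"]

todo.append("groceries")       # adds an item at the end
print(todo)                    # ['laundry', 'dishes', 'email', 'groceries']

removed_last = todo.pop()      # no index given: removes and returns the last item
removed_first = todo.pop(0)    # index 0: removes and returns the first item

print(removed_last)            # groceries
print(removed_first)           # laundry
print(todo)                    # ['dishes', 'email']
```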

Beyond these basics, lists offer functionalities like slicing or alternative insertion methods. To delve deeper, consider exploring the course Let's Learn Python for Software Development!

Dictionaries

Dictionaries in Python are highly useful. A language dictionary makes it easy to look up the definition of a word; Python dictionaries apply the same lookup idea far more broadly.

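A dictionary is a collection of key-value pairs, where keys are unique. Since Python 3.7, dictionaries preserve the order in which items were inserted, but values are looked up by key, not by position.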

Dictionaries can be created with curly brackets { }, with the addition of a colon separating the key and value.

  • dict_1: demonstrates that we can use text as the key; when doing so, we must enclose it in quotation marks.
  • dict_2: shows us dictionaries can have numbers as the keys, as well as text or lists as the values.
  • dict_3: illustrates one final crucial aspect of dictionaries: keys must be unique. If there are duplicate keys, the value of the last duplicate will be retained.
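The three dictionaries described above might look like the sketch below. The keys and values are assumptions made for illustration; only the names dict_1, dict_2, and dict_3 and the behavior they demonstrate come from the text.

```python
# Illustrative keys and values only.
dict_1 = {"apple": 3, "pear": 5}            # text keys, enclosed in quotation marks
dict_2 = {1: "one", 2: [2.0, "two"]}        # number keys; text and list values
dict_3 = {"a": 1, "b": 2, "a": 3}           # duplicate key "a": the last value is kept

print(dict_1)
print(dict_2)
print(dict_3)        # {'a': 3, 'b': 2}
print(len(dict_3))   # 2, because the duplicate key is stored only once
```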

The key acts as the index for this data structure. With a list, the indices 0, 1, 2, ... each gave the associated element. With a dictionary, the key plays that role, and accessing it yields the value associated with that key. A few examples:
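As one possible illustration of key-based access, the sketch below uses a hypothetical dictionary of coffee prices:

```python
# Hypothetical dictionary mapping coffee names to prices.
prices = {"espresso": 2.5, "latte": 3.5}

latte_price = prices["latte"]   # look up a value by its key
print(latte_price)              # 3.5

prices["mocha"] = 4.0           # assigning to a new key adds a key-value pair
prices["latte"] = 3.75          # assigning to an existing key replaces its value

print(prices)                   # {'espresso': 2.5, 'latte': 3.75, 'mocha': 4.0}
```

Looking up a key that is not in the dictionary, such as prices["tea"] here, raises a KeyError.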

DataFrames

Let's take what we learned about indexing and apply it to dataframes. Each column in a DataFrame can be thought of as a list or vector, and all the columns must have the same length. For example, in the dataframe below, Coffee cannot have 4 rows while Price has 3 rows; they need the same number of rows to align. We can access a column of a dataframe using square brackets and the name of the column in quotation marks. We can also access a single element by row and column using loc with square brackets, giving the row label followed by the column name. With the default index, the row labels are 0, 1, 2, ..., just like list indices.
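A minimal sketch of both kinds of access, assuming pandas is installed. The column names Coffee and Price come from the text; the values are made up for illustration.

```python
import pandas as pd

# Illustrative data; both columns have 3 rows so they align.
df = pd.DataFrame({
    "Coffee": ["Espresso", "Latte", "Mocha"],
    "Price": [2.5, 3.5, 4.0],
})

coffee_column = df["Coffee"]        # a whole column, selected by name
first_price = df.loc[0, "Price"]    # one element: row label 0, column "Price"

print(coffee_column)
print(first_price)                  # 2.5

# Columns of different lengths cannot be combined into one dataframe.
try:
    pd.DataFrame({
        "Coffee": ["Espresso", "Latte", "Mocha", "Flat White"],  # 4 rows
        "Price": [2.5, 3.5, 4.0],                                # 3 rows
    })
    mismatch_failed = False
except ValueError:
    mismatch_failed = True

print(mismatch_failed)              # True
```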