Data Structures
So far we have been working with dataframes, and this is indeed the data structure we will use most in this course. However, there are two more that we will use: lists and dictionaries.
Lists
Have you ever created a grocery list or a to-do list? In Python, such collections are aptly represented by a list.
A list is an ordered sequence of items with index starting at 0.
True to Python's flexible nature,
the elements in a list can be of any datatype. This means you can mix
and match different datatypes, even including other lists within a list.
Lists are fundamental tools in Python and are defined using square
brackets [ ]. Below are examples showcasing lists that contain varied
datatypes. We'll delve deeper into these datatypes in the datatypes lesson:
- list_1: an empty list.
- list_2: a list with two numbers.
- list_3: a list with both numbers and text.
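As a minimal sketch, the three lists described above might look like the following. The specific values are assumptions chosen for illustration; only the names and the kinds of contents come from the descriptions.

```python
# Illustrative values only; the names follow the descriptions above.
list_1 = []                        # an empty list
list_2 = [3, 7]                    # a list with two numbers
list_3 = [1, 2, "coffee", "tea"]   # numbers and text mixed together

print(list_1)
print(list_2)
print(list_3)
```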
We can determine the number of elements in a list using the
built-in function len, which stands for "length".
An empty list will have a length of zero, a list with one
element will have a length of one, and so on.
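A short sketch of len in action. The list names and contents here are hypothetical placeholders, not taken from the lesson.

```python
# Hypothetical example lists; the values are placeholders.
empty = []
one_item = ["milk"]
shopping = ["milk", "eggs", "bread"]

print(len(empty))     # 0
print(len(one_item))  # 1
print(len(shopping))  # 3
```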
Think of a book's table of contents. Each chapter or section has a specific page number associated with it. If we want to read a particular chapter, we look up its page number in the table of contents and then turn to that page. In this analogy, the page number is like an "index" that tells us where to find the chapter within the book.
Similarly, in programming, a list can be thought of as a book, and
each item in the list is like a chapter. The "index" tells us the
position of an item in the list, helping us to quickly access or
modify it. For historical reasons, Python starts counting at 0, so
the first item in a list is at index 0. The second item is
at index 1, and so on up to the last item in the list at
index len(list) - 1. For example, list_3 in the first code editor
has 4 items, so its length is 4. Therefore, the last
index in the list is 4 - 1 = 3. This way of counting
can be confusing at first, but it is prevalent throughout
Python and many other programming languages, so it is worth
becoming comfortable with.
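The indexing rules can be sketched as follows. The contents of list_3 are assumed for illustration (four items, mixing numbers and text); the lesson's actual values may differ.

```python
# Assumed contents: four items, mixing numbers and text.
list_3 = [1, 2, "coffee", "tea"]

first = list_3[0]                # index 0 is the first item
second = list_3[1]               # index 1 is the second item
last_index = len(list_3) - 1     # 4 - 1 = 3
last = list_3[last_index]        # the last item

print(first, second, last)       # 1 2 tea

# An index can also be used to modify an item in place.
list_3[1] = 20
print(list_3)                    # [1, 20, 'coffee', 'tea']
```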
Lists are dynamic structures, allowing modifications even after their
creation. To append an item at the end, use the append method.
To remove an item, use the pop method with the desired index;
if no index is given, it defaults to the last element.
Notably, the pop method also returns the removed item.
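A minimal sketch of append and pop, using a hypothetical to-do list whose contents are made up for illustration:

```python
# Hypothetical to-do list; the items are placeholders.
todo = ["laundry", "dishes", "email"]

todo.append("groceries")      # add an item at the end
print(todo)                   # ['laundry', 'dishes', 'email', 'groceries']

removed_last = todo.pop()     # no index given: removes and returns the last item
removed_first = todo.pop(0)   # removes and returns the item at index 0

print(removed_last)           # groceries
print(removed_first)          # laundry
print(todo)                   # ['dishes', 'email']
```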
Beyond these basics, lists offer functionalities like slicing or alternative insertion methods. To delve deeper, consider exploring the course Let's Learn Python for Software Development!
Dictionaries
Dictionaries in Python are highly useful. Much like a language dictionary makes it easy to find the definitions of words, a Python dictionary makes it easy to look up the value stored under a key, with even broader applications.
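A dictionary is a collection of key-value pairs, where keys are unique. Values are looked up by key rather than by position (since Python 3.7, dictionaries do preserve the order in which keys were inserted).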
Dictionaries can be created with curly brackets { }, with a
colon separating each key from its value.
- dict_1: demonstrates that we can use text as the key; when doing so, we must enclose it in quotation marks.
- dict_2: shows us dictionaries can have numbers as the keys, as well as text or lists as the values.
- dict_3: illustrates one final crucial aspect of dictionaries: keys must be unique. If there are duplicate keys, the value of the last duplicate will be retained.
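The three dictionaries described above might look like this sketch. The keys and values are assumptions chosen for illustration; only the names and the properties they demonstrate come from the descriptions.

```python
# Illustrative values only; the names follow the descriptions above.
dict_1 = {"apple": "a round fruit", "carrot": "an orange root vegetable"}  # text keys in quotation marks
dict_2 = {1: "one", 2: [2, 4, 6]}      # number keys, with text or lists as values
dict_3 = {"a": 1, "b": 2, "a": 3}      # duplicate key "a": the last value is retained

print(dict_1)
print(dict_2)
print(dict_3)   # {'a': 3, 'b': 2}
```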
The key is the index for this data structure.
With a list, the indices were 0, 1, 2, ... and each one
gave the associated element in the list. With the dictionary,
the key is the index, and accessing it yields the value associated
with this key. A few examples:
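Key-based access can be sketched as follows. The dictionaries prices and squares are hypothetical, with made-up contents, and the list is included only for contrast.

```python
# Hypothetical dictionaries; names and values are placeholders.
prices = {"espresso": 2.5, "latte": 3.75}
squares = {1: 1, 2: 4, 3: 9}

latte_price = prices["latte"]   # look up a value by its text key
nine = squares[3]               # look up a value by its number key

# For contrast: a list is indexed by position instead of by key.
drinks = ["espresso", "latte"]
second_drink = drinks[1]

print(latte_price)    # 3.75
print(nine)           # 9
print(second_drink)   # latte
```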
DataFrames
Let's take what we learned about indexing and apply this to dataframes.
Each column in a DataFrame can be considered as a list or vector, where
all the columns have the same length. For example, in the dataframe below,
Coffee cannot have 4 rows while Price has 3 rows; they need the same
number of rows to align. We can access a column of data in a dataframe
using square brackets and the name of the column in quotation marks.
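A minimal sketch of column access, assuming pandas is installed. The Coffee and Price column names come from the text above; the rows are made up for illustration.

```python
import pandas as pd

# Hypothetical data: both columns have the same number of rows (3).
df = pd.DataFrame({
    "Coffee": ["Espresso", "Latte", "Mocha"],
    "Price": [2.5, 3.75, 4.0],
})

# Square brackets with the column name in quotation marks select one column.
prices = df["Price"]
print(prices)

# Every column has the same length.
print(len(df["Coffee"]), len(df["Price"]))   # 3 3
```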
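We can also access a single element of a dataframe by row and column
using loc. Note that loc is used with square brackets rather than
parentheses, and takes the row label followed by the column name.
With the default row labels 0, 1, 2, ..., the row is indexed in the
same way as we did with lists.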
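A short sketch of element access with loc, assuming pandas is installed and the dataframe uses the default row labels 0, 1, 2, .... The data is made up for illustration.

```python
import pandas as pd

# Hypothetical data with the default row labels 0, 1, 2.
df = pd.DataFrame({
    "Coffee": ["Espresso", "Latte", "Mocha"],
    "Price": [2.5, 3.75, 4.0],
})

# loc takes the row label first, then the column name, inside square brackets.
first_coffee = df.loc[0, "Coffee"]
last_price = df.loc[len(df) - 1, "Price"]   # the last row label is len(df) - 1

print(first_coffee)   # Espresso
print(last_price)     # 4.0
```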