Digi Cafe - Beginner Data Science in Python

As a programming language, Python enables us to communicate with a computer by translating code that reads much like English into the zeros and ones that a computer can understand. In the next two lessons we will step away from Python itself and learn some general things about computers. We believe that a better understanding of how the computer works will help us use it well, and in turn make us better data scientists. We will learn about memory in this lesson and about interpretation in the next.

Since The Assignment Operator lesson we have been saving objects into memory, but what is memory? Understanding this fundamental computer science concept will help us understand the Python environment. So in this lesson let's find out what memory is!

To understand what memory is, let's look at its fundamental building blocks, the bit and the byte. A computer is an electrical device that can only understand binary code, that is, 0s and 1s. A single binary digit, either 0 or 1, is called a bit. When computers were first being developed from the 1940s through the 1960s, many of their designers found it convenient to collect eight binary digits into a fixed sequence, where different combinations of the eight digits indicate different things. Such a group of eight bits is called a byte. Eight bits can be arranged in 256 different ways (2 to the power of 8), so a convention is needed to say which pattern stands for which character. One such convention, invented back then and still in use today, is ASCII, which stands for American Standard Code for Information Interchange. Let's look at some examples of ASCII representations.

The letter a has the ASCII representation 01100001, the letter b has the ASCII representation 01100010, and the letter c has the ASCII representation 01100011. So the string abc would have the ASCII representation 011000010110001001100011. These three bytes will be written to the computer hardware, such as a hard disk drive or solid state drive. Then when we ask the computer to read memory at the location where these bytes are, it will read the three bytes and convert them back into the string for us. We can start to see that memory is simply a (very) long sequence of 0s and 1s stored somewhere within the computer's hardware.
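To see these bit patterns for ourselves, here is a small sketch that uses Python's built-in ord and format functions. The helper name to_bits is our own, made up for this illustration; it is not part of Python.

```python
def to_bits(text):
    """Return the 8-bit ASCII pattern of each character, joined together."""
    # ord() gives the character's numeric code;
    # the "08b" format writes that number as 8 binary digits
    return "".join(format(ord(ch), "08b") for ch in text)

print(to_bits("a"))    # 01100001
print(to_bits("abc"))  # 011000010110001001100011

# Going the other way: build the same three bytes and decode them back to text
restored_text = bytes([0b01100001, 0b01100010, 0b01100011]).decode("ascii")
print(restored_text)   # abc
```

The output matches the patterns listed above: three characters become three bytes, or 24 bits in total.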

Now that we understand the fundamental building blocks of a computer's data, let's cover the two types of memory a computer has: persistent memory and random access memory. Both consist of an enormous number of 0s and 1s written onto some hardware in the computer, but they differ in speed and permanence. Persistent memory (also called storage) is written onto a device that is slower to read from and write to, but it keeps its data even when the computer is off. Random access memory (abbreviated RAM, or just called memory) is written onto a component that is much faster to read from and write to, but its data is lost when the computer is turned off. These days it is common for a computer to have 4 to 32 gigabytes of RAM and hundreds of gigabytes to several terabytes of storage!
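The difference between the two kinds of memory can be sketched in a few lines of Python. This is only an illustration: the file name digi_cafe_memory_demo.txt is a hypothetical name we chose, and the file is placed in the computer's temporary folder so the example cleans up after itself.

```python
import os
import tempfile

# A value held in a variable lives in RAM and is gone once Python exits
message = "abc"

# Writing the value to a file places its bytes in storage, where they persist
path = os.path.join(tempfile.gettempdir(), "digi_cafe_memory_demo.txt")  # hypothetical file name
with open(path, "w", encoding="ascii") as f:
    f.write(message)

# The bytes can be read back from storage later, even after a restart
with open(path, "r", encoding="ascii") as f:
    restored = f.read()

size_in_bytes = os.path.getsize(path)
print(restored)       # abc
print(size_in_bytes)  # 3, one byte per character

os.remove(path)  # delete the demo file again
```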

Understanding this difference between RAM and storage is useful in learning Python, because when we use the assignment operator to save an object to Python's memory, Python is saving it in RAM. When Python is started, it asks the operating system to reserve some RAM for it. Then when we save an object into the Python environment, Python stores the value in that part of RAM. For this reason it can be difficult to work with large datasets in Python. For example, if the operating system allows a Python program a maximum of 4 gigabytes, it will not have enough room to hold any dataframe that is larger than 4 gigabytes. Thankfully there are ways to address this issue, such as Python requesting more RAM from the operating system as our objects grow, or keeping the data in a database and reading in only the part we need through a database connection.
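We can get a rough feel for how much RAM an object occupies with the getsizeof function from Python's built-in sys module. Treat this as a sketch only: the exact numbers depend on the Python version and the computer, and for a list getsizeof counts the list itself but not the objects stored inside it.

```python
import sys

# The assignment operator saves both lists into RAM
small = list(range(10))
large = list(range(1_000_000))

# getsizeof reports the size of the list object in bytes
# (the numbers stored inside the list take up additional RAM on top of this)
small_bytes = sys.getsizeof(small)
large_bytes = sys.getsizeof(large)

print(small_bytes)
print(large_bytes)
```

The larger list reports a far larger size, which is why a big enough dataset can use up all of the RAM available to Python.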

Memory