Files - csv

The text files we looked at in the previous lesson were relatively unstructured. Other files, such as the Comma Separated Value (CSV) file, have a specified structure. To handle them well, we should use a library that can read, parse, and perform various actions on these files so that we don't need to do it ourselves.

A file with the csv extension separates its content with commas. One of the structured aspects of a CSV file is that each row needs to have the same number of commas. If a file conforms to this structure, we can load the data directly into a dataframe without having to iterate through it and store it ourselves. If you're interested in learning more about dataframes, please head on over to our Data Science course, Let's Learn Data Science with Python!, where we cover dataframes in much more detail. For now, it suffices to know that a dataframe holds structured data. A popular library for reading data into dataframes is pandas.


Let's first take a look at the contents of the file, then see how the read_csv function in the pandas library interprets it.
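The original file for this lesson is not reproduced here, so the following is a minimal sketch with made-up data. The file name `towns.csv`, the column names, and all values are assumptions chosen for illustration. It writes a small CSV file to a temporary folder, prints its raw contents, and then loads it with `pd.read_csv`.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Made-up sample data (an assumption, not the lesson's original file):
# one header row followed by three data rows, each with two commas.
csv_text = (
    "town,year,population\n"
    "Ashford,2010,1000\n"
    "Ashford,2020,1200\n"
    "Brookfield,2020,3000\n"
)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name, created only for this example.
    csv_path = Path(tmp_dir) / "towns.csv"
    csv_path.write_text(csv_text)

    # First, look at the raw contents of the file.
    print(csv_path.read_text())

    # Then let pandas parse it; the first row becomes the column names.
    df = pd.read_csv(csv_path)

print(df)
```

By default, `read_csv` treats the first line as the header and splits every following line on commas.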

This creates a dataframe with three columns and three rows of data. Having the data in this structure makes it much easier to manipulate in various ways, such as grouping by town or calculating a mean population.
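As a rough illustration of those two manipulations, here is a short sketch. The data is made up, and the column names `town`, `year`, and `population` are assumptions. To keep the example self-contained, `io.StringIO` stands in for a file on disk.

```python
import io

import pandas as pd

# Made-up sample data; io.StringIO lets read_csv treat a string like a file.
csv_text = (
    "town,year,population\n"
    "Ashford,2010,1000\n"
    "Ashford,2020,1200\n"
    "Brookfield,2020,3000\n"
)
df = pd.read_csv(io.StringIO(csv_text))

# Mean of the population column across all rows.
mean_population = df["population"].mean()

# Group the rows by town, then take the mean population within each group.
mean_by_town = df.groupby("town")["population"].mean()

print(mean_population)
print(mean_by_town)
```

Doing the same with plain text lines would mean splitting each line, converting the numbers, and keeping running totals per town by hand.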

You might be wondering: what if the data itself contains a comma? That would break the rule that every row has the same number of commas. In that case, we can choose a different delimiter for the file, such as a semicolon, and tell pandas which one to use when loading the data. The file still uses the csv extension.
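A minimal sketch of this idea follows, again with made-up data. The semicolon is just one possible choice of delimiter, and the town names containing commas are invented for illustration. The `sep` parameter of `pd.read_csv` tells pandas which character separates the fields.

```python
import io

import pandas as pd

# Made-up sample data: the town values contain commas,
# so the fields are separated with semicolons instead.
csv_text = (
    "town;year;population\n"
    "Ashford, North;2010;1000\n"
    "Ashford, North;2020;1200\n"
    "Brookfield, South;2020;3000\n"
)

# sep=";" makes pandas split on semicolons; the commas stay inside the values.
df = pd.read_csv(io.StringIO(csv_text), sep=";")

print(df)
```

The dataframe still has three columns and three rows; in this sketch only the town values differ, since they now carry the commas.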

Because we tell the function which separator, or delimiter, the csv file uses, it knows how to parse each line, resulting in the same output.