Files - yaml
Opening a yaml file is a bit like opening a text file with an extra step to process the data contained within. We first create a variable to store context for the file by opening it, then call a function to read the data into a dictionary.
Here's the data we'll be working with in this lesson. Assume this file is named data.yaml and it's in the same folder as the code we use to interact with it.
japan:
  capital: Tokyo
  population: 125440000
  attractions: [mount fuji, sakura trees]
  cities:
    kyoto: { population: 1475000, attractions: [shrines] }
korea:
  capital: Seoul
  cities:
    seoul:
      population: 9776000
    Seoul: { population: 9776000 }
A few notes about yaml files. Think of everything in the file as data that will be stored in one big dictionary. For a review of dictionaries, feel free to hop back to the Datastructures - Dictionaries lesson. A yaml file uses indentation to determine which keys and values belong together and how they are nested. Lines with no indentation are top level keys. An indented line that contains a colon is a key nested under the line above it, and whatever follows the colon is that key's value. We can also store lists and dictionaries inline using square and curly brackets. The usual syntax for those structures, such as separating items with commas, still applies.
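To make the indentation rules concrete, here is a small sketch that parses the same data written two ways: once with indentation only, and once with inline brackets. The variable names (block_style, flow_style) are ours for illustration, and we use yaml.safe_load on a string so the example doesn't depend on any file.

```python
import yaml

# Indented style: nesting is expressed through indentation alone.
block_style = """
japan:
  capital: Tokyo
  attractions:
    - mount fuji
    - sakura trees
  cities:
    kyoto:
      population: 1475000
"""

# Inline style: the same data written with brackets and commas.
flow_style = """
japan: {capital: Tokyo, attractions: [mount fuji, sakura trees], cities: {kyoto: {population: 1475000}}}
"""

block_data = yaml.safe_load(block_style)
flow_data = yaml.safe_load(flow_style)

# Both spellings should produce the same nested dictionary.
print(block_data == flow_data)
print(block_data["japan"]["attractions"])
```

Either style can be mixed within one file, which is what data.yaml does for the kyoto and Seoul entries.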
We've installed a package named pyyaml on the system. However, to use this package we don't import pyyaml, but yaml (why isn't it consistent? One wonders).
import yaml
# Open the file, parse its contents into a dictionary, then close the file.
f = open("./data.yaml", "r")
data = yaml.load(f, Loader=yaml.FullLoader)
f.close()
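As an alternative sketch, a with block closes the file for us once loading is done, and yaml.safe_load is a shortcut for loading with the more restrictive SafeLoader. For plain data like ours, it should give the same dictionary. The temporary file and its name (example.yaml) are stand-ins so the snippet runs on its own. In the lesson we would open ./data.yaml instead.

```python
import os
import tempfile
import yaml

# Create a small stand-in file so this example is self-contained.
# (Hypothetical file name; the lesson's real file is ./data.yaml.)
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "example.yaml")
with open(path, "w") as f:
    f.write("korea:\n  capital: Seoul\n")

# The with block closes the file automatically, even if loading raises an error.
with open(path, "r") as f:
    data = yaml.safe_load(f)

print(data)
print(f.closed)
```

After the with block ends, the data variable is still available, but the file itself is closed.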
There are many ways we can slice the data now. It really depends on how we want to interact with it. Perhaps iterating through all of the keys at the top level, targeting a specific key at some nested level, or recursing to find a specific key. Here are some examples of how we can chain dictionary lookups to get to specific levels.
print(data)
> {'japan': {'capital': 'Tokyo', 'population': 125440000, 'attractions': ['mount fuji', 'sakura trees'], 'cities': {'kyoto': {'population': 1475000, 'attractions': ['shrines']}}}, 'korea': {'capital': 'Seoul', 'cities': {'seoul': {'population': 9776000}, 'Seoul': {'population': 9776000}}}}
print(data['korea'])
> {'capital': 'Seoul', 'cities': {'seoul': {'population': 9776000}, 'Seoul': {'population': 9776000}}}
print(data['korea']['capital'])
> Seoul
print(data['korea']['cities']['seoul'])
> {'population': 9776000}
print(data['korea']['cities']['Seoul'])
> {'population': 9776000}
print(data['japan']['attractions'])
> ['mount fuji', 'sakura trees']
We also see how intermediate levels return another dictionary. Notice, too, that there is no requirement on the structure of the data. Some entries can stop at one level, and the keys under one top level key don't need to match the keys under another. Keys are also case sensitive, so seoul and Seoul are two separate entries. Additionally, curly brackets are not required to mark something as a dictionary, since indentation alone does the job. Finally, values that can be read as a specific datatype, such as a number or a boolean, are converted to that type. Everything else is stored as a string.
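Here is one way we might recurse through the whole structure instead of chaining lookups by hand. This is a sketch, not the only approach: the walk function is our own helper, and the visited key is a made-up entry added purely to show a boolean being converted. Each leaf is printed with its path and the Python type it was loaded as.

```python
import yaml

# Inline sample based on the lesson's data.
# "visited" is a hypothetical key added only to demonstrate booleans.
text = """
japan:
  capital: Tokyo
  population: 125440000
  attractions: [mount fuji, sakura trees]
  visited: true
korea:
  capital: Seoul
"""


def walk(node, path=()):
    """Yield (path, value) for every leaf inside nested dicts and lists."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from walk(value, path + (key,))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from walk(value, path + (index,))
    else:
        # Anything that is not a dict or list is a leaf value.
        yield path, node


data = yaml.safe_load(text)
leaves = list(walk(data))

for path, value in leaves:
    # Show where the value lives and which type it was loaded as.
    print(path, type(value).__name__, value)
```

The same helper can be filtered to find a specific key at any depth, for example by keeping only the paths whose last element is population.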
Practice Question
Use the yaml file below as well as the provided code. Determine what will be printed.
japan:
  population: 125440000
usa:
  people: 332000000
korea:
  population: 51700000
australia:
  population: 25700000
import yaml
f = open("./quiz_data.yaml", "r")
data = yaml.load(f, Loader=yaml.FullLoader)
hundred_million = 100000000
for key in data:
    if 'population' in data[key] and data[key]['population'] < hundred_million:
        print(key)
        break
f.close()