Baby Vocab Solution - Stanford Code in Place

The Baby Vocab problem is an optional exercise to try once you have finished the Week 6 lessons of Stanford Code in Place. Week 6 covers lists and dictionaries, both of which are extremely useful.

The Baby Vocab Problem is essentially asking this:

We have a list of words, some of which appear more than once. We want to find the unique words in the list and count how many times each one appears. After that, we will print a histogram bar for each word to visualize which words are spoken the most!

The Code in Place staff has generously provided us with two really useful functions. One loads all the words from a file into a list, and the other prints a histogram bar from two inputs: 1) the word, and 2) the word count.

This problem is an almost perfect fit for a dictionary. If we can construct a dictionary in which each key is a unique word and each value is that word's count, we can pass each pair to the print_histogram_bar(word, count) function to print the histogram.

First, let's create an empty dictionary.

    vocab = {}

Next, let's go over the whole list of words. The first time we encounter a word (the key), we set its count (the value) to 1. If the word is already in vocab (the dictionary), we add 1 to its count. An "if" statement comes in handy here:

    for i in range(len(words)):
        if words[i] not in vocab: # if the word is not in the vocab
            vocab[words[i]] = 1 # set the count for the word to 1
        else:
            vocab[words[i]] += 1 # if the word is already in the vocab, add 1 to the count for this word
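
As a side note, the same counting can be written more compactly. The sketch below is not part of the assignment's starter code. It uses a short made-up list in place of the words loaded from the file, and shows two alternatives: looping over the words directly with dict.get, and using collections.Counter from the standard library.

```python
from collections import Counter

# Hypothetical sample data standing in for the list loaded from words.txt.
words = ["mama", "dada", "mama", "hi", "dada", "mama"]

# Variant 1: loop over the words themselves instead of their indexes.
# vocab.get(word, 0) returns the current count, or 0 if the word is new.
vocab = {}
for word in words:
    vocab[word] = vocab.get(word, 0) + 1

# Variant 2: Counter builds the same word-to-count mapping in one call.
vocab_counter = Counter(words)

print(vocab)
print(dict(vocab_counter))
```

Both variants produce the same word counts as the if/else version above; which one reads best is a matter of taste.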

With that, we've created a dictionary called vocab that maps each unique word to its count. To check that it works, I added a print(vocab) line right after the loop. The result didn't disappoint:

    {'mama': 30, 'dada': 26, 'baba': 15, 'bye-bye': 20, 'hi': 29, 'no': 15, 'juice': 10, 'please': 7, 'apple': 5}

Last but not least, the only thing left is to print out the histograms! 

    for word, count in vocab.items():
        print_histogram_bar(word, count)

Because the print_histogram_bar(word, count) function requires both the word and the word count, I chose to loop over vocab.items(). It yields each key and value as a pair, so the loop can unpack them straight into word and count without a separate lookup.
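
To make the unpacking concrete, here is a small sketch with a made-up three-word dictionary (not the real data). It shows what items() hands to the loop, and, as an optional extra that the problem does not ask for, how the pairs could be sorted so the most frequent word comes first.

```python
# Hypothetical miniature vocab, used only for illustration.
vocab = {"hi": 1, "mama": 3, "dada": 2}

# items() yields (key, value) tuples in insertion order.
pairs = list(vocab.items())
print(pairs)

# Optional: sort the pairs by count, largest first, before printing bars.
by_count = sorted(vocab.items(), key=lambda pair: pair[1], reverse=True)
for word, count in by_count:
    # Same format string as the provided print_histogram_bar function.
    print(f"{word : <8}: {'x' * count}")
```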

This is the full code, including the functions provided by the staff:

def main():
    words = load_words_from_file("words.txt")
    vocab = {}
    for i in range(len(words)):
        if words[i] not in vocab: # if the word is not in the vocab
            vocab[words[i]] = 1 # set the count for the word to 1
        else:
            vocab[words[i]] += 1 # if the word is already in the vocab, add 1 to the count for this word
    # print(vocab)
    for word,count in vocab.items():
        print_histogram_bar(word, count)
    

def print_histogram_bar(word, count):
    """
    Prints one bar in the histogram.
    
    Uses formatted strings to do so. The 
        {word : <8}
    adds white space after a string to make
    the string take up 8 total characters of space.
    This makes all of our words on the left of the 
    histogram line up nicely. On the other end,
        {'x' * count}
    takes the 'x' string and duplicates it by 'count'
    number of times. So 'x' * 5 would be 'xxxxx'.
    
    Calling print_histogram_bar("mom", 7) would print:
        mom     : xxxxxxx
    """
    print(f"{word : <8}: {'x' * count}")

def load_words_from_file(filepath):
    """
    Loads words from a file into a list and returns it.
    We assume the file to have one word per line.
    Returns a list of strings. You should not modify this
    function.
    """
    words = []
    with open(filepath, 'r') as file_reader:
        for line in file_reader.readlines():
            cleaned_line = line.strip()
            if cleaned_line != '':
                words.append(cleaned_line)
    
    return words


if __name__ == '__main__':
    main()
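
If you are curious about the format string in print_histogram_bar, the sketch below may help. It uses a hypothetical helper, format_histogram_bar, which is not part of the starter code: it builds the same string but returns it instead of printing it, so the padding is easier to inspect.

```python
def format_histogram_bar(word, count):
    """Hypothetical helper: returns the bar as a string instead of printing it."""
    # ' <8' pads the word with spaces on the right up to 8 characters.
    return f"{word : <8}: {'x' * count}"


short_bar = format_histogram_bar("mom", 7)
long_bar = format_histogram_bar("strawberry", 2)

print(short_bar)
print(long_bar)
```

Note that the width of 8 is a minimum, not a maximum: a word longer than 8 characters is printed in full, and its bar simply starts further to the right.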

There you have it!