Namepy step 6 – Load the data into the database

by coen · June 17, 2016

(This is part of the namepy project. Start at Namepy – on the shoulders of giants)

We will need the following data in the database:

Name frequencies – baby names by year
Scrabble™ letter values, by (country) Scrabble™ set

Name frequencies

Download the data from https://www.ssa.gov/oact/babynames/names.zip
Unzip it in <project_root>/raw_data/yob1880.txt, etc
You may want to add raw_data to .gitignore, so it doesn’t get stored in your git repo

Create some code to read files and store in PostgreSQL – read_name_frequencies.py

def read_frequencies_from_file(filename, names):
print(filename)
year = int(filename[3:7])

year_frequencies = {}
for name in names:
year_frequencies[name] = {'F': 0, 'M': 0}

with open('raw_data/{d34bf16ac7b745ad0d2811187511ec8954163ba9b5dbe9639d7e21cc4b3adbdb}s' {d34bf16ac7b745ad0d2811187511ec8954163ba9b5dbe9639d7e21cc4b3adbdb} filename) as file:
for line in file.readlines():
try:
name_text, sex, count = line.split(",")
except:
print("Couldn't parse line")
print(line)
print
continue

if name_text not in names:
name = Name(name=name_text)
db.session.add(name)
db.session.commit()
names[name_text] = name.id
year_frequencies[name_text] = {'F': 0, 'M': 0}

year_frequencies[name_text][sex] = int(count)

for name, name_frequency in year_frequencies.iteritems():
if name_frequency['F'] + name_frequency['M']:
name_id = names[name]
frequency_record = NameFrequency(name_id=name_id,
year=year,
boys_count=name_frequency['M'],
girls_count=name_frequency['F'])
db.session.add(frequency_record)
db.session.commit()

def read_name_frequencies():
db.create_all()

# Start with an empty list
print("Deleting any previous data")
db.session.query(NameFrequency).delete()
db.session.query(Name).delete()
db.session.commit()
print("Done")

names = {}

# Get file list
for filename in listdir('raw_data'):
if filename[:3] == 'yob':
read_frequencies_from_file(filename, names)

Run the code. Note that this may take a while to run. On my development machine it took about 8 minutes
Check this in the database, for instance with phpPgAdmin or pgAdmin

Scrabble™ letter values

For a list of Scrabble™ letter values by Scrabble™ set see this Wikipedia entry. The following code will grab this page, extract the letter values and save this in the database

Add the new tables to models.py

class Set(db.Model):
id = db.Column(db.Integer, primary_key=True)
name = db.Column(db.String())
scores = db.relationship('LetterScore', backref='set', lazy='dynamic')

class LetterScore(db.Model):
id = db.Column(db.Integer, primary_key=True)
set_id = db.Column(db.Integer, db.ForeignKey('set.id'))
score = db.Column(db.Integer)
letter = db.Column(db.String(1))

Write some code to parse this page and store the results in the database. See the source code in my GitHub repo
Run the code
Check the results in the database. See above (end of name frequencies section) for some suggested tools

Done

Source code at https://github.com/CoachCoen/namepy/tree/step6
There are no changes to the front end, so there is no working webpage for this step

Time to pull it all together and show some real charts

Continue to Step 7 – Bringing it all together

Namepy step 6 – Load the data into the database

You may also like...

Categories

Recent Posts

Namepy step 6 – Load the data into the database

Name frequencies

Scrabble™ letter values

Done

Next

You may also like...

Tips for writing better Python code

DevOps – intro

Restarting CherryPy on WebFaction

Categories

Recent Posts