What is this machine learning thing, anyway?

by George Brocklehurst (@georgebrock)

I work for thoughtbot

Presented at PyTexas and North Bay Python.

This talk will describe the big ideas behind machine learning: what it can do for us, and how it works.

We'll avoid two of the traps that machine learning introductions often fall into.

We'll avoid focusing on implementation at the expense of understanding how things work. In other words, we'll try not to make this sound too much like magic.

from magic import wand

wand.wave()
Code examples without context can seem like magic

We'll also avoid getting so far into the details of how things are implemented that we obscure simple ideas with complex mathematical notation.

Source: “Linear regression” on Wikipedia

What can machine learning do for us?

Let's look at a problem I recently solved using machine learning, so we can see where it's useful.

I wanted to parse descriptions of ingredients in recipes to extract the quantity, unit of measure, and ingredient name.

# Input
"2 tablespoons butter"

# Output
{
    "quantity": 2,
    "unit": "tablespoons",
    "name": "butter",
}
Example input and output for the ingredient parsing problem

At first this seemed simple, but the more examples I looked at, the more complex the problem became. In the end I gave up trying to use basic string parsing. While it was easy for me to look at an ingredient string and see the answer, it was hard to work out rules that would let a computer do the same.

[
  "2 tablespoons butter",
  "2 cups all purpose flour",
  "A pinch of salt",
  "1/2 cup whole almonds (3 oz), toasted",
  "confectioners' sugar for dusting the cake",
  # ...
]
More examples of ingredients
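To see why simple string parsing breaks down, here is a deliberately naive rule-based parser. `naive_parse` is a hypothetical illustration, not the approach used in practice: it assumes every ingredient has the shape "quantity unit name", separated by spaces.

```python
def naive_parse(text):
    """Hypothetical rule-based parser: assumes "<quantity> <unit> <name>"."""
    # Split on the first two spaces only, so multi-word names stay together.
    quantity, unit, name = text.split(" ", 2)
    return {"quantity": quantity, "unit": unit, "name": name}


# Works for the simple case: quantity '2', unit 'tablespoons', name 'butter'.
print(naive_parse("2 tablespoons butter"))

# Nonsense for other examples: quantity 'A', unit 'pinch', name 'of salt'.
print(naive_parse("A pinch of salt"))

# Nonsense again: quantity "confectioners'", unit 'sugar'.
print(naive_parse("confectioners' sugar for dusting the cake"))
```

Patching the rule for one example (say, treating "A" as a quantity of 1) does nothing for the next one, which is exactly the problem with writing the rules down by hand.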

Generalisation

This kind of problem is ideal for machine learning, because the killer feature of machine learning is generalisation. If a system can generalise, it can work with examples that weren't explicitly considered by the design.

As an aside, you can read more about how I solved this in practice in my article on Named Entity Recognition on the thoughtbot blog.

Machine learning sounds hard

Think about what you do when you write a typical program: you consider all the possible types of input and write down rules for the computer to follow.

If there are too many possibilities to consider all of them, or if our understanding of the rules is too vague to write them down precisely, we can't follow this typical approach.

Fortunately, you've probably built generalising systems before, even if you didn't realise you were doing it.

Remember high school science class?

If your high school was anything like mine, you did a lot of experiments in science class. We're going to look at a simple experiment here, and use it as an analogy for how machine learning works.

One popular experiment is to measure how high a ball bounces when it is dropped from different heights.

The aim of the experiment is to discover if there's a mathematical relationship between the height of the drop and the height of the bounce. If we discover such a relationship, we'll be able to use it to predict the height of future bounces.

Step 1: Collect data

The first step is to collect data. We have to drop a ball a few times from different heights and record the heights of the bounces.

A chart showing the height of a ball's first bounce when it is dropped from different heights.

Step 2: Build a mathematical model

Once we have plotted our data, we can see a clear trend: the points on our chart are arranged in roughly a straight line. We can add a trend line to our chart to indicate the relationship between drop height and bounce height.

A line of best fit on a scatter plot

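When we decide where to draw a straight trend line, we're really choosing two values:

  1. The gradient: how steeply the line slopes, or how much the bounce height changes for each unit of drop height.
  2. The intercept: the bounce height the line predicts when the drop height is zero.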

Once we have these two values, we can calculate any point on our line with some simple Python code:

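gradient = 1.0   # change in bounce height per unit of drop height
intercept = 0.0  # bounce height predicted for a drop height of zero

def predict_bounce_height(drop_height):
    """Return the bounce height predicted by our straight-line model."""
    return intercept + (drop_height * gradient)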
Python code to predict the bounce height of a ball

This line is a mathematical model of how a ball bounces. It can make predictions about how a ball will bounce, even when the drop height isn't one of the ones we measured. In other words, we've built a generalising system.

Does our model fit our observations?

So far, we've picked the gradient and intercept of the line by eye, drawing whichever trend line looked right to us, but that might not give us the best possible result. To make sure we get the best result, we need some measure of how well our trend line fits the values we measured in our experiment.

The measure that's typically used is the mean squared error, which is calculated like this:

  1. Measure the vertical distance between the line and each point.
  2. Square each distance, so that no value is negative and points above and below the line can't cancel each other out.
  3. Take the mean of the squared distances.
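If a formula helps, the same three steps can be written compactly. Here n is the number of measurements, x_i and y_i are the i-th drop height and measured bounce height, and \hat{y}_i is the line's prediction; these symbols are our own labels, not part of the original slides.

```latex
\hat{y}_i = \text{intercept} + \text{gradient} \times x_i
\qquad
\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)^2
```

The subtraction inside the brackets is step 1, the square is step 2, and dividing the sum by n is step 3.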

We can visualise it like this:

Error measurements

As we've already seen, we can use our line to predict a bounce height based on a drop height. We can build on that to write some code to calculate the error:

measurements = [
    # (drop height, measured bounce height)
    (1, 0.72), (2, 1.48), (3, 2.26),  # etc.
]

def cost():
    """Mean squared error of the current trend line over our measurements."""
    errors = [
        predict_bounce_height(drop_height) - measured_bounce_height
        for drop_height, measured_bounce_height
        in measurements
    ]
    return sum(err ** 2 for err in errors) / len(errors)
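The `cost` function above reads `gradient` and `intercept` from the surrounding code. A minimal, self-contained variant that takes them as arguments makes it easy to compare candidate lines. The name `mean_squared_error` is our own, and the sketch uses only the three measurements listed, since the rest are elided.

```python
def mean_squared_error(gradient, intercept, measurements):
    """Average squared vertical distance between the line and each point."""
    squared_errors = [
        (intercept + drop_height * gradient - bounce_height) ** 2
        for drop_height, bounce_height in measurements
    ]
    return sum(squared_errors) / len(squared_errors)


measurements = [(1, 0.72), (2, 1.48), (3, 2.26)]

# The values from the earlier code (gradient 1.0, intercept 0.0)...
print(mean_squared_error(1.0, 0.0, measurements))   # about 0.2988
# ...versus a shallower candidate line, which fits far better.
print(mean_squared_error(0.75, 0.0, measurements))  # about 0.00047
```

A lower score means a better fit, so this one number is enough to rank any two lines against each other.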

Finding the best possible line

Now that we can attach a number to a line to tell us how good or bad it is, we can attempt to find the best possible line.

If we keep the intercept fixed and plot a chart of the error against the gradient, we can see a pattern: the error is lowest at a single, specific value of the gradient.

A chart showing how the error changes when we change the gradient
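We can reproduce the shape of that chart numerically with a brute-force scan: hold the intercept at zero, try a range of gradients, and keep the one with the lowest error. This is only a sketch, using the three measurements shown earlier and an arbitrary grid of gradients from 0 to 1.5 in steps of 0.01.

```python
def mean_squared_error(gradient, intercept, measurements):
    """Average squared vertical distance between the line and each point."""
    return sum(
        (intercept + x * gradient - y) ** 2 for x, y in measurements
    ) / len(measurements)


measurements = [(1, 0.72), (2, 1.48), (3, 2.26)]

# Candidate gradients 0.00, 0.01, ..., 1.50 (intercept held at 0).
candidates = [round(step * 0.01, 2) for step in range(151)]
errors = {g: mean_squared_error(g, 0.0, measurements) for g in candidates}

# The gradient with the smallest error is the bottom of the curve.
best = min(errors, key=errors.get)
print(best)  # 0.75
```

Scanning a grid works for one parameter, but the number of combinations grows very quickly as we add parameters, which is why iterative search algorithms are used instead.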

There are lots of algorithms for finding the minimum value of a function. For example, imagine an algorithm that repeatedly makes small changes to the gradient, moving in whichever direction reduces the error, so that it gets closer to the best result with each step.

Machine learning systems often use this type of algorithm to find the best parameters to fit a model to our data set.

We train a model by finding values for the parameters so the model agrees with our observations.
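Gradient descent is one widely used algorithm of this kind. The sketch below applies it to the gradient alone, holding the intercept at zero. The learning rate (0.05) and number of steps (200) are arbitrary choices that happen to work for this tiny data set, not recommended values.

```python
def train_gradient(measurements, learning_rate=0.05, steps=200):
    """Fit y = gradient * x (intercept fixed at 0) by gradient descent on MSE."""
    gradient = 0.0  # arbitrary starting guess
    n = len(measurements)
    for _ in range(steps):
        # Slope of the error curve at the current gradient:
        # d/dm (1/n) * sum((m*x - y)^2) = (2/n) * sum((m*x - y) * x)
        slope_of_error = (2 / n) * sum(
            (gradient * x - y) * x for x, y in measurements
        )
        # Take a small step downhill, against the slope of the error.
        gradient -= learning_rate * slope_of_error
    return gradient


measurements = [(1, 0.72), (2, 1.48), (3, 2.26)]
print(train_gradient(measurements))  # approaches 10.46 / 14, about 0.7471
```

The word "gradient" does double duty here: the parameter we're training is the line's gradient, and the algorithm follows the gradient (slope) of the error curve downhill.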

Does the model generalise?

The killer feature we were aiming for was generalisation. So how do we know if our model generalises well? We've seen it can make predictions, but are they right?

This is easy to test: we can collect more data, data we didn't use to develop our model, and check if the predictions are close to what we observe in the real world. We can even re-use our error score calculation to see how well the predictions fit our test data.
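Here is a minimal sketch of that check, assuming a hypothetical split of the three measurements shown earlier: train on the first two, hold the third back as test data, and score the predictions with the same mean squared error. For brevity it trains with the closed-form least-squares formula for a line through the origin, a shortcut we've swapped in rather than the iterative search described above.

```python
def mean_squared_error(gradient, intercept, measurements):
    """Average squared vertical distance between the line and each point."""
    return sum(
        (intercept + x * gradient - y) ** 2 for x, y in measurements
    ) / len(measurements)


def fit_gradient(measurements):
    """Least-squares gradient for a line through the origin: sum(xy) / sum(x^2)."""
    return sum(x * y for x, y in measurements) / sum(x * x for x, _ in measurements)


training_data = [(1, 0.72), (2, 1.48)]  # used to train the model
test_data = [(3, 2.26)]                 # never seen during training

gradient = fit_gradient(training_data)
print(gradient)                                      # about 0.736
print(mean_squared_error(gradient, 0.0, test_data))  # about 0.0027
```

With only three points this is a toy; in practice the test data would be a larger set of new measurements.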

So, machine learning?

The steps in building a machine learning system are the same as the ones we followed in our experiment:

  1. Collect examples
  2. Choose a model
  3. Train a model
  4. Test the model
  5. Make predictions

Our data was very simple: we only had one input variable and one output variable. We could look at the data on a scatter plot and see that a straight line was the right model. In real-world machine learning systems, it's rarely that simple. We may have to try several different models before we find one that fits the data well and generalises well to new examples.

What’s next?

If you're looking for more information, I'd recommend this book for a good overview of different types of models. While it is somewhat more mathematical than this talk, each equation is accompanied by a clear description and a worked example.

Fundamentals of Machine Learning for Predictive Analytics by Kelleher, Mac Namee, and D'Arcy.

For a more hands-on approach, check out the online course from fast.ai.

Any questions?

Ask now, or later: @georgebrock on Twitter or email george@thoughtbot.com


These slides: georgebrock.github.io/talks/what-is-ml