This project is split across five files: main.py, graph.py, tensor.py, ops.py, and session.py. This section walks through main.py.
This literate programming exercise constructs a simple two-layer feed-forward neural network that computes the exclusive or (XOR), using symbolic differentiation to compute the gradients automatically. In total it is about 500 lines of code, including comments. The only functional dependency is numpy (tqdm is used purely for the progress bar). I highly recommend reading Chris Olah's Calculus on Computational Graphs: Backpropagation for more background on what this code is doing.
The XOR task is convenient for a number of reasons: it's very fast to compute; it is not linearly separable, so it requires at least two layers, which makes the gradient calculation more interesting; and it doesn't require more complicated matrix features such as broadcasting.
(I'm also working on a more involved example for MNIST, but as soon as I added support for matrix shapes and broadcasting, the code ballooned to five times its size and it was no longer a simple example.)
Let's start by going over the architecture. We're going to use four main components:
- Graph: composed of Tensor nodes and Op nodes that together represent the computation we want to differentiate.
- Tensor: represents a value in the graph. Tensors keep a reference to the operation that produced them, if any.
- BaseOp: represents a computation to perform and its differentiable components. Operations hold references to their input tensors and an output tensor.
- Session: used to evaluate tensors in the graph.

Note that the return value from a graph operation is actually a tensor, representing the output of the operation.
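To make these relationships concrete, here is a minimal sketch of how the pieces might fit together. It mirrors the description above, but the details are assumptions for illustration rather than the actual contents of graph.py, tensor.py, or ops.py (in particular, the convert helper and the exact constructor signatures are made up here).

import numpy as np

class Tensor(object):
    # A value in the graph; keeps a reference to the op that produced it, if any.
    def __init__(self, value=None, op=None):
        self.value = value  # concrete numpy value, when known
        self.op = op        # producing operation, or None for inputs and parameters

class BaseOp(object):
    # A computation to perform plus its differentiable components.
    def __init__(self, inputs, graph):
        self.inputs = [graph.convert(x) for x in inputs]  # input tensors
        self.output = Tensor(op=self)                      # output tensor

    def compute(self, *args):
        # forward pass: produce the output value from the input values
        raise NotImplementedError()

    def gradient(self, grad):
        # backward pass: partial derivatives with respect to each input
        raise NotImplementedError()

class Graph(object):
    # Builds tensors and ops; every graph operation returns the op's output tensor.
    def tensor(self, value=None):
        return Tensor(value=value)

    def convert(self, value):
        # hypothetical helper: wrap raw numpy values in tensors
        return value if isinstance(value, Tensor) else self.tensor(np.asarray(value))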
from __future__ import absolute_import
from __future__ import print_function
from __future__ import division
import numpy as np
np.random.seed(67)
from tqdm import trange
from graph import Graph
from session import Session
The main function performs some setup, then trains the model, displaying the current loss along the way.
def main():
Define a new graph.
    graph = Graph()
Initialize the training data (the XOR truth table).
    X = graph.tensor(np.array([[0, 0], [0, 1], [1, 0], [1, 1]]))
    y = graph.tensor(np.array([[0, 1, 1, 0]]))
Initialize the model's parameters (the weights for each layer). The 4x2 input multiplied by the 2x4 weights0 yields a 4x4 hidden activation, which multiplied by the 4x1 weights1 yields a single output per training row.
    weights0 = graph.tensor(np.random.normal(size=(2, 4)))
    weights1 = graph.tensor(np.random.normal(size=(4, 1)))
Define the model's activations.
    activations0 = graph.sigmoid(graph.dot(X, weights0))
    activations1 = graph.sigmoid(graph.dot(activations0, weights1))
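As an aside (this is not part of main.py): graph.sigmoid presumably wraps the logistic function, whose derivative has a conveniently cheap form that the backward pass can reuse. A plain-numpy sketch of what such an op computes:

import numpy as np

def sigmoid(x):
    # element-wise logistic function
    return 1 / (1 + np.exp(-x))

def sigmoid_grad(x):
    # derivative of the sigmoid, written in terms of the sigmoid itself
    s = sigmoid(x)
    return s * (1 - s)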
Define the operation for computing the loss (mean squared error).
    loss_op = graph.mean(graph.square(graph.transpose(y) - activations1))
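For reference, outside the graph the same loss written directly in numpy looks roughly like this (y_true and preds are hypothetical arrays standing in for the y and activations1 tensors above):

import numpy as np

def mse(y_true, preds):
    # mean squared error between the transposed targets and the predictions
    return np.mean(np.square(y_true.T - preds))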
Define operations for the gradients of the loss w.r.t. the model's parameters, and an update operation that applies those gradients to the parameters.
    parameters = [weights0, weights1]
    gradients = graph.gradients(loss_op, parameters)

    update_op = graph.group([
        graph.assign(param, param - grad)
        for param, grad in zip(parameters, gradients)
    ])
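Note that param - grad is plain gradient descent with an implicit learning rate of 1, which happens to be enough for this tiny problem. If you wanted an explicit step size, the update could be built like this instead (learning_rate is a hypothetical constant, and this assumes the graph's tensors support scalar multiplication the same way they support subtraction above):

    learning_rate = 0.5  # hypothetical step size; the original update effectively uses 1
    update_op = graph.group([
        graph.assign(param, param - learning_rate * grad)
        for param, grad in zip(parameters, gradients)
    ])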
Begin training. We iterate for a number of epochs, calling the session's run method each time to evaluate the update operation and the current loss. The progress bar's description is updated to display the loss.
    sess = Session(graph)
    with trange(10000) as pbar_epoch:
        for _ in pbar_epoch:
            _, loss = sess.run([update_op, loss_op])
            pbar_epoch.set_description('loss: {:.8f}'.format(loss))
if __name__ == '__main__':
main()
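After training, it's worth sanity-checking that the network actually learned XOR. The session can evaluate any tensor in the graph, so something along these lines appended to the end of main (assuming sess.run accepts a one-element list, mirroring the call above) would print the four predictions:

    # evaluate the output activations and compare against the XOR targets
    predictions = sess.run([activations1])[0]
    print(np.round(predictions).flatten())  # should be close to [0, 1, 1, 0] if training converged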