By Joel Pendleton
This is a tutorial using Python on how to create a basic neural network to generate a function to perform the XNOR operation.
An XNOR gate (sometimes referred to by its extended name, Exclusive NOR gate) is a digital logic gate with two or more inputs and one output that performs logical equality. The output of an XNOR gate is true when all of its inputs are true or when all of its inputs are false. If some of its inputs are true and others are false, then the output of the XNOR gate is false.
This is the architecture of the neural network used in this tutorial:
The activation function used in this network is the sigmoid function: $$ g(x) = \frac{\mathrm{1} }{\mathrm{1} + e^{-x} } $$
The derivative of the sigmoid function, $ g^{'}(x) $ can be to shown to be $$ g^{'}(x) = g(x)(1 - g(x)) $$
If you're wondering why you would want to use the sigmoid function, I'd recommend taking a look at these resources:
The cost function used is
$$J(\theta) = \frac 1m \sum_{i=1}^m\sum_{k=1}^k\begin{bmatrix}-y_k^{(i)}\log((h_\theta(x^{(i)}))_k) - (1-y_k^{(i)})\log(1-(h_\theta(x^{(i)}))_k)\end{bmatrix} + \frac{\lambda}{2m} \sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_l + 1}(\theta^{(l)}_{ij})^2$$
The cost function tells you how well your model fits the provided data. The smaller the cost the better your model. Therefore, we want to minimise the cost. We do this by varying our parameters in a manner that reduces the cost. This is accomplished by the back propagation algorithm.
import matplotlib.pyplot as plt
import numpy as np
def sigmoid(z):
return 1 / (1 + np.exp(-z))
def sigmoid_derivative(z):
return sigmoid(z) * (1 - sigmoid(z))
#These functions are used in the later code.
x = np.arange(-10., 10., 0.2)
plt.plot(x,sigmoid(x), '-k', label="Sigmoid")
plt.legend(loc='upper left')
plt.xlabel("z")
plt.ylabel("g(z)")
plt.show()
Algorithm for our 3 layer neural network:
Forward propagation is used to find what our model predicts for each training example. This will tell us how we should tune our weights ($\theta$) to make our model more accurate. This tuning of weights is done using backpropagation.
You can think of $\delta^{(n)}$ as the error in our model at layer n.
$ \lambda$ is the regularization parameter. In our code example $\lambda = 0$
Using our cost function, we can write:
$$ \frac{\partial}{\partial\theta^{(l)}_{ij}} J(\theta) = D^{(l)}_{ij}$$
Therefore, $ D^{(l)}_{ij}$ can be used in an optimisation algorithm to minimise the cost function. In our code, we have done this using the gradient descent algorithm. For more on gradient descent, I'd recommend reading https://ml-cheatsheet.readthedocs.io/en/latest/gradient_descent.html
This algorithm above is the general algorithm, we will make it specific to our case by setting $L = 3$, $m = 4$, and $n = 1, 2$.
$$ \theta^{(1)}, \Delta^{(1)}, D^{(1)} \in \mathbb{R}^{2 x 3} $$$$ \theta^{(2)}, \Delta^{(2)}, D^{(2)} \in \mathbb{R}^{1 x 3} $$N.B. $\theta^{(1)}$ and $\theta^{(2)}$ are weights of layer 1 and layer 2, respectively.
The cost function $J(\theta)$ is non-convex so you may get a different minimum value and 'optimised' parameters based on what you initialise $\theta$ as.
Here I have made an array of possible values of $x_1$ and $x_2$ and their corresponding outputs and put them into numpy arrays.
X = np.array([[0,0],
[0,1],
[1,0],
[1,1]])
y = np.array([[1],
[0],
[0],
[1]])
Using Python's OOP features I have created a Neural Network class.
class NeuralNetwork:
def __init__(self, X, y):
self.X = np.c_[np.ones((X.shape[0], 1)), X] #Training inputs
self.y = y # Training outputs
self.numberOfExamples = y.shape[0] # Number of training examples
self.w_1 = (np.random.rand(2, 3) - 1) / 2 # Initialise weight matrix for layer 1
self.w_2 = (np.random.rand(1, 3) - 1) / 2 # Initialise weight matrix for layer 2
# Error in each layer
self.sigma2 = np.zeros((2,1))
self.sigma3 = np.zeros((3,1))
self.predictions = np.zeros((4,1))
# There is 2 input units in layer 1 and 2, and 1 output unit, excluding bias units.
def feedforward(self, x):
self.a_1 = x # vector training example (layer 1 input)
self.z_2 = self.w_1 @ self.a_1
self.a_2 = sigmoid(self.z_2)
self.a_2 = np.vstack(([1], self.a_2)) # Add bias unit to a_2 for next layer computation
self.z_3 = self.w_2 @ self.a_2
self.a_3 = sigmoid(self.z_3) # Output
return self.a_3
def backprop(self):
# These are temporary variables used to compute self.D_1 and self.D_2
self.d_1 = np.zeros(self.w_1.shape)
self.d_2 = np.zeros(self.w_2.shape)
# These layers store the derivate of the cost with respect to the weights in each layer
self.D_1 = np.zeros(self.w_1.shape)
self.D_2 = np.zeros(self.w_2.shape)
for i in range(0,self.numberOfExamples):
self.feedforward(np.reshape(self.X[i, :], ((-1,1))))
self.predictions[i,0] = self.a_3
self.sigma3 = self.a_3 - y[i] #Calculate 'error' in layer 3
self.sigma2 = (self.w_2.T @ self.sigma3) * np.vstack(([0],sigmoid_derivative(self.z_2))) #Calculate 'error' in layer 2
'''We want the error for only 2 units, not for the bias unit.
However, in order to use the vectorised implementation we need the sigmoid derivative to be a 3 dimensional vector, so I added 0 as an element to the derivative.
This has no effect on the element-wise multiplication.'''
self.sigma2 = np.delete(self.sigma2, 0) # Remove error associated to +1 bias unit as it has no error / output
self.sigma2 = np.reshape(self.sigma2, (-1, 1))
# Adjust the temporary variables used to compute gradient of J
self.d_2 += self.sigma3 @ (self.a_2.T)
self.d_1 += self.sigma2 @ (self.a_1.T)
# Partial derivatives of cost function
self.D_2 = (1/self.numberOfExamples) * self.d_2
self.D_1 = (1/self.numberOfExamples) * self.d_1
def probs(self, X): #Function to generate the probabilites based on matrix of inputs
probabilities = np.zeros((X.shape[0], 1))
for i in range(0, X.shape[0]):
test = np.reshape(X[i,:], (-1,1))
test = np.vstack(([1], test))
probabilities[i, 0] = self.feedforward(test)
return probabilities
# Neural network object
nn = NeuralNetwork(X,y)
alpha = 1 # Learning Rate
for i in range(0, 2000): #Perform gradient descent
nn.backprop()
# Update weights
nn.w_1 += - alpha * nn.D_1
nn.w_2 += - alpha * nn.D_2
The code below is just the plot of the contours of the output. Although the inputs in this case are discrete variables, I thought it would be interesting to visualise the decision boundary.
xx, yy = np.mgrid[-0.1:1.1:0.1, -0.1:1.1:0.1]
grid = np.c_[xx.ravel(), yy.ravel()]
# Find the probabilities for each combination of features
probs = nn.probs(grid).reshape(xx.shape)
f, ax = plt.subplots(figsize=(12, 10))
# Create contour lines for each set of probabilities
contour = ax.contourf(xx, yy, probs, 25, cmap="RdBu", vmin=0, vmax=1)
plt.title("x$_1$ XNOR x$_2$")
ax_c = f.colorbar(contour)
ax_c.set_label("$P(y = 1 | X)$")
ax_c.set_ticks([0, .25, .5, .75, 1])
# Plot training examples on figure
ax.scatter(X[:,0], X[:, 1], c=y[:,0], s=50,
cmap="RdBu", vmin=-.2, vmax=1.2,
edgecolor="white", linewidth=1)
ax.set(aspect="equal",
xlabel="x$_1$", ylabel="x$_2$")
plt.show()
This notebook was created in an attempt to consolidate what I've learned from Andrew Ng's Machine Learning course: https://www.coursera.org/learn/machine-learning
Moreover, I'd highly recommend the following resources for learning more about neural networks:
Thanks for reading :)