Initial Programming Exercises On Simple Neural Networks
- Read the following page of this web-site to learn the programming concepts needed to work with LOCEN (and, in general, to build and work with computational models of brain and behaviour, or artificial intelligence/robotic systems): web-page on programming concepts to learn
- Depending on the programming language that you have to learn (agreed with your LOCEN supervisor), refer to suitable web-sites teaching that language. For the different programming languages, such web-sites are indicated in the web-pages linked from this page under "Programming and simulations --> Specific languages"
- Install an IDE (Integrated Development Environment) on your portable computer (or the computer at LOCEN) to program in the selected language; indications on this are given in the web-pages dedicated to the specific languages mentioned in the previous point
- Understand that there are various learning paradigms related to neural networks (see below for more details): this page gives some exercises related to supervised learning and reinforcement learning
- Do the exercises illustrated in detail below, in particular involving:
- A first simple perceptron neural network able to learn the AND and OR functions through a supervised learning algorithm
- The exercise of the previous point, but using linear algebra data structures and operations of the language you use (or libraries expanding the language)
- Implementation, based on linear algebra, of a multilayer perceptron able to learn the difficult X-OR function
- Implementation of two leaky neurons
- Implementation of an echo-state network
- Implementation of a reinforcement learning model based on Q-Learning
- Implementation of a reinforcement learning model based on an actor-critic (still under construction)
Understand (study) that there are various learning paradigms related to neural networks: see below for more details
Understand that there are various learning paradigms (classes of algorithms) through which neural networks can learn. The major ones are these:
- Supervised learning. Two very important sub-classes:
- Linear regression
- Unsupervised learning
- Reinforcement learning
Search the internet for further information and understand the differences between these learning paradigms.
Also, ask your LOCEN supervisor for Floreano's book on neural networks: an old book, but one of the best for learning the basics of neural networks.
Exercise to program a first simple perceptron neural network able to learn the AND and OR functions through a supervised learning algorithm
This section gives a programming exercise on a simple neural network and some indications on how to accomplish it.
The goal of the exercise is to program a one-layer perceptron that is able to learn, in a supervised fashion, to produce the logical AND and OR functions. In particular, the network should respond with the following 'desired output' when given the following 'input' examples:
OR function:
Input1 Input2 Bias Output
0 0 1 0
0 1 1 1
1 0 1 1
1 1 1 1
AND function:
Input1 Input2 Bias Output
0 0 1 0
0 1 1 0
1 0 1 0
1 1 1 1
There are various versions of such a simple perceptron. You should implement one where the output unit uses a sigmoid transfer function, and the network is considered to answer 1 if the activation of the output unit is > 0.9, and to answer 0 if the activation of the output unit is < 0.1. This leads you to create a more general neural network, e.g. useful when you move to multi-layer perceptrons (exercise below); moreover, it is more correct/general from a machine learning point of view.
Your network should have 3 input units (the first being a 'bias') and one output unit.
The network should learn on-line (i.e., updating after the presentation of each example of the training set), not in batch (i.e., after all examples together).
Note that neural networks (usually) learn progressively, rather than by 'one shot', by being exposed to the same training examples several times. This affects the main program structure indicated below.
Your program should be structured as follows:
- Creation and initialisation of data structures
- Creation of data structures and data for training the neural network
- Loop for multiple epochs (one epoch is one pass of the net over the whole training set):
- Loop for examples of data-set
- Network spreading
- Network learning
- Collection of data for following data analysis
- Loop for examples of data-set, to test the final performance of the network
- Data analysis: plots on learning curve of neural network
- Data analysis: numerical output of network performance (e.g., desired and actual net output, errors, etc.)
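As a reference, the program structure above can be sketched as follows in Python (assuming Python is your chosen language; the 0.9/0.1 answer thresholds follow the text, while the learning rate, the number of epochs, and the weight initialisation are arbitrary choices):

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Training set for the OR function: [input1, input2, bias] -> desired output
data = [([0, 0, 1], 0), ([0, 1, 1], 1), ([1, 0, 1], 1), ([1, 1, 1], 1)]

random.seed(0)
weights = [random.uniform(-0.5, 0.5) for _ in range(3)]  # one weight per input unit
eta = 2.0  # learning rate (arbitrary choice)

errors_per_epoch = []  # data collected for the later analysis
for epoch in range(10000):
    epoch_error = 0.0
    for inputs, desired in data:
        # Network spreading: weighted sum of the inputs through the sigmoid
        activation = sigmoid(sum(w * x for w, x in zip(weights, inputs)))
        # On-line learning (delta rule), applied after each single example
        error = desired - activation
        for i in range(3):
            weights[i] += eta * error * activation * (1 - activation) * inputs[i]
        epoch_error += error ** 2
    errors_per_epoch.append(epoch_error)

# The network answers 1 if the activation is > 0.9 and 0 if it is < 0.1
for inputs, desired in data:
    activation = sigmoid(sum(w * x for w, x in zip(weights, inputs)))
    print(inputs, desired, round(activation, 3))
```

For the AND function it is enough to change the desired outputs of the training set to 0, 0, 0, 1.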
Same exercise as the previous point, but using linear algebra data structures and operations of the language
Do the exercise above but:
- Use the programming-language-dependent data structures to implement vectors (e.g., of input and output) and matrices (e.g., for connection weights)
- Use the programming-language-dependent linear algebra operators; e.g., to implement the whole network spreading of activation, you should have something like:
vOutp = mConnWeig * vInpu
- Implement a neural network that performs the AND and the OR functions at the same time. The network will hence have 3 input units and 2 output units.
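A minimal sketch of this linear-algebra version, assuming Python with NumPy (the variable names mirror the vOutp = mConnWeig * vInpu convention of the text; the learning rate and the number of epochs are arbitrary choices):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)

# Inputs: [input1, input2, bias]; desired outputs: [AND, OR]
inputs = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
desired = np.array([[0, 0], [0, 1], [0, 1], [1, 1]], dtype=float)

mConnWeig = rng.uniform(-0.5, 0.5, size=(2, 3))  # 2 output units x 3 input units
eta = 2.0  # learning rate (arbitrary choice)

for epoch in range(10000):
    for vInpu, vDesi in zip(inputs, desired):
        vOutp = sigmoid(mConnWeig @ vInpu)  # whole network spreading in one line
        vErro = vDesi - vOutp               # error vector, one element per output unit
        # Delta rule as an outer product: one row of weight updates per output unit
        mConnWeig += eta * np.outer(vErro * vOutp * (1 - vOutp), vInpu)

print(np.round(sigmoid(inputs @ mConnWeig.T), 3))
```

Note how the whole on-line update of the weight matrix is a single outer product: this is the kind of compactness that linear algebra operators give you.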
Exercise to implement, using linear algebra programming, a multilayer perceptron learning the X-OR function
Implement a neural network that performs the X-OR function below. To this purpose you will need a multilayer perceptron with 3 input units, 1 output unit, and a variable number of hidden units (initially, implement a network with 1 hidden layer only).
Input1 Input2 Bias Output
0 0 1 1
0 1 1 0
1 0 1 0
1 1 1 1
Parity check function
This is a generalisation of the X-OR function, where the output is 0 if the input pattern has an odd number of digits equal to 1, and 1 if such number is even. So, for example, for 3 inputs you have:
Input1 Input2 Input3 Bias Output
0 0 0 1 1
0 0 1 1 0
0 1 0 1 0
0 1 1 1 1
1 0 0 1 0
1 0 1 1 1
1 1 0 1 1
1 1 1 1 0
The new network to solve either task has a different architecture with respect to the perceptron. In particular, it has a second layer of two units, called "hidden units", between the input layer and the output layer. So the inputs and the bias feed into the hidden units, and then the hidden units (and the bias) feed into the output units.
The learning algorithm for the weights between the output and the hidden units is the same as for the simple perceptron. A second algorithm, called the "error backpropagation" algorithm, is needed to modify the weights between the input and the hidden units. Search for the formulas of the error-backpropagation algorithm on the internet. Make sure that you understand the algorithm very well, as it is not very simple.
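As a sketch of how the forward spreading and the backpropagation updates fit together, assuming Python with NumPy (the formulas are the standard error backpropagation for a sigmoid multilayer perceptron; the learning rate, the number of epochs, and the weight initialisation are arbitrary choices):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)

# The X-OR training set of the text: [input1, input2, bias] -> desired output
inputs = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
desired = np.array([1, 0, 0, 1], dtype=float)

nHidd = 2
mWeigInpuHidd = rng.uniform(-1, 1, size=(nHidd, 3))      # input -> hidden weights
mWeigHiddOutp = rng.uniform(-1, 1, size=(1, nHidd + 1))  # hidden (+ bias) -> output
eta = 0.5

vErroHist = []  # squared error per epoch, for the learning-curve plot
for epoch in range(20000):
    sEpocErro = 0.0
    for vInpu, sDesi in zip(inputs, desired):
        # Forward spreading: input -> hidden -> output (a bias is appended to the hidden layer)
        vHidd = sigmoid(mWeigInpuHidd @ vInpu)
        vHiddBias = np.append(vHidd, 1.0)
        sOutp = sigmoid(mWeigHiddOutp @ vHiddBias)[0]
        # Backpropagation: delta of the output unit, then deltas of the hidden units
        sDeltOutp = (sDesi - sOutp) * sOutp * (1 - sOutp)
        vDeltHidd = (mWeigHiddOutp[0, :nHidd] * sDeltOutp) * vHidd * (1 - vHidd)
        # On-line weight updates
        mWeigHiddOutp += eta * sDeltOutp * vHiddBias
        mWeigInpuHidd += eta * np.outer(vDeltHidd, vInpu)
        sEpocErro += (sDesi - sOutp) ** 2
    vErroHist.append(sEpocErro)
```

With only two hidden units, learning can occasionally get stuck in a local minimum; if this happens, simply restart with a different weight initialisation.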
Change your program so that after learning it plots:
- The curve of the Square Error of the output with respect to the desired output during learning (put "epochs" at the x-axis, i.e. plot the error averaged over all the input patterns)
- The two 2D weights matrices (from input to hidden; from hidden to output) of your multilayer perceptron: this plot represents every value of a connection weight as a coloured square, with each colour representing the size of the weight, from most negative values to largest positive values.
- Also plot a similar graph with: on 1 row, the activation of 1 input pattern, the related activation of the hidden units, and the activation of the output unit(s); on different rows, those activations corresponding to all patterns.
Further change the program so that at the end of the simulation all the data needed to make the previous graphs are saved into separate .txt files, and the graphs are then plotted by a second program.
"Leaky neurons" are different form the "firing rate" neurons used so far. In particular, these are neurons that store information in time, e.g. they have an activation that accumulates, with a fixed input, or progressively fades away, with a zero activation. They are indeed behave like a ''container of water with a leak when you put water in it": they slowly fill in when you put water, the progressively get empty when you stop.
First, see the relevant slides of this neural-network course:
in particular the relevant slides of this lesson:
Use those slides to understand how leaky units work:
leaky element for computing the activation potential of the neural unit;
tanh function -- truncated to positive values -- for computing the activation of the neural units.
Also understand from the slides (and the internet) how to implement the leaky units (or any other dynamical system) in a computer program through the "Euler approximation". The Euler approximation is necessary because we now simulate time in more detail, as a continuous flow. A neural network is thus a dynamical system that "lives in continuous time", and the Euler approximation allows the approximated simulation of its dynamics in the discrete time steps of the computer program.
To suitably take account of continuous time, establish these critical variables in the simulations:
- sSimuTimeDura: duration of the whole simulation, in seconds, e.g. set to 10 seconds.
- sDeltaTime: the duration of the time step used to approximate continuous time, for example set to 0.01 (seconds). For a given duration of the simulation, the smaller the time step, the more accurate the approximation of the dynamics of the system. Each cycle of the simulation will now correspond to one sDeltaTime.
- sSimuStepDura: duration of the simulation in time steps, set equal to: sSimuTimeDura/sDeltaTime
Another very important thing to do in the implementations: with continuous time, causation between phenomena can happen only across different time steps (from t-1 to t). So, for any variable that affects another variable, e.g. a neuron that affects another neuron (or itself in the future), you have to do this:
- Store the "old value of the variable" and the "current value of the variable"
- At the beginning of the simulation cycle representing the elapsing of time (in sDeltaTime steps), save the old value of all the variables affecting other variables in the next time step, like this:
sUnit1Old = sUnit1
sUnit2Old = sUnit2
- Then, for any causal dependency between the variables, use the old values of the variables, for example:
sUnit1 = sUnit1Old + (sDeltaTime/Tau) * (-sUnit1Old - 0.5 * sUnit2Old + sInpu1Old)
sUnit2 = sUnit2Old + (sDeltaTime/Tau) * (-sUnit2Old - 0.5 * sUnit1Old + sInpu2Old)
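Putting these elements together, a single leaky unit can be simulated as follows (a minimal Python sketch; the tau value and the input schedule are arbitrary choices for illustration):

```python
import numpy as np

# Time variables as defined in the text
sSimuTimeDura = 10.0  # duration of the whole simulation, in seconds
sDeltaTime = 0.01     # duration of one integration time step, in seconds
sSimuStepDura = round(sSimuTimeDura / sDeltaTime)
sTau = 0.5            # time constant of the leaky unit (arbitrary choice)

sUnit1 = 0.0
vActiHist = np.zeros(sSimuStepDura)

for t in range(sSimuStepDura):
    # Save the old value before updating it (causation goes from t-1 to t)
    sUnit1Old = sUnit1
    sInpu1 = 1.0 if t * sDeltaTime < 5.0 else 0.0  # input on for the first 5 s
    # Euler approximation of the leaky-integrator dynamics
    sUnit1 = sUnit1Old + (sDeltaTime / sTau) * (-sUnit1Old + sInpu1)
    vActiHist[t] = sUnit1
```

Plotting vActiHist against time shows the "filling" of the unit towards the input value and its progressive "emptying" when the input is removed.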
Based on such information, do the exercise below.
A neural network formed by two leaky units
The input is 0 for one second, 1 for one second, 0 for two seconds, 1 for two seconds, 0 for three seconds, 1 for three seconds (12 seconds in total).
Architecture of the network:
- A first leaky neuron (sUnit1) and a second leaky neuron (sUnit2)
- Both sUnit1 and sUnit2 take the (external) input
- sUnit2 is connected to sUnit1 with a connection weight of -1
- sUnit1 activation gives the output of the network
Run a simulation lasting 12 seconds.
Plot both the input and the activation of sUnit1 and sUnit2 in time.
Try to play with the tau values of the two units, and also with sDeltaTime.
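A possible sketch of this exercise in Python (the two tau values are arbitrary choices to play with; the input schedule and the -1 connection weight follow the text):

```python
import numpy as np

sSimuTimeDura = 12.0
sDeltaTime = 0.01
sSimuStepDura = round(sSimuTimeDura / sDeltaTime)
sTau1, sTau2 = 0.3, 0.6  # time constants: arbitrary choices, to be played with

# Input schedule from the text: 0 for 1 s, 1 for 1 s, 0 for 2 s, 1 for 2 s, 0 for 3 s, 1 for 3 s
vInpu = np.zeros(sSimuStepDura)
sTime = 0.0
for sDura, sValu in [(1, 0), (1, 1), (2, 0), (2, 1), (3, 0), (3, 1)]:
    vInpu[round(sTime / sDeltaTime): round((sTime + sDura) / sDeltaTime)] = sValu
    sTime += sDura

sUnit1 = sUnit2 = 0.0
mActiHist = np.zeros((sSimuStepDura, 2))
for t in range(sSimuStepDura):
    # Old values saved first: causation goes from t-1 to t
    sUnit1Old, sUnit2Old = sUnit1, sUnit2
    # Both units receive the external input; sUnit2 inhibits sUnit1 (weight -1)
    sUnit1 = sUnit1Old + (sDeltaTime / sTau1) * (-sUnit1Old - 1.0 * sUnit2Old + vInpu[t])
    sUnit2 = sUnit2Old + (sDeltaTime / sTau2) * (-sUnit2Old + vInpu[t])
    mActiHist[t] = sUnit1, sUnit2
```

Plotting vInpu and the two columns of mActiHist against time shows the transient response of sUnit1: it rises when the input arrives, and is then progressively pushed back down by the slower inhibition from sUnit2.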
Echo State Network
Simulate the ESN models below by using the same technique used to simulate the leaky neurons (see above), i.e. use:
sSimuStepDura, sSimuTimeDura, sDeltaTime
These are the properties (the "echo state property") that the network should have:
- with any input, the network will not explode
- if the input units are turned to zero, the network will progressively relax to zero
- for a given sequence of input patterns, the network will go through a specific transient dynamics: this is the most important property, for which the network tends to respond to the current input, and also to the recent ones (with a decaying memory), in a specific deterministic way
- moreover, if the input changes a bit, the dynamics will change a bit, rather than a lot as in chaotic systems
Using an existing programming function (search for it in the math libraries of your programming language), you can extract from W its eigenvalues (you do not need to know how this is done or what eigenvalues precisely are: roughly speaking, they indicate the size of some important dimensions, the eigenvectors, of the space covered by W's column vectors).
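For example, in Python with NumPy, the rescaling of W based on its eigenvalues can be sketched as follows (the reservoir size and the target spectral radius of 0.9 are arbitrary but typical choices):

```python
import numpy as np

rng = np.random.default_rng(0)
nUnits = 100  # size of the reservoir (arbitrary choice)

# Random recurrent weight matrix of the network
W = rng.uniform(-1, 1, size=(nUnits, nUnits))

# Extract the eigenvalues with a library function; the largest absolute
# eigenvalue is the "spectral radius" of W
sSpecRadi = np.max(np.abs(np.linalg.eigvals(W)))

# Rescale W so that its spectral radius is below 1: this gives the network
# the properties listed above (activity does not explode, and relaxes to
# zero when the input is turned off)
sDesiRadi = 0.9
W = W * (sDesiRadi / sSpecRadi)
```

Since eigenvalues scale linearly with the matrix, after this rescaling the spectral radius of W is exactly sDesiRadi.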
- The ESN units dynamically respond to the input in time as usual
- The perceptron reads out the inner activations of the network to reproduce the sinusoid function.
Q-Learning
Reinforcement learning is an important family of algorithms that allows the simulation of intelligent agents (e.g., simulated animals or intelligent robots) that learn to accomplish a task by trial-and-error while interacting with the world. Q-Learning is one of the two most important and basic reinforcement learning algorithms, the other being the actor-critic model. Q-Learning is based on a set of states s (s ∈ S, S = all states), a set of actions a (a ∈ A, A = all actions), and a set of "valuations" q (q ∈ Q, Q = all valuations), one for each state-action pair leading to a new state.
There are two different ways to implement Q-learning:
- Tabular Q-Learning. This is based on a 2-entry table, not a neural-network implementation, and so it is not useful as a biological model. However, we suggest taking a look at this page related to tabular Q-Learning, because it helps you start to understand which kinds of problems Q-Learning can solve and how:
- Neural-network Q-Learning. This is the target of this tutorial. The model is based on a simple "perceptron" neural-network architecture, i.e., a 2-layer feedforward linear network formed by an input and an output layer. The system takes as input the current state of the world and gives as output the q values, one for each action: the actions are encoded in a "localistic" way (not a "distributed" way), i.e. the system encodes a finite number of discrete actions, where 1 action corresponds to 1 output unit.
We now explain how to implement neural-network Q-learning in detail.
Simulation of the “world”
At first you have to program the simulation of the world with which the agent interacts. Here we assume a simple "grid world", i.e. a world formed by "positions", or "cells", where the agent can be at each step, and where the agent can navigate with its actions. You can consider a 1D arena (or "world": this is formed by a number of contiguous cells, or positions, forming a 1D sequence; imagine a corridor divided into contiguous squares), with two possible actions for moving leftward or rightward along the positions, or a 2D world (a number of contiguous positions organised as a 2D grid), with four possible actions to move north, east, west, south. In the case of the 1D world, you represent the state of the world, i.e. the position where the agent is, with a vector of 0s and a 1 in correspondence to the position occupied by the agent. In the case of the 2D world, you represent the state of the world with a matrix of zeros and a 1 in correspondence to the position occupied by the agent. We recommend implementing both the 1D and 2D worlds in parallel, and parameterising your program to use one or the other at will, because this will be very instructive and will help you debug the program.
Choose a starting position in which you will position the agent at the beginning of each trial: this is represented as the initial “1” in your vector or matrix of zeros representing the state of the world.
Moreover, decide where to put the reward (r) in your world model. To this purpose, use another vector/matrix with the same dimensions as the world vector/matrix, with a 1 in the element corresponding to the rewarded position.
Importantly, some of the actions that the agent might choose would make it hit the "walls" of the arena where it navigates. For example, if it is at the top-left corner of the 2D arena and it selects the action "west" or the action "north", these actions lead the agent to hit respectively the west and north walls of the arena, and so it remains where it is. In general, the agent is always free to select any one of the actions of its repertoire (so at each time step all the actions of its repertoire are available for selection), but if it selects an action that leads it against a wall, the effect of the action execution is simply that the agent remains in the same position from which it selected and performed the action. This situation is similar to that of a robot that tries to move against a wall: the effect is that it remains where it is.
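The world dynamics described above, including the effect of hitting the walls, can be sketched for the 1D case as follows (a minimal Python sketch; the function names are illustrative):

```python
import numpy as np

def move_1d(iPos, iAction, nPos):
    """iAction: 0 = move right (east), 1 = move left (west).
    An action against a wall leaves the agent where it is."""
    iNew = iPos + (1 if iAction == 0 else -1)
    return iPos if iNew < 0 or iNew >= nPos else iNew

def state_vector(iPos, nPos):
    """The state of the 1D world: a vector of 0s with a 1 at the agent position."""
    vS = np.zeros(nPos)
    vS[iPos] = 1.0
    return vS
```

For example, move_1d(0, 1, 9) returns 0: the agent at the leftmost cell tries to move left, hits the wall, and stays in place.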
Activation of the neural network
The agent is based on simple neural network formed by 2 layers of units:
- The input layer is activated with the state of the world representing the position of the agent in the world (the input units are as many as the possible positions in the world). In the case of the 2D world, you should "unroll" the world matrix into the vector representing the input units.
- The connection weights represent the Q values.
- At the beginning you can set them to zero values or small random values.
- The output units represent the q values of the possible actions (2 or 4 actions).
The output values do not have to pass through a step or sigmoid transfer function, as usual in neural networks, but through a SoftMax function (also known as the "normalised exponential function"). The SoftMax function normalises the output values so that they sum up to 1 while keeping a difference between them. This allows considering the outputs of the SoftMax as probability values, used to extract the action to execute. It means that, with zero initial connection weights, at the beginning you will have two outputs with 0.5 values each; if you have four outputs, you will have four 0.25 values. The SoftMax function SM[.], allowing the computation of the probabilities p[a] of selecting the different actions a, is defined as follows:
For q ∈ Q and a ∈ A: p(a) = SM(q) = e^(q_a/T) / (Σ_j e^(q_j/T))
T is called the "temperature parameter", and e is Euler's number. For high temperatures, all output values tend to assume the same probability; for low temperatures, the values tend to have great differences (at the extreme, one value is close to 1 and all the other values are close to 0). So the temperature allows setting how much the probabilities of selecting the different actions differ from each other in correspondence to the Q values.
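A minimal Python/NumPy sketch of the SoftMax with temperature (the function name and the example q values are illustrative):

```python
import numpy as np

def softmax(vQ, sT=1.0):
    """SoftMax with temperature sT: p(a) = e^(q_a/sT) / sum_j e^(q_j/sT)."""
    vExpo = np.exp(np.asarray(vQ, dtype=float) / sT)
    return vExpo / vExpo.sum()

# With zero q values all actions are equiprobable
print(softmax([0.0, 0.0]))            # two actions: 0.5 each
print(softmax([0.0, 0.0, 0.0, 0.0]))  # four actions: 0.25 each

# The temperature controls how much the probabilities differ
print(softmax([1.0, 0.0], sT=10.0))   # high T: nearly uniform
print(softmax([1.0, 0.0], sT=0.1))    # low T: nearly deterministic
```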
We now explain the core equation of the algorithm, allowing the update of the network connection weights corresponding to the q values, and then the whole program of the simulation. As we are facing reinforcement learning, the neural network cannot be trained in a supervised fashion, because we lack a vector of desired output values. Thus, we use a different learning rule to update the Q values step by step, depending on the quality of the chosen action in terms of reward. At each time step t, the learning rule updates only the connection weights that correspond to the action a[t-1] selected and executed at time t-1 (this action corresponded to the q value q[s[t-1], a[t-1]]).
The update of the connection weights w[j,i], connecting input unit s_i to the output unit q_j corresponding to the selected a[t-1], is as follows:
Δw_ji = η * ( (r[t] + γ * Max_q[q[s[t], a[t]]]) − q[s[t-1], a[t-1]] ) * s_i[t-1]
Let us examine this equation closely:
- As mentioned, w[j,i] connects input unit s_i to the output unit of the network corresponding to the q-value q_j of the previously selected a[t-1]
- η is the learning rate, usually set equal to 0.1
- r[t] is the reward achieved in the current state s[t]
- γ is a "discount factor", usually set to 0.9: this has the function of discounting the value now attributed to future q values and rewards
- Max_q[q[s[t], a[t]]] represents the maximum value within the set of 2 or 4 q values corresponding to the 2 or 4 actions
- s[t-1] is the previous state vector, and s_i[t-1] is an element of it
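The learning rule can be sketched as a single function (Python with NumPy; the function and argument names are illustrative, and the last_step flag anticipates the "no future" case handled in the pseudo-code below):

```python
import numpy as np

def update_weights(mW, vS_old, iA_old, sR, vQ_now, sEta=0.1, sGamma=0.9, last_step=False):
    """One application of the learning rule to the weights of the selected action.

    mW: weight matrix (n_actions x n_inputs), i.e. the q values
    vS_old: state/input vector at t-1; iA_old: index of the action selected at t-1
    sR: reward r[t]; vQ_now: q values computed in the current state s[t]
    """
    sQ_old = float(mW[iA_old] @ vS_old)                   # q[s[t-1], a[t-1]]
    sMaxQ = 0.0 if last_step else float(np.max(vQ_now))   # zero at the last step of a trial
    # Only the row of weights of the previously selected action is updated
    mW[iA_old] += sEta * (sR + sGamma * sMaxQ - sQ_old) * vS_old
    return mW
```

For instance, if the agent reaches the rewarded position from a state whose q value is still 0, the weight from the occupied input unit to the selected action grows by sEta * r = 0.1.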
This is how the simulation will broadly work. In the first trial, you set your agent in the world and it starts exploring the environment randomly until it finds the reward. When it reaches the reward, the q value of the last action performed increases according to the learning equation. The first trial ends and the second one starts with the agent set again in the initial position. The agent will move again, and will learn nothing until it reaches the position to which it assigned a value in the first trial (the one visited just before the reward). Thus, the q-values/weights matrix is progressively updated moving backwards from the rewarded position. Therefore, the larger the world, the larger the number of trials and steps the agent will need to learn what to do at each position.
Here is the pseudo-code of how the algorithm works:
- Initialise the relevant variables
- Loop over trials
- Reset relevant variables
- Set the world to the initial state (agent at the starting position)
- Loop over the steps of the trial
- Store values (which become the "values at t-1") of critical variables before updating them: input, q-values, action
- Update the state of the world based on the action a[t-1]
- Compute the reward r[t] returned to the agent by the world
- Activate the input units of the network based on s[t] (this represents the activation of the sensors)
- Implement the spreading of the net to compute the q[t] values
- Compute the probabilities of the actions, p[a], with the SoftMax function
- Select action a based on p[a]: this action will be executed at the next time step
- If this is not the first or the last step of the trial:
- update the network weights based on the equation above
- If this is the last step of the trial (r=1):
- update the network weights based on the equation above, but considering zero instead of Max_q, as "there is no future" in the trial
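Putting everything together, the test simulation suggested below (1D world of nine positions, random action selection) can be sketched as follows, assuming Python with NumPy (a compact arrangement of the pseudo-code above; variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

nPos = 9                         # 1D world of nine positions
iStart, iRewardPos = 0, nPos - 1 # start on the left, reward on the right
nActions = 2                     # 0 = East, 1 = West
sEta, sGamma = 0.1, 0.9
mW = np.zeros((nActions, nPos))  # connection weights = q values

for trial in range(200):
    iPos = iStart
    vS_old, iA_old = None, None
    for step in range(50):
        vS = np.zeros(nPos)              # input units: a 1 at the agent position
        vS[iPos] = 1.0
        vQ = mW @ vS                     # network spreading: one q value per action
        sR = 1.0 if iPos == iRewardPos else 0.0
        if vS_old is not None:           # no learning at the first step of a trial
            sMaxQ = 0.0 if sR == 1.0 else float(np.max(vQ))  # "no future" at the last step
            sQ_old = float(mW[iA_old] @ vS_old)
            mW[iA_old] += sEta * (sR + sGamma * sMaxQ - sQ_old) * vS_old
        if sR == 1.0:
            break                        # the trial ends at the rewarded position
        iA = int(rng.integers(nActions)) # random action selection (0.5 each)
        iNew = iPos + (1 if iA == 0 else -1)
        if 0 <= iNew < nPos:             # an action against a wall leaves the agent in place
            iPos = iNew
        vS_old, iA_old = vS, iA

print(np.round(mW, 2))
```

Note that many trials end by the 50-step limit without reaching the reward: this is expected with random action selection, and learning simply continues in the following trials.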
As you can see, regarding the learning rule:
- In the first step of each trial the agent does not learn: the reason is that the algorithm needs two succeeding time steps to learn, as its learning is based on the consequences (in terms of reward) that the action selected at t-1 causes at t
- In the last step of the trial, the algorithm does not consider the maximum current q, corresponding to an estimate of the future rewards, but rather a zero value: the trial ends and so "there is no future" for the agent, hence no future rewards
To see if your program works well, try this simulation:
- Set a 1D world of nine positions (two actions)
- Set the reward on the last position on the right
- Set the start at the first position on the left
- Set a maximum of 200 Trials and a maximum of 50 steps per trial
- Set LearningRate = 0.1 and DiscountFactor = 0.9
- Set a random selection of actions (i.e., not following the SoftMax probabilities, but rather a 0.5 probability for each action)
- Let it run!
If the program is correct, you should obtain q-values (i.e., the network connection weights) similar to the following ones (horizontal 0-8: positions; vertical 0-1: the two actions, East and West): the values of the East action approximate γ^0=0.9^0=1, γ^1=0.9^1=0.9, γ^2=0.9^2=0.81, etc., moving leftward away from the reward, while the values of the West action remain smaller (roughly γ^2 times the East value at the same position, as moving West delays the reward by two steps).