Simulated Annealing and the Boltzmann Machine
Simulated annealing
Annealing is a process used in metallurgy for tempering certain metal alloys by heating them to a high temperature. The molten metal is then cooled gradually back down to room temperature.
Simulated annealing refers to the annealing process done on a computer by simulation. In this model, a parameter T, equivalent to temperature in annealing, is reduced slowly. In the context of ANN learning, it is a technique used to reduce the possibility of the net falling into a local minimum during training and to assist in finding the global minimum.
The particular ANN paradigm for which simulated annealing is used to find the weights is known as a Boltzmann neural network, also known as the Boltzmann machine (BM). The BM, proposed by Ackley et al. (1985), is a variant of the Hopfield net with a probabilistic, rather than deterministic, node update rule.
Imagine a marble rolling around in a landscape, which represents the weight or energy space. The avoidance of a local minimum A and finding the global minimum B (say) may be accomplished by at first vigorously shaking the landscape to enable the marble to climb out of a local minimum across an energy barrier. To help the marble escape local minima, and at the same time increase its chances of remaining in the global minimum, the shaking is done less and less vigorously, which is analogous to starting with a high value of T and reducing it gradually. Here T represents the “vigour” of shaking.
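The idea can be made concrete with a short sketch. The following is a minimal, generic simulated-annealing loop in Python; the landscape function, neighbour move, and cooling constants are all illustrative choices, not part of the source material.

```python
import math
import random

def anneal(energy, neighbour, x0, T0=10.0, alpha=0.95, steps=2000):
    """Generic simulated annealing: always accept downhill moves,
    accept uphill moves with probability exp(-dE / T), and cool T
    gradually so the 'shaking' becomes less and less vigorous."""
    x, T = x0, T0
    for _ in range(steps):
        x_new = neighbour(x)
        dE = energy(x_new) - energy(x)
        if dE <= 0 or random.random() < math.exp(-dE / T):
            x = x_new
        T *= alpha  # cooling schedule: reduce the "vigour" T
    return x

# Illustrative landscape: a local minimum near x = 1.1 and a
# deeper global minimum near x = -1.3 (coefficients made up).
f = lambda x: x**4 - 3 * x**2 + x
best = anneal(f, lambda x: x + random.uniform(-0.5, 0.5), x0=2.0)
```

Started in the basin of the local minimum (x0 = 2.0), the early high-temperature moves let the point cross the energy barrier, exactly as the marble analogy describes.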
In terms of architecture, the BM consists of input, output, and intermediate (hidden) nodes. All weights are symmetric:

$$w_{ij} = w_{ji}, \quad \text{for } i \neq j.$$
Thus Boltzmann networks are highly recurrent, and this recurrence eliminates any basic difference between input and output nodes, which may be considered as either inputs or outputs as convenient.
Node outputs in a BM take on discrete {1, 0} values. Outputs are computed probabilistically and depend upon the temperature variable T. The activation, $S_i$, of node i is given by

$$S_i = \sum_j w_{ij} u_j$$

The new value of the node's output $u_i$ is given by

$$u_i = \begin{cases} 1 & \text{with probability } p_i \\ 0 & \text{with probability } 1 - p_i \end{cases} \qquad (1)$$

where

$$p_i = \frac{1}{1 + e^{-S_i/T}} \qquad (2)$$
The new value of $u_i$ will be either 1 or 0 with equal probability 0.5 if either $S_i = 0$ or T is very large. On the other hand, if T is very small, the update becomes effectively deterministic ($u_i = 1$ if $S_i > 0$, and 0 if $S_i < 0$), and the node behaves like a familiar discrete {1, 0} threshold cell.
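As a minimal sketch of Eqs. (1) and (2), assuming the weights are held in a NumPy matrix w with zero diagonal and the outputs in a {1, 0} vector u (names are illustrative):

```python
import numpy as np

rng = np.random.default_rng()

def update_node(u, w, i, T):
    """Stochastic {1,0} update of node i at temperature T."""
    S_i = w[i] @ u                          # activation: sum_j w_ij u_j
    p_i = 1.0 / (1.0 + np.exp(-S_i / T))    # Eq. (2)
    u[i] = 1 if rng.random() < p_i else 0   # Eq. (1)
    return u
```

At $S_i = 0$ or very large T the sampled probability is 0.5, and at very small T the logistic becomes a hard threshold, matching the behaviour described above.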
Input and output nodes can be either clamped or free-running. If they are clamped, then their outputs, ui, are fixed and do not change. If they are free-running, then their outputs change according to the equation for ui above. Intermediate nodes are always free-running.
The BM begins with a high temperature T and follows a simulated annealing schedule where T is gradually reduced as iterations proceed. At each iteration, a node i is selected at random and, if it is not clamped, its new output ui is computed. When T becomes sufficiently small, the net becomes deterministic like a Hopfield net and reaches equilibrium. At this stage, no further output changes occur, and the node outputs are read off.
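Continuing the previous sketch, the annealing run might look as follows; the schedule constants are illustrative, and a practical schedule would perform many node updates at each temperature before cooling further:

```python
def run_to_equilibrium(u, w, clamped, T0=10.0, alpha=0.98, T_min=0.05):
    """Anneal from T0 down to T_min: pick a node at random each
    iteration, update it only if it is free, then lower T."""
    T = T0
    while T > T_min:
        i = int(rng.integers(len(u)))   # select a node at random
        if i not in clamped:            # clamped nodes keep their outputs
            update_node(u, w, i, T)
        T *= alpha                      # gradual cooling
    return u                            # near-deterministic final state
```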
The energy of the BM is computed using the same formula used in Hopfield nets:

$$E = -\frac{1}{2} \sum_i \sum_{j \neq i} w_{ij} u_i u_j$$
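In the same NumPy sketch, and assuming zero self-connections ($w_{ii} = 0$), this energy is a one-liner:

```python
def energy(u, w):
    """Hopfield-style energy of state u; the factor 1/2 compensates
    for counting each symmetric pair (i, j) twice."""
    return -0.5 * (u @ w @ u)
```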
With T = 0, or close to 0, in Eq. (2) above, the net behaves as a Hopfield net: its energy either decreases (whenever a node's output changes) or remains unchanged (when it does not, or when $S_i = 0$). Hence the system must stabilise by moving into a minimum energy state if T = 0, although that minimum may only be a local one. To ensure that the global minimum is reached with high probability, simulated annealing can be performed by lowering T gradually.
The network can settle into one of a large number of energy states, the distribution of which is given by the Boltzmann distribution function

$$p = k e^{-E/T}$$

If $P_\alpha$ is the probability of a state with energy $E_\alpha$, then

$$\frac{P_\alpha}{P_\beta} = \frac{k e^{-E_\alpha/T}}{k e^{-E_\beta/T}} = e^{-(E_\alpha - E_\beta)/T}$$

If $E_\alpha$ is a lower energy state than $E_\beta$, then

$$e^{-(E_\alpha - E_\beta)/T} > 1$$

therefore $P_\alpha / P_\beta > 1$, so $P_\alpha > P_\beta$,

which means that as the network approaches thermal equilibrium, lower energy states are more probable, depending only on their relative energy.
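For example, take two states with energies $E_\alpha = 1$ and $E_\beta = 2$. At T = 1 the ratio is $P_\alpha / P_\beta = e^{(2-1)/1} \approx 2.7$, while at T = 0.5 it is $e^{2} \approx 7.4$: lowering the temperature sharpens the preference for the lower energy state.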
Higher temperatures allow the net to escape local minima via higher energy states, but they also allow transitions from lower to higher minima with almost equal probability. As the temperature is lowered, however, the probability of escaping from a higher minimum to a lower one falls, but the probability of travelling in the opposite direction falls even faster, and so more low energy states are reached. Eventually, the system settles down at low temperature in thermal equilibrium.
Learning in Boltzmann Machines
The network is arbitrarily divided into input, output and hidden units.
Learning occurs in two phases.
Phase 1: Input and output units are clamped to their correct values, and the temperature is set to some arbitrarily high value. The net is then allowed to cycle through its states, with the temperature being gradually lowered, until the hidden units reach thermal equilibrium. Weights between pairs of "on" units are incremented:

$$w_{ij} \leftarrow w_{ij} + \delta \qquad \text{if } u_i = 1 \text{ and } u_j = 1$$

where

$$\delta = k\,(p_{ij} - p'_{ij})$$

Here $p_{ij}$ is the average probability of neurons i and j being on together when the network is clamped, $p'_{ij}$ is the corresponding probability when the network is not clamped, and k is a scaling factor (Ackley et al., 1985).
Phase 2: Only the input units are clamped to their correct values, with the hidden and output units left free. The net is run until thermal equilibrium is reached, and weights between pairs of "on" units are decremented:

$$w_{ij} \leftarrow w_{ij} - \delta \qquad \text{if } u_i = 1 \text{ and } u_j = 1$$
The first phase reinforces connections that lead from input to the output, while the second "unlearns" poor associations.
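A compact sketch of the two phases, reusing run_to_equilibrium from earlier, might look as follows. Here pattern is a full state vector whose input and output entries hold the training values, input_idx and output_idx are sets of unit indices, and the sample count is illustrative; the two phases are collapsed into the single net update $\delta = k(p_{ij} - p'_{ij})$ from the text.

```python
def co_activation(w, u0, clamped, samples=50):
    """Estimate p_ij: the fraction of equilibrium samples in which
    units i and j are both on, with the given units clamped."""
    counts = np.zeros_like(w)
    for _ in range(samples):
        u = run_to_equilibrium(u0.copy(), w, clamped)
        counts += np.outer(u, u)        # adds 1 only where u_i = u_j = 1
    return counts / samples

def learn_step(w, pattern, input_idx, output_idx, k=0.1):
    """One learning sweep: dw_ij = k (p_ij - p'_ij)."""
    # Phase 1: clamp both input and output units to their correct values.
    p = co_activation(w, pattern, input_idx | output_idx)
    # Phase 2: clamp only the inputs; hidden and output units run free.
    p_free = co_activation(w, pattern, input_idx)
    w += k * (p - p_free)               # reinforce, then "unlearn"
    np.fill_diagonal(w, 0.0)            # keep w_ii = 0 (no self-loops)
    return w
```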
A number of practical problems are associated with Boltzmann machines: deciding how much to adjust the weights, how long to collect statistics in order to estimate the probabilities, how many weights to change at a time, how to adjust the temperature during simulated annealing, and how to decide when the network has reached thermal equilibrium. The main drawback, however, is that Boltzmann learning is significantly slower than backpropagation.
References:
Ackley, D.H., Hinton, G.E. and Sejnowski, T.J. (1985) A learning algorithm for Boltzmann machines. Cognitive Science, 9, 147-169.
Beale, R. and Jackson, T. (1990) Neural Computing. Adam Hilger.