A Boltzmann machine is much like a spin glass model in physics. In short, a Boltzmann machine is a network of nodes that take values and are connected through weights. It is just like any other neural net, but with additional complications and theoretical implications.

To build a good understanding of the Boltzmann machine from a physicist's point of view, we begin with the Ising model. We construct a system of neurons $s_i$ that can take values $1$ or $-1$, where each pair $s_i$ and $s_j$ is connected by a weight $J_{ij}$.

This is described as a Boltzmann machine, or a spin glass in physics. A spin glass is a magnetic material in which many spins, coupled by interactions of random sign, point in different directions. In general, spin glasses are hard to calculate.

Nevertheless, we can simplify this model by requiring each spin to interact only with its nearest neighbours. Such a model is called the Ising model.

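Intuitively, these spins can be viewed as tiny magnets that can only point up or down. Each spin interacts with its neighbours, and these interactions are measured in terms of energy,

$$E = -\sum_{\langle i,j \rangle} J_{ij} s_i s_j,$$

where $\langle i,j \rangle$ runs over pairs of nearest neighbours. For a general Boltzmann machine, where any pair may be connected, the sum runs over all pairs $i<j$.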
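A minimal Python sketch of this energy, assuming symmetric couplings stored as a plain list of lists, $E = -\sum_{i<j} J_{ij} s_i s_j$ (the names `energy` and `J_chain` are illustrative only). A nearest-neighbour model such as the Ising model is the special case in which only neighbouring pairs have nonzero couplings.

```python
from itertools import combinations

def energy(spins, J):
    """Energy E = -sum_{i<j} J[i][j] * s_i * s_j.

    spins: list of +1/-1 values.
    J: symmetric coupling matrix (list of lists); J[i][j] = 0 means
    spins i and j are not connected.
    """
    return -sum(J[i][j] * spins[i] * spins[j]
                for i, j in combinations(range(len(spins)), 2))

# Three spins on a chain: only nearest neighbours (0,1) and (1,2) are coupled.
J_chain = [[0, 1, 0],
           [1, 0, 1],
           [0, 1, 0]]
print(energy([1, 1, 1], J_chain))    # aligned: -2
print(energy([1, -1, 1], J_chain))   # alternating: 2
```

With positive couplings the aligned configuration has the lower energy, which matches the picture of small magnets that prefer to line up.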

Why do we care about energy? For a physical system, low energy means stable and high energy means unstable, since a high-energy configuration tends to change spontaneously into a lower-energy one. Hence a system of spins is stable when the total energy of all the interactions is low.

To find a low-energy state, one numerical approach is the Monte Carlo method.
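One common Monte Carlo scheme for this is the Metropolis algorithm: pick a spin at random, compute the energy change $\Delta E$ of flipping it, and accept the flip with probability $\min(1, e^{-\Delta E/T})$. The sketch below assumes a square $L \times L$ lattice with periodic boundaries, a uniform coupling $J$, and a temperature $T$ in units where $k_B = 1$; the function names are hypothetical.

```python
import math
import random

def lattice_energy(spins, J=1.0):
    """E = -J * sum of s_i s_j over nearest-neighbour pairs on an L x L
    lattice with periodic boundaries. Each bond is counted once by pairing
    every site with its right and lower neighbour."""
    L = len(spins)
    E = 0.0
    for i in range(L):
        for j in range(L):
            E -= J * spins[i][j] * (spins[(i + 1) % L][j] + spins[i][(j + 1) % L])
    return E

def metropolis(spins, T, sweeps, J=1.0, rng=random):
    """Metropolis Monte Carlo, updating spins in place.
    One sweep means L*L attempted single-spin flips."""
    L = len(spins)
    for _ in range(sweeps * L * L):
        i, j = rng.randrange(L), rng.randrange(L)
        s = spins[i][j]
        neighbours = (spins[(i + 1) % L][j] + spins[(i - 1) % L][j]
                      + spins[i][(j + 1) % L] + spins[i][(j - 1) % L])
        dE = 2.0 * J * s * neighbours  # energy change if s -> -s
        # Always accept downhill moves; accept uphill moves with prob exp(-dE/T).
        if dE <= 0 or rng.random() < math.exp(-dE / T):
            spins[i][j] = -s
    return spins

rng = random.Random(0)  # fixed seed for reproducibility
L = 8
spins = [[rng.choice((-1, 1)) for _ in range(L)] for _ in range(L)]
E_start = lattice_energy(spins)
metropolis(spins, T=1.0, sweeps=200, rng=rng)
E_end = lattice_energy(spins)
print(E_start, E_end)  # the energy drops; the ground state is -2 * L * L
```

Uphill moves are occasionally accepted so that the system can escape local minima; lowering $T$ makes them rarer.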

States

We have been using the word state without defining it. In fact, there are two different pictures of states. For the purpose of this discussion, consider a system of $N$ particles, each with $\xi$ degrees of freedom.

The first strategy is to set up an $N\xi$-dimensional space and describe the state of the whole system by one point in that space. The distribution of such points is given by the distribution function of the corresponding ensemble (microcanonical, canonical or grand canonical). This is dubbed $\Gamma$ space.

The second strategy is to use a $\xi$-dimensional space in which each particle of the system is a point. Such a space is called $\mu$ space. In $\mu$ space, the distribution of single-particle states is calculated using the BBGKY hierarchy.
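To make the two pictures concrete, here is a toy Python sketch with hypothetical helper names: $N = 3$ particles moving in one dimension, each with $\xi = 2$ degrees of freedom (position and momentum). The same data is either one point in $\Gamma$ space or $N$ points in $\mu$ space.

```python
def to_gamma_point(particles):
    """Flatten N particle states (each a tuple of xi numbers) into one point
    in N*xi-dimensional Gamma space."""
    return [x for state in particles for x in state]

def to_mu_points(gamma_point, xi):
    """Split a Gamma-space point back into N points in xi-dimensional mu space."""
    return [tuple(gamma_point[k:k + xi]) for k in range(0, len(gamma_point), xi)]

# N = 3 particles, each with xi = 2 degrees of freedom (q, p).
particles = [(0.0, 1.0), (2.0, -0.5), (1.5, 0.3)]
g = to_gamma_point(particles)
print(len(g))                            # 6 = N * xi: one point in Gamma space
print(to_mu_points(g, 2) == particles)   # True: three points in mu space
```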

Once the macroscopic properties of the system are fixed, all microscopic states that lead to this macroscopic state occur with equal probability (for an isolated system in equilibrium). This is the principle of **equal a priori probabilities**.
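A small counting example in Python, assuming the macrostate is specified only by the total spin (magnetization) of $N = 4$ spins; the helper name `microstates` is hypothetical. Every compatible microstate gets the same probability $1/\Omega$, where $\Omega$ is the number of such microstates.

```python
from itertools import product
from math import comb

def microstates(n_spins, magnetization):
    """All configurations of n_spins spins (+1/-1) whose total spin equals
    the given magnetization (the macrostate)."""
    return [s for s in product((1, -1), repeat=n_spins) if sum(s) == magnetization]

states = microstates(4, 2)          # 4 spins: three up and one down
omega = len(states)
print(omega)                        # 4
print(omega == comb(4, 3))          # True
# Equal a priori probabilities: each compatible microstate has probability 1/omega.
prob = {s: 1 / omega for s in states}
```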

Partition Function

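The partition function is useful for calculating the statistical properties of the network,

$$Z = \sum_{\{s\}} e^{-E(\{s\})},$$

where the sum runs over all $2^N$ configurations of the $N$ spins and the temperature has been absorbed into the weights.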

With the partition function defined, the probability of a state is

$$P(\{s\}) = \frac{1}{Z} e^{-E(\{s\})}.$$
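For a handful of spins, $Z$ and $P$ can be computed exactly by summing over every configuration. A brute-force Python sketch, assuming a symmetric coupling matrix, energy $E = -\sum_{i<j} J_{ij} s_i s_j$ and the temperature absorbed into the weights; the function names are illustrative.

```python
import math
from itertools import combinations, product

def energy(s, J):
    # E = -sum_{i<j} J[i][j] s_i s_j
    return -sum(J[i][j] * s[i] * s[j] for i, j in combinations(range(len(s)), 2))

def boltzmann_distribution(J):
    """Exact Z and P(s) = exp(-E(s)) / Z by enumerating all 2^N states.
    Only feasible for small N."""
    n = len(J)
    states = list(product((-1, 1), repeat=n))
    weights = [math.exp(-energy(s, J)) for s in states]
    Z = sum(weights)
    return Z, {s: w / Z for s, w in zip(states, weights)}

J = [[0.0, 1.0], [1.0, 0.0]]   # two spins with a positive coupling
Z, P = boltzmann_distribution(J)
print(P[(1, 1)], P[(1, -1)])    # aligned states are e^2 times more likely
```

The cost grows as $2^N$, which is why sampling methods such as Monte Carlo are needed for larger networks.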

In many cases we would like the machine to learn the pattern of the input. Probabilistically speaking, we want the distribution $P(\{s\})$ of the Boltzmann machine to reproduce the distribution of the input data $P_{\text{data}}(\{s\})$,

$$P(\{s\}) \approx P_{\text{data}}(\{s\}).$$

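Or, in terms of the log-likelihood, we use the logarithm of their ratio and minimize the Kullback-Leibler divergence

$$D_{\mathrm{KL}}(P_{\text{data}} \,\|\, P) = \sum_{\{s\}} P_{\text{data}}(\{s\}) \log \frac{P_{\text{data}}(\{s\})}{P(\{s\})},$$

which is equivalent to maximizing the average log-likelihood $\langle \log P(\{s\}) \rangle_{\text{data}}$, since the entropy of the data does not depend on the weights.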
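A short Python check of the relation between the divergence and the log-likelihood, assuming both distributions are given as dictionaries from states to probabilities; the numbers are made up for illustration.

```python
import math

def kl_divergence(p_data, p_model):
    """D_KL(p_data || p_model) = sum_s p_data(s) * log(p_data(s) / p_model(s)).
    States with p_data(s) = 0 contribute nothing."""
    return sum(p * math.log(p / p_model[s]) for s, p in p_data.items() if p > 0)

def avg_log_likelihood(p_data, p_model):
    """<log P_model(s)>_data, the quantity maximized in training."""
    return sum(p * math.log(p_model[s]) for s, p in p_data.items() if p > 0)

p_data = {(1, 1): 0.5, (-1, -1): 0.5}
p_model = {(1, 1): 0.4, (-1, -1): 0.4, (1, -1): 0.1, (-1, 1): 0.1}
# KL = -H(p_data) - <log p_model>_data: the two objectives differ only by the
# data entropy, which does not depend on the model.
entropy = -sum(p * math.log(p) for p in p_data.values())
print(kl_divergence(p_data, p_model))
```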

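In terms of the Boltzmann machine weights, which are to be determined, the log probability is the negative energy up to the normalization $\log Z$,

$$\log P(\{s\}) = -E(\{s\}) - \log Z = \sum_{i<j} J_{ij} s_i s_j - \log Z.$$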

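Since we are looking for the weights, the update rule should follow the gradient of the average log-likelihood with respect to them,

$$\frac{\partial \langle \log P \rangle_{\text{data}}}{\partial J_{ij}} = \langle s_i s_j \rangle_{\text{data}} - \langle s_i s_j \rangle_{\text{model}},$$

where $\langle \cdot \rangle_{\text{model}}$ denotes the average over $P(\{s\})$.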
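For completeness, the intermediate step: starting from $\log P(\{s\}) = \sum_{k<l} J_{kl} s_k s_l - \log Z$ with $Z = \sum_{\{s'\}} e^{-E(\{s'\})}$, differentiate with respect to one weight $J_{ij}$ ($i<j$):

$$
\begin{aligned}
\frac{\partial \log P(\{s\})}{\partial J_{ij}}
&= s_i s_j - \frac{1}{Z}\frac{\partial Z}{\partial J_{ij}} \\
&= s_i s_j - \frac{1}{Z} \sum_{\{s'\}} s'_i s'_j \, e^{-E(\{s'\})} \\
&= s_i s_j - \langle s_i s_j \rangle_{\text{model}}.
\end{aligned}
$$

Averaging over the data distribution turns the first term into $\langle s_i s_j \rangle_{\text{data}}$, while the second term does not depend on the data at all.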

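We can then update the weights along this gradient,

$$\Delta J_{ij} = \eta \left( \langle s_i s_j \rangle_{\text{data}} - \langle s_i s_j \rangle_{\text{model}} \right),$$

with a learning rate $\eta > 0$.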

It is easy to see that the first term is basically the Hebbian learning rule, where correlated activities strengthen the weight. The second term is an unlearning rule that reduces weights so that the network relaxes toward the one we actually want. Simply put, it weakens the correlations that the machine produces on its own but that are not supported by the data.
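A minimal training sketch in Python, assuming a tiny fully visible Boltzmann machine with no hidden units and no bias terms, so that $\langle s_i s_j \rangle_{\text{model}}$ can be computed exactly by enumeration instead of Monte Carlo sampling. The function names, the toy data, the learning rate `eta` and the number of steps are hypothetical choices for illustration.

```python
import math
from itertools import combinations, product

def model_correlations(J):
    """Exact <s_i s_j>_model for a small, fully visible Boltzmann machine,
    computed by enumerating all 2^N states with P(s) = exp(-E(s)) / Z."""
    n = len(J)
    pairs = list(combinations(range(n), 2))
    states = list(product((-1, 1), repeat=n))
    # exp(-E(s)) with E(s) = -sum_{i<j} J_ij s_i s_j
    weights = [math.exp(sum(J[i][j] * s[i] * s[j] for i, j in pairs)) for s in states]
    Z = sum(weights)
    corr = {(i, j): 0.0 for i, j in pairs}
    for s, w in zip(states, weights):
        for i, j in pairs:
            corr[(i, j)] += s[i] * s[j] * w / Z
    return corr

def train(data, n, eta=0.1, steps=2000):
    """Gradient ascent on the average log-likelihood:
    dJ_ij = eta * (<s_i s_j>_data - <s_i s_j>_model)."""
    J = [[0.0] * n for _ in range(n)]
    pairs = list(combinations(range(n), 2))
    data_corr = {(i, j): sum(s[i] * s[j] for s in data) / len(data) for i, j in pairs}
    for _ in range(steps):
        model_corr = model_correlations(J)
        for i, j in pairs:
            # Hebbian term (data) minus unlearning term (model).
            J[i][j] += eta * (data_corr[(i, j)] - model_corr[(i, j)])
            J[j][i] = J[i][j]  # keep couplings symmetric
    return J, data_corr

# Toy data: spins 0 and 1 agree in 4 of 6 samples; spin 2 is uncorrelated with both.
data = [(1, 1, 1), (1, 1, -1), (-1, -1, 1), (-1, -1, -1), (1, -1, 1), (-1, 1, 1)]
J, data_corr = train(data, n=3)
print(J[0][1], data_corr)
```

With these data $\langle s_0 s_1 \rangle_{\text{data}} = 1/3$ and the other correlations vanish, so training should drive $J_{01}$ toward $\operatorname{artanh}(1/3) = \tfrac{1}{2}\ln 2$ while $J_{02}$ and $J_{12}$ stay near zero. For larger networks the exact model average would be replaced by samples, for example from a Metropolis chain.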

Hebbian Learning Rule

Simply put, neurons that behave similarly at the same time become more strongly connected.

An energy minimization procedure works the same way as the Hebbian learning rule. Suppose we pick two spins $s_i$ and $s_j$ with the same sign. A positive weight $J_{ij}$ then lowers their contribution $-J_{ij} s_i s_j$ to the energy. For spins with opposite signs, a negative weight is the choice that makes the energy lower. This is exactly what the Hebbian learning rule does.
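The connection can be checked numerically. Below is a Python sketch, assuming a single stored $\pm 1$ pattern and a Hebbian rule of the simple form $J_{ij} = \eta\, s_i s_j$ for $i \neq j$ (as in a Hopfield network); the helper names are hypothetical.

```python
from itertools import combinations

def hebbian_weights(pattern, eta=1.0):
    """Hebbian rule: J_ij = eta * s_i * s_j for i != j (no self-coupling).
    Spins that agree get a positive weight, spins that disagree a negative one."""
    n = len(pattern)
    return [[eta * pattern[i] * pattern[j] if i != j else 0.0 for j in range(n)]
            for i in range(n)]

def energy(s, J):
    # E = -sum_{i<j} J_ij s_i s_j
    return -sum(J[i][j] * s[i] * s[j] for i, j in combinations(range(len(s)), 2))

pattern = [1, -1, 1, 1]
J = hebbian_weights(pattern)
print(energy(pattern, J))          # -6.0: every pair contributes -1
print(energy([1, 1, 1, 1], J))     # higher: some pairs now disagree with J
```

Every pair term of the stored pattern is minimized at once, so the pattern sits at the lowest possible energy. The globally flipped pattern has the same energy, since $E$ is unchanged when all spins change sign.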

To code a Boltzmann machine, we need a protocol.