The Bayesian self-organizing map (BSOM) is a method for estimating the probability distribution that generates a set of data points, on the basis of a Bayesian stochastic model. It can also be regarded as a learning method for a kind of neural network. The black dots in the figure below denote artificially generated data points. The blue circles denote the centroids of a BSOM model, which are the parameters that specify the configuration of the component distributions. Initially the centroids are positioned randomly. The blue links between the centroids represent the predetermined topology of the model, which imposes a constraint on the estimation of the parameters. In the Bayesian framework, such a constraint is expressed as a prior probability for the parameters and is used to stabilize the estimation. In the present simulation, a line-segment topology is used.
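To make the roles of the likelihood and the prior concrete, here is a minimal Python sketch of an unnormalized log posterior for the centroids. It rests on assumptions that the text above does not spell out: equal mixing weights, spherical Gaussian components with a common variance 1/beta, and a line-segment prior that penalizes squared second differences along the chain with strength alpha. The second-difference form is chosen here because it leaves a straight, evenly spaced chain unpenalized; the function names are hypothetical.

```python
import math

def second_difference_penalty(centroids):
    """Sum of squared second differences along the chain of centroids."""
    total = 0.0
    for k in range(1, len(centroids) - 1):
        for prev, cur, nxt in zip(centroids[k - 1], centroids[k], centroids[k + 1]):
            total += (prev - 2.0 * cur + nxt) ** 2
    return total

def log_posterior(data, centroids, alpha, beta):
    """Unnormalized log posterior: mixture log-likelihood plus log prior.

    Assumed model: equal-weight spherical Gaussians with variance 1/beta,
    and a smoothness prior of strength alpha on the chain of centroids.
    """
    dim = len(data[0])
    log_norm = 0.5 * dim * math.log(beta / (2.0 * math.pi)) - math.log(len(centroids))
    log_lik = 0.0
    for x in data:
        # log-sum-exp over the components, for numerical stability
        exponents = [-0.5 * beta * sum((xi - wi) ** 2 for xi, wi in zip(x, w))
                     for w in centroids]
        top = max(exponents)
        log_lik += log_norm + top + math.log(sum(math.exp(e - top) for e in exponents))
    log_prior = -0.5 * alpha * second_difference_penalty(centroids)
    return log_lik + log_prior
```

Under these assumptions, a larger alpha lowers the score of any bent chain while leaving a straight, evenly spaced chain untouched, which is how the topology acts as a stabilizing constraint.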
This applet searches for the maximum a posteriori (MAP) estimates of the centroid parameters using an expectation-maximization (EM) algorithm. You can start the algorithm by pressing the `learn' button. You can also reset the centroids to random positions by pressing the `init' button.
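One EM iteration for the MAP centroids might look like the following Python sketch. It is not the applet's code. It assumes equal-weight spherical Gaussian components with variance 1/beta and a second-difference smoothness prior of strength alpha on the chain; under those assumptions the M-step reduces to a small linear system, solved here by plain Gaussian elimination. All function names are hypothetical.

```python
import math

def responsibilities(data, centroids, beta):
    """E-step: posterior probability of each centroid for each data point."""
    resp = []
    for x in data:
        expo = [-0.5 * beta * sum((a - b) ** 2 for a, b in zip(x, w)) for w in centroids]
        top = max(expo)
        weights = [math.exp(e - top) for e in expo]
        total = sum(weights)
        resp.append([v / total for v in weights])
    return resp

def smoothness_matrix(n_units):
    """D^T D for the second-difference operator D on a chain of n_units."""
    m = [[0.0] * n_units for _ in range(n_units)]
    for k in range(1, n_units - 1):
        idx, coef = (k - 1, k, k + 1), (1.0, -2.0, 1.0)
        for i, ci in zip(idx, coef):
            for j, cj in zip(idx, coef):
                m[i][j] += ci * cj
    return m

def solve(a, b):
    """Solve a x = b (b may have several columns) by Gaussian elimination."""
    n = len(a)
    aug = [row[:] + rhs[:] for row, rhs in zip(a, b)]
    for col in range(n):
        # partial pivoting: bring the largest remaining entry to the diagonal
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, len(aug[r])):
                aug[r][c] -= f * aug[col][c]
    x = [[0.0] * (len(aug[0]) - n) for _ in range(n)]
    for r in range(n - 1, -1, -1):
        for d in range(len(x[r])):
            s = aug[r][n + d] - sum(aug[r][c] * x[c][d] for c in range(r + 1, n))
            x[r][d] = s / aug[r][r]
    return x

def em_step(data, centroids, alpha, beta):
    """One EM iteration towards the MAP centroids (assumed model, see above)."""
    resp = responsibilities(data, centroids, beta)
    n_units, dim = len(centroids), len(data[0])
    # M-step system: (beta * N + alpha * D^T D) W = beta * R^T X
    a = [[alpha * v for v in row] for row in smoothness_matrix(n_units)]
    b = [[0.0] * dim for _ in range(n_units)]
    for x, r in zip(data, resp):
        for k in range(n_units):
            a[k][k] += beta * r[k]
            for d in range(dim):
                b[k][d] += beta * r[k] * x[d]
    return solve(a, b)
```

Calling `em_step` repeatedly with fixed alpha and beta until the centroids stop moving gives a local MAP estimate under the assumed model. The system can become singular if alpha is zero and some centroid receives no responsibility at all, a case this sketch does not handle.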
The BSOM model has a pair of hyperparameters, alpha and beta, which represent `the strength of the topological constraint' and `the estimate of the noise level in the data' respectively. You can vary them using the sliders. Observe how the centroid configuration changes with the values of the hyperparameters and get a feel for their meaning. Then try to find the values of the hyperparameters that give the best centroid configuration. Note that the configuration depends not only on the present values of the hyperparameters but also on their history. Moving the hyperparameters carelessly will lead to a poor locally optimal configuration. In fact, the BSOM is able to search for the optimal values of the hyperparameters by itself. This ability is activated by pressing the `auto' button.
You can vary the distribution of the artificial data using the sliders named `width', `height', `phase' and `noise level'. You can also change the number of centroids by entering a number in the #unit field and pressing the Return key.
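For readers who want to experiment outside the applet, here is a hypothetical Python data generator. The page does not state which curve the applet samples from; the sine shape below is purely an assumption suggested by the slider names, and `make_data` and its parameters are illustrative only.

```python
import math
import random

def make_data(n_points, width, height, phase, noise_level, seed=0):
    """Hypothetical generator: noisy samples along one period of a sine curve."""
    rng = random.Random(seed)
    points = []
    for _ in range(n_points):
        t = rng.random()                       # position along the curve in [0, 1)
        x = width * (t - 0.5)                  # horizontal extent set by width
        y = height * math.sin(2.0 * math.pi * t + phase)
        # isotropic Gaussian noise with standard deviation noise_level
        points.append((x + rng.gauss(0.0, noise_level),
                       y + rng.gauss(0.0, noise_level)))
    return points
```

With `noise_level` set to zero the points lie exactly on the assumed curve, which is convenient for checking that an estimator recovers it.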
(1) The values displayed over the sliders are all relative. The sliders for hyperparameters are on log scales.
(2) You should start learning from a high value of alpha and a low value of beta; otherwise the BSOM will fall into an entangled configuration. If it does fall into an entangled configuration, you can simplify the configuration by increasing alpha or decreasing beta. Alpha and 1/beta correspond to the temperature of a physical system. The strategy of avoiding poor local optima by slowly lowering the temperature from a high-temperature state is called simulated annealing.
(3) The automatic hyperparameter search may fail if alpha is too large or beta is too small at the start. In such a case, decreasing alpha slightly or increasing beta slightly may lead to a successful search. However, when the noise level is too large, the BSOM gives up trying to detect a signal and settles into the simplest configuration (i.e., a straight line segment).
(4) You can move a centroid directly by dragging it with the mouse.
(5) Pressing the `density' button displays the estimated density in gray scale.
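Notes (1) and (2) together suggest a simple manual annealing schedule: move alpha downward and beta upward in equal steps on a log scale, running a few learning iterations at each stage. The Python sketch below illustrates that idea only; it is not the applet's `auto' search. The `em_step` argument is a hypothetical caller-supplied function that takes the centroids, alpha and beta and returns updated centroids, and the stage and sweep counts are arbitrary.

```python
def geometric_schedule(start, stop, n_steps):
    """Values from start to stop that are evenly spaced on a log scale."""
    if n_steps < 2:
        return [stop]
    ratio = (stop / start) ** (1.0 / (n_steps - 1))
    return [start * ratio ** i for i in range(n_steps)]

def anneal(centroids, em_step, alpha_range, beta_range, n_stages=20, sweeps=10):
    """Run a caller-supplied learning step while alpha falls and beta rises.

    alpha_range and beta_range are (start, stop) pairs; starting with a high
    alpha and a low beta corresponds to starting at a high temperature.
    """
    alphas = geometric_schedule(alpha_range[0], alpha_range[1], n_stages)
    betas = geometric_schedule(beta_range[0], beta_range[1], n_stages)
    for alpha, beta in zip(alphas, betas):
        for _ in range(sweeps):
            centroids = em_step(centroids, alpha, beta)
    return centroids
```

Because each stage starts from the configuration left by the previous one, the result depends on the path taken through the hyperparameters, not just on their final values.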
When alpha is fixed to an infinitely large value, BSOM is similar to principal component analysis (PCA). On the other hand, when alpha is fixed to zero (i.e., the topological constraint is ignored), BSOM reduces to cluster analysis based on a spherical Gaussian mixture model. Moreover, when beta is infinitely large, BSOM is almost the same as the k-means algorithm, vector quantization (VQ) and competitive learning. Thus, BSOM can be regarded as an intermediate method between PCA and clustering. The elastic net can also be viewed as an estimation algorithm for BSOM based on gradient ascent, though I used an EM algorithm here.
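The large-beta limit can be illustrated with a few lines of Python. Assuming equal-weight spherical Gaussian components with variance 1/beta (an assumption of this sketch), the soft assignment of a point to the centroids sharpens as beta grows and approaches the hard nearest-centroid assignment used by k-means. The function name, centroids and test point are made up for illustration.

```python
import math

def soft_assignment(x, centroids, beta):
    """Posterior weight of each centroid for one point (assumed mixture model)."""
    expo = [-0.5 * beta * sum((a - b) ** 2 for a, b in zip(x, w)) for w in centroids]
    top = max(expo)                      # subtract the maximum to avoid underflow
    weights = [math.exp(e - top) for e in expo]
    total = sum(weights)
    return [v / total for v in weights]

centroids = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
point = (0.8, 0.1)                       # nearest to the middle centroid
for beta in (1.0, 10.0, 1000.0):
    print(beta, [round(r, 3) for r in soft_assignment(point, centroids, beta)])
```

At small beta the weights are spread over all three centroids; at large beta essentially all of the weight sits on the nearest one, so the M-step average becomes the k-means centroid update.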