Restricted Boltzmann Machines (RBMs) are stochastic neural networks capable of learning a probability distribution over their inputs. This characteristic makes them useful in many different and complex tasks, the most popular of which are dimensionality reduction, feature learning, classification, and collaborative filtering. Nowadays, RBMs have attracted considerable interest, being studied in many variants and scientific fields and applied to multiple types of data. This widespread use of RBMs has motivated our interest in their further study and improvement.
The standard RBM consists of two layers, a visible and a hidden one. The visible layer contains the visible nodes corresponding to the input data, while the hidden layer tries to capture latent dependencies, allowing a correct reconstruction of the initial data. During training, the model parameters are updated until a stopping point is reached, where the log likelihood function attains a maximum and the model parameters are optimal. Computing the log likelihood function and its gradient analytically at each training step is intractable, so the Contrastive Divergence (CD) algorithm was introduced to estimate the gradient numerically. However, CD is a biased estimator of the log likelihood gradient, so the approximation of the log likelihood derived at each training step, using the Annealed Importance Sampling technique, may not be a safe choice for evaluating the training process. For this reason, a maximum iteration limit is most often imposed to terminate the training procedure.
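As a concrete illustration, one CD-1 parameter update for a binary RBM can be sketched as follows. This is a minimal NumPy sketch with illustrative variable names and learning rate, not the implementation used in this study:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, b, c, lr=0.1):
    """One CD-1 update for a binary RBM (illustrative sketch).

    v0 : (batch, n_visible) binary inputs
    W  : (n_visible, n_hidden) weight matrix, updated in place
    b, c : visible and hidden bias vectors, updated in place
    """
    # Positive phase: sample hidden units conditioned on the data.
    ph0 = sigmoid(v0 @ W + c)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase: one Gibbs step down to the visible layer and back up.
    pv1 = sigmoid(h0 @ W.T + b)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + c)
    # Gradient approximation: data statistics minus (one-step) model statistics.
    n = v0.shape[0]
    W += lr * (v0.T @ ph0 - v1.T @ ph1) / n
    b += lr * (v0 - v1).mean(axis=0)
    c += lr * (ph0 - ph1).mean(axis=0)
    return v1  # the reconstruction, reusable for monitoring
```

The returned reconstruction is exactly the quantity that reconstruction-based monitoring, such as the criterion proposed below, can compare against the input batch.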
With all of this in mind, we decided to explore the stopping criteria used in RBM training. This study therefore aims to introduce a new stopping criterion based on the Hamming Distance. In particular, the proposed criterion computes the Hamming Distance between the input data and the data reconstructed by the model at each training step. The criterion identifies an optimal stopping point as the epoch at which the rate of change of the Hamming Distance becomes insignificant, beyond which any further improvement in its value is negligible.
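The criterion described above can be sketched as follows. The window size and tolerance are illustrative placeholders, not the values used in the study:

```python
import numpy as np

def hamming_distance(v_data, v_recon):
    """Mean Hamming distance between binary inputs and their reconstructions."""
    return float(np.mean(np.asarray(v_data) != np.asarray(v_recon)))

def should_stop(distances, window=5, tol=1e-4):
    """Stop when the Hamming distance's rate of change becomes insignificant.

    distances : per-epoch Hamming distances recorded so far
    window    : number of recent epoch-to-epoch changes to average (assumed value)
    tol       : threshold below which further improvement is deemed negligible (assumed value)
    """
    if len(distances) <= window:
        return False
    # Average absolute change over the last `window` epochs.
    recent_changes = np.diff(distances[-(window + 1):])
    return bool(np.mean(np.abs(recent_changes)) < tol)
```

In a training loop, one would record the Hamming distance after each epoch and terminate as soon as `should_stop` returns `True`.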
In addition, our criterion was compared with methods currently used to terminate the training process, such as log likelihood estimation and the maximum iteration limit. More specifically, simulations were conducted in which training is stopped by either criterion (the log likelihood or our proposed stopping criterion); the optimal model parameters are then stored, and the reconstructions of the test set are fed into a softmax classifier. It should be noted that the log likelihood function was estimated with the Annealed Importance Sampling technique, since in realistically sized RBMs the direct computation of the log likelihood is intractable; even so, this estimation introduces a heavy computational burden compared to our proposed criterion. Comparing these two termination methods in terms of stopping epoch, total computational time, and classification accuracy highlighted the advantages and drawbacks of each, leading to the conclusion that our method saves time and has smaller variance.
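The final classification stage of this pipeline can be sketched as a plain softmax (multinomial logistic) classifier trained by gradient descent on the reconstructions. This is a generic NumPy sketch under assumed hyperparameters, not the exact classifier of the study:

```python
import numpy as np

def train_softmax(X, y, n_classes, lr=0.5, epochs=200):
    """Fit a softmax classifier by full-batch gradient descent (illustrative sketch)."""
    W = np.zeros((X.shape[1], n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot targets
    for _ in range(epochs):
        logits = X @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)
        # Cross-entropy gradient, averaged over the batch.
        grad = (P - Y) / X.shape[0]
        W -= lr * (X.T @ grad)
        b -= lr * grad.sum(axis=0)
    return W, b

def accuracy(X, y, W, b):
    """Fraction of samples whose highest-scoring class matches the label."""
    return float(np.mean(np.argmax(X @ W + b, axis=1) == y))
```

Here `X` would hold the RBM reconstructions of the test set (one row per sample) and `y` the corresponding class labels.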
In addition, simulations were run in which a maximum iteration limit terminates the training process. In this case, the classification accuracy is calculated at every training step to evaluate the training progress. Surprisingly, the classification accuracy does not improve when a large number of training epochs is used: after an initial increase in the early epochs, the curve starts to flatten at the point indicated by our termination criterion. Finally, we showed by simulations the robustness of our criterion, as compared to the log likelihood stopping criterion, when the model hyperparameters are modified. The log likelihood criterion, when a small learning rate is used, has difficulty finding the maximum value that terminates the training process.
The evaluation of the stopping criteria was performed on several benchmark datasets, such as MNIST and OCR, to ensure consistent results. Based on our study, we believe that the proposed stopping criterion for RBM training is useful and has potential for further improvement in the future.