Authors: Vladimir Sukhoy, Alexander Stoytchev
DOI:
Keywords:
Abstract: Cross-validation is the gold standard for evaluating machine learning algorithms or fine-tuning their parameters. The results of this technique, however, are not always reproducible and may depend on the computing platform and the number of parallel threads, especially if the underlying learning algorithm uses a pseudo-random number generator (PRNG). This paper gives a recipe for solving these reproducibility problems and applies it to LIBLINEAR [1], a popular software library that implements randomized learning algorithms based on support vector machines [2]. The proposed approach solves these problems by using a cross-platform PRNG and by making the PRNG state private in each thread. The cross-validation results obtained with the modified version of LIBLINEAR are the same across platforms. Furthermore, the parallelized cross-validation results are no longer affected by random fluctuations arising from the sharing of the PRNG state by parallel threads.
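
The idea of keeping the PRNG state private can be illustrated with a minimal Python sketch. This is not the authors' modification of LIBLINEAR: the names `run_fold`, `cross_validate`, and `base_seed` are hypothetical, the "learner" is only a stand-in that shuffles a coordinate order, and Python's `random.Random` is assumed here as the generator. Each fold builds its own generator from a seed derived from the fold index, so no generator state is shared between threads and the result should not depend on how many threads run the folds.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def run_fold(fold_index, base_seed=12345):
    """Run one cross-validation fold with its own private PRNG (hypothetical helper)."""
    # Private generator: seeded from the fold index, never shared across threads.
    rng = random.Random(base_seed + fold_index)
    # Stand-in for a randomized learner: draw a random coordinate order.
    order = list(range(10))
    rng.shuffle(order)
    return order


def cross_validate(n_folds, n_threads):
    """Run all folds on a thread pool and return per-fold results in fold order."""
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return list(pool.map(run_fold, range(n_folds)))


if __name__ == "__main__":
    serial = cross_validate(n_folds=5, n_threads=1)
    parallel = cross_validate(n_folds=5, n_threads=4)
    # With private per-fold generators, the thread count does not change the results.
    print(serial == parallel)
```

By contrast, if all folds drew from a single shared generator, the sequence each fold sees would depend on thread scheduling. The sketch covers only the private-state half of the recipe; identical results across platforms additionally require a generator whose output is specified independently of the platform, as the abstract states.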