Authors: Miroslav Valan, Johan Nylander, Fredrik Ronquist
DOI:
Keywords:
Abstract: Computer vision has made dramatic progress in recent years on image classification tasks. There are now numerous success stories in which supervised learning of convolutional neural networks (CNNs) using large training sets has resulted in impressive classification performance. Nevertheless, these systems occasionally make grave errors that humans would never make. One reason for this may be that the training algorithms are over-simplified. They typically use binary scores: 1 for the correct target and 0 otherwise. Recently it has been shown that label smoothing, that is, distributing a small portion of the score equally over all “wrong” categories, can improve performance in some cases. Both of these methods assume that image categories are equally distant from one another, but we often have background knowledge about the similarity relations between them, which can be useful in developing more sophisticated methods. Here, we explore the utility of phylogenetic information in training and evaluating CNNs for identification of biological species. Specifically, we propose label smoothing based on taxonomic information (taxonomic label smoothing) or on distances between species in a reference phylogeny (phylogenetic label smoothing). Using two empirical examples (38,000 images of 83 species of snakes, and 2,600 images of 153 species of butterflies and moths), we show that networks trained with phylogenetic information perform at least as well on common performance metrics as standard systems, while making errors that are more acceptable to humans and less wrong in an objective biological sense. We argue that this is likely to make the systems more robust …
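
To make the contrast between the three kinds of training targets concrete, the sketch below builds a binary (one-hot) target, a uniformly smoothed target, and a distance-weighted target for a toy three-species problem. The abstract does not say how phylogenetic distances are converted into scores, so the inverse-distance weighting used here is an assumption chosen for illustration, not the authors' method. The function names, the smoothing amount `eps = 0.1`, and the toy distance matrix are likewise hypothetical.

```python
def one_hot(n_classes, target):
    """Binary target: 1 for the correct class, 0 for every other class."""
    return [1.0 if i == target else 0.0 for i in range(n_classes)]


def uniform_label_smoothing(n_classes, target, eps=0.1):
    """Standard label smoothing: spread eps equally over the wrong classes."""
    off_target = eps / (n_classes - 1)
    return [1.0 - eps if i == target else off_target for i in range(n_classes)]


def distance_label_smoothing(distances, target, eps=0.1):
    """Spread eps over the wrong classes in proportion to inverse distance.

    ASSUMPTION: inverse-distance weighting is an illustrative choice only.
    `distances` is a symmetric matrix with positive off-diagonal entries,
    e.g. path lengths between species in a reference tree.
    """
    row = distances[target]
    weights = [0.0 if i == target else 1.0 / d for i, d in enumerate(row)]
    total = sum(weights)
    return [
        1.0 - eps if i == target else eps * w / total
        for i, w in enumerate(weights)
    ]


# Hypothetical toy data: species 0 and 1 are close relatives, species 2 is distant.
toy_distances = [
    [0.0, 1.0, 4.0],
    [1.0, 0.0, 4.0],
    [4.0, 4.0, 0.0],
]

binary_target = one_hot(3, target=0)
uniform_target = uniform_label_smoothing(3, target=0)
phylo_target = distance_label_smoothing(toy_distances, target=0)
```

With these toy values, the uniform target gives both wrong species the same score, whereas the distance-weighted target gives the close relative (species 1) a larger share of the smoothing mass than the distant species 2. All three vectors sum to 1, so any of them can serve as the target distribution in a cross-entropy loss.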