Authors: Michael C. Mozer, Brett D. Roads, Maria Attarian
DOI:
Keywords:
Abstract: Deep-learning vision models have shown intriguing similarities and differences with respect to human vision. We investigate how to bring machine visual representations into better alignment with human representations. Human representations are often inferred from behavioral evidence, such as the selection of the image most similar to a query image. We find that with appropriate linear transformations of deep embeddings, we can improve prediction of human binary choice on a data set of bird images from 72% at baseline to 89%. We hypothesized that deep embeddings are redundant, high (4096) dimensional representations; however, reducing the rank of these representations results in a loss of explanatory power. The dilation transformation explored in past research is too restrictive, and indeed we found that model explanatory power can be significantly improved with a more expressive linear transform. Most surprising and exciting, we found that, consistent with the classic psychological literature, similarity judgments are asymmetric: the similarity of X to Y is not necessarily equal to the similarity of Y to X, and allowing a model to express this asymmetry improves explanatory power.
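The contrast the abstract draws between a restrictive dilation transform and a more expressive, possibly asymmetric one can be sketched numerically. This is a minimal illustration, not the paper's implementation: the embedding dimension (8 rather than 4096), the random embeddings, and the random transform matrices are all placeholder assumptions for the sake of the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder "deep embeddings" for two images; the paper works with
# high-dimensional (4096-d) network embeddings, we use 8-d toy vectors.
d = 8
x = rng.normal(size=d)  # embedding of image X
y = rng.normal(size=d)  # embedding of image Y

# A dilation transform rescales each embedding dimension independently,
# i.e. W is diagonal. The resulting bilinear similarity x^T W y is
# symmetric: swapping the arguments cannot change its value.
W = np.diag(rng.uniform(0.5, 1.5, size=d))
assert np.isclose(x @ W @ y, y @ W @ x)

# A full (non-diagonal, non-symmetric) linear transform M gives a more
# expressive similarity s(X, Y) = x^T M y, which in general is
# asymmetric: s(X, Y) need not equal s(Y, X).
M = rng.normal(size=(d, d))
s_xy = x @ M @ y
s_yx = y @ M @ x
print(np.isclose(s_xy, s_yx))  # in general False
```

In a fitted model, M would be learned from behavioral choice data rather than sampled at random; the point here is only that a diagonal W forces symmetric judgments, while a full M can express the asymmetry the abstract describes.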