Authors: Alessandra Tosi, Alfredo Vellido
DOI:
Keywords:
Abstract: Probabilistic dimensionality reduction methods can provide a flexible data representation and a more faithful model of observed multivariate datasets. This goal is too often reached at the expense of model interpretability, which in turn affects the model's visualization results. In many practical applications, optimal performance may be less relevant than achieving interpretability: this is often the case in areas such as medicine, biology, astronomy, finance and engineering (to name just a few). In this context, the task of data visualization is central to data exploration [1]. In manifold learning, when a high-dimensional space is mapped onto a lower-dimensional one, the resulting embedded manifold is subject to local geometrical distortions induced by the non-linear mapping (manifold compression, stretching, gluing and tearing). This kind of distortion can easily lead to misinterpretations of the dataset itself. Given that it is almost impossible to completely avoid geometrical distortion while reducing dimensionality, it is important to address another aspect of the problem: how to interpret the geometry and the local metric of the model in order to explore the data more faithfully. We consider here an explicit way to compute local metrics in generative models that perform probabilistic dimensionality reduction. The resulting metric tensor is used to compute geodesic distances over the latent space via a graph-based discretisation of the latent space itself. In this way, the computed distances better reflect the underlying structure of the dataset.
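The graph-based approach sketched in the abstract can be illustrated with a small example. The code below is a hypothetical sketch, not the authors' implementation: it discretises a 2-D latent space into a grid graph, weights each edge by the local Riemannian length induced by a metric tensor G(z) (here a toy metric; in the paper's setting it would be derived from the generative mapping), and approximates geodesic distances with graph shortest paths.

```python
import numpy as np
from scipy.sparse import lil_matrix
from scipy.sparse.csgraph import dijkstra

def metric_tensor(z):
    # Toy metric tensor: stretches distances away from the origin.
    # In a probabilistic model, G(z) would come from the learned mapping.
    s = 1.0 + np.dot(z, z)
    return np.diag([s, s])

def riemannian_edge_length(z1, z2):
    # Length of the segment z1 -> z2 under the metric at its midpoint.
    dz = z2 - z1
    G = metric_tensor(0.5 * (z1 + z2))
    return np.sqrt(dz @ G @ dz)

def geodesic_distances(grid_pts, n):
    # grid_pts: (n*n, 2) array of latent grid nodes, row-major order.
    N = n * n
    W = lil_matrix((N, N))
    for i in range(n):
        for j in range(n):
            a = i * n + j
            # Connect 4-neighbours; shortest paths on this weighted
            # graph approximate geodesics on the latent manifold.
            for di, dj in [(0, 1), (1, 0)]:
                ii, jj = i + di, j + dj
                if ii < n and jj < n:
                    b = ii * n + jj
                    w = riemannian_edge_length(grid_pts[a], grid_pts[b])
                    W[a, b] = w
                    W[b, a] = w
    return dijkstra(W.tocsr(), directed=False)

# Regular grid over the latent square [-1, 1]^2.
n = 10
xs = np.linspace(-1.0, 1.0, n)
grid = np.array([[x, y] for x in xs for y in xs])
D = geodesic_distances(grid, n)
```

Because the metric here is everywhere at least the Euclidean one, the resulting distances dominate plain Euclidean grid distances, which is exactly the kind of distortion-aware distance the abstract refers to.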