Using an expert deviation carrying the knowledge of climate data in usual clustering algorithms.

Authors: Hélène Paugam-Moisy, Didier Bernard, Vincent Pagé, Emmanuel Biabiany

DOI:

Keywords: Computer science, Divergence (statistics), Artificial intelligence, Measure (mathematics), Homogeneity (statistics), Pattern recognition, Hierarchical agglomerative clustering, Euclidean distance, Histogram, Cluster analysis

Abstract: In order to help physicists expand their knowledge of the climate in the Lesser Antilles, we aim to identify spatio-temporal configurations using clustering analysis on wind speed and cumulative rainfall datasets. But we show that using the L2 norm in conventional methods such as K-Means (KMS) and Hierarchical Agglomerative Clustering (HAC) can induce undesirable effects. So, we propose to replace the Euclidean distance (L2) by a dissimilarity measure named Expert Deviation (ED). Based on the symmetrized Kullback-Leibler divergence, the ED integrates the properties of the observed physical parameters and climate knowledge. This helps in comparing histograms of four patches, corresponding to geographical zones, that are influenced by atmospheric structures. A combined evaluation of the internal homogeneity and the separation of the clusters obtained was performed. The results, which were compared using the silhouette index, show five clusters with high indexes. For the two available datasets one can see that, unlike KMS-L2, KMS-ED discriminates the daily situations favorably, giving more meaning to the clusters discovered by the algorithm. The effect of the patches is observed in the spatial analysis of the representative elements for KMS-ED. The ED is able to produce different configurations, which makes the usual atmospheric structures clearly identifiable. Atmospheric physicists can interpret the locations of the impact of each cluster on a specific zone according to atmospheric structures. KMS-L2 does not lead to such an interpretability, because the situations represented are spatially quite smooth. This climatological study illustrates the advantage of using the ED as a new approach.
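
The abstract does not give the exact definition of the ED, so the following is only a minimal sketch of the general idea it describes: comparing two gridded daily fields through per-patch histograms and a symmetrized Kullback-Leibler divergence instead of a pixel-wise L2 distance. Everything specific here is an assumption made for illustration: the four patches are taken as the quadrants of the grid, the bin count and value range are arbitrary, a small `eps` smooths empty bins, the per-patch divergences are simply summed, and the function names (`patch_histograms`, `symmetrized_kl`, `patch_dissimilarity`) are hypothetical.

```python
import numpy as np


def patch_histograms(field, bins, value_range):
    """Histogram counts for four quadrant patches of a 2-D field.

    Using quadrants as patches is an assumption of this sketch; the paper
    ties its patches to geographical zones.
    """
    rows, cols = field.shape
    r, c = rows // 2, cols // 2
    patches = [field[:r, :c], field[:r, c:], field[r:, :c], field[r:, c:]]
    counts = [
        np.histogram(patch, bins=bins, range=value_range)[0] for patch in patches
    ]
    return np.asarray(counts, dtype=float)


def symmetrized_kl(p, q, eps=1e-10):
    """Symmetrized KL divergence KL(p||q) + KL(q||p) between two histograms.

    The histograms are smoothed by eps (an assumption, to avoid log(0)) and
    normalized to sum to 1 before the divergence is computed.
    """
    p = (p + eps) / np.sum(p + eps)
    q = (q + eps) / np.sum(q + eps)
    # KL(p||q) + KL(q||p) simplifies to sum((p - q) * log(p / q)).
    return float(np.sum((p - q) * np.log(p / q)))


def patch_dissimilarity(field_a, field_b, bins=8, value_range=(0.0, 1.0)):
    """Sum of per-patch symmetrized KL divergences (illustrative only)."""
    hists_a = patch_histograms(field_a, bins, value_range)
    hists_b = patch_histograms(field_b, bins, value_range)
    return sum(symmetrized_kl(p, q) for p, q in zip(hists_a, hists_b))


# Two synthetic fields holding the same values in different spatial layouts.
field_a = np.linspace(0.0, 0.99, 64).reshape(8, 8)
field_b = field_a[::-1, ::-1].copy()

print(patch_dissimilarity(field_a, field_a))  # identical fields
print(patch_dissimilarity(field_a, field_b))  # same values, moved between patches
```

A pairwise matrix built from such a function could be passed to a clustering or validation routine that accepts precomputed dissimilarities. Note that some definitions of the symmetrized divergence include a factor of 1/2, which this sketch omits.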
