Abstract: State-of-the-art patch-based image representations involve a pooling operation that aggregates statistics computed from local descriptors. Standard pooling operations include sum- and max-pooling. Sum-pooling lacks discriminability because the resulting representation is strongly influenced by frequent yet often uninformative descriptors, but only weakly influenced by rare yet potentially highly informative ones. Max-pooling equalizes the influence of frequent and rare descriptors but is applicable only to representations that rely on count statistics, such as the bag-of-visual-words (BOV) and its soft- and sparse-coding extensions. We propose a novel pooling mechanism that achieves the same effect as max-pooling but is applicable beyond the BOV, and especially to the state-of-the-art Fisher Vector, hence the name Generalized Max Pooling (GMP). It involves equalizing the similarity between each patch and the pooled representation, which is shown to be equivalent to re-weighting the per-patch statistics. We show on five public image classification benchmarks that the proposed GMP can lead to significant performance gains with respect to heuristic alternatives.
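
The equalization idea can be illustrated with a small numerical sketch. It assumes, for illustration only, that "equalizing the similarity between each patch and the pooled representation" is posed as a ridge-regularized least-squares problem on dot-product similarities; the abstract does not state the exact formulation. The names `equalized_pool`, `lam`, and the toy descriptors are hypothetical and not taken from the source.

```python
import numpy as np


def sum_pool(phi):
    """Sum-pooling: add up the per-patch embeddings (rows of phi)."""
    return phi.sum(axis=0)


def equalized_pool(phi, lam=1e-6):
    """Pool so that every patch has (approximately) the same dot-product
    similarity with the pooled vector.

    Assumption: the target similarity is 1 and the problem is solved as a
    ridge-regularized linear system on the patch Gram matrix. `lam` is a
    hypothetical regularization parameter.
    """
    n = phi.shape[0]
    gram = phi @ phi.T                        # patch-to-patch similarities
    # Per-patch weights: solve (K + lam * I) w = 1.
    weights = np.linalg.solve(gram + lam * np.eye(n), np.ones(n))
    pooled = phi.T @ weights                  # weighted sum of patch embeddings
    return pooled, weights


# Toy image: nine copies of a frequent descriptor and one rare descriptor.
frequent = np.array([1.0, 0.0])
rare = np.array([0.0, 1.0])
phi = np.vstack([np.tile(frequent, (9, 1)), rare])

summed = sum_pool(phi)                        # dominated by the frequent descriptor
pooled, weights = equalized_pool(phi)
similarities = phi @ pooled                   # one similarity per patch

print("sum-pooled:      ", summed)
print("equalized pooled:", np.round(pooled, 4))
print("similarities:    ", np.round(similarities, 4))
```

In this toy case the sum-pooled vector is nine times larger along the frequent direction than along the rare one, whereas the equalized vector gives both directions roughly equal magnitude. The pooled vector is a weighted sum of the patch embeddings in which the rare patch receives a larger weight than each frequent one, which matches the re-weighting view of per-patch statistics described in the abstract.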