Abstract: Recently, a number of TREC tracks have adopted a retrieval effectiveness metric called bpref, which was designed for evaluation environments with incomplete relevance data. A graded-relevance version of bpref, called rpref, has also been proposed. However, we show that applying Q-measure, normalised Discounted Cumulative Gain (nDCG) or Average Precision (AveP) to condensed lists, obtained by filtering out all unjudged documents from the original ranked lists, is actually a better solution to the incompleteness problem than using bpref. Furthermore, the use of graded relevance boosts the robustness of IR evaluation to incompleteness, and therefore Q-measure and nDCG based on condensed lists are the best choices. To this end, we use four graded-relevance test collections from NTCIR to compare ten different metrics in terms of system ranking stability and pairwise discriminative power.
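To make the condensed-list idea concrete, the following is a minimal sketch, not taken from the paper or any evaluation toolkit: unjudged documents are removed from a ranked run, and a standard metric (binary-relevance AveP here, for brevity) is then computed over what remains. All function names and data are illustrative assumptions.

```python
def condense(ranked_list, judged):
    """Keep only documents that have relevance judgements (the 'condensed list')."""
    return [doc for doc in ranked_list if doc in judged]

def average_precision(ranked_list, relevant):
    """Binary-relevance AveP over a (possibly condensed) ranked list."""
    if not relevant:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, doc in enumerate(ranked_list, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)

# Hypothetical example: d3 is unjudged, so it is filtered out before scoring.
judged = {"d1": 1, "d2": 0, "d4": 1}                 # doc -> relevance label
relevant = {d for d, r in judged.items() if r > 0}
run = ["d1", "d3", "d2", "d4"]
print(average_precision(condense(run, judged), relevant))  # 0.8333...
```

The same filtering step would precede Q-measure or nDCG; those metrics additionally exploit graded relevance labels, which, per the abstract, is what makes the condensed-list versions of Q-measure and nDCG the most robust choices.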