Author: Slava M. Katz
DOI:
Keywords:
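Abstract: Apparatus and method for evaluating the likelihood of an event (such as a word) following a string of known events, based on event sequence counts derived from sparse sample data. Event sequences, or m-grams, include a key and a subsequent event. For each m-gram, a discounted probability is stored, generated by applying, for example, a modified Turing's estimate to a count-based probability. For each key occurring in the sample data, a normalization constant is stored which preferably (a) adjusts the discounted probabilities for multiple counting, if any, and (b) includes a freed probability mass allocated to m-grams which do not occur in the sample data. To determine the likelihood of a selected event following a string of known events, a "backing off" scheme is employed in which successively shorter keys (of known events) followed by the selected event (representing m-grams) are searched until an m-gram is found having a discounted probability stored therefor. The normalization constants of the longer searched keys, for which the corresponding m-grams have no stored discounted probability, are combined together with the found discounted probability to produce the likelihood of the selected event being next.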
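The back-off lookup described in the abstract can be illustrated with a minimal Python sketch. It is not the patented apparatus: the function names `train` and `likelihood`, the `max_order` and `discount` parameters, and the toy corpus are all hypothetical, and a fixed absolute discount stands in for the modified Turing's estimate. Unigram probabilities are left undiscounted, so events outside the training vocabulary get likelihood 0.

```python
from collections import Counter, defaultdict


def likelihood(model, history, event):
    """Back off through successively shorter keys until a stored m-gram is found."""
    prob, alpha, max_order = model
    key = tuple(history)[-(max_order - 1):] if max_order > 1 else ()
    weight = 1.0  # product of normalization constants of the longer keys
    while True:
        gram = key + (event,)
        if gram in prob:
            return weight * prob[gram]
        if not key:
            return 0.0  # event never seen, even as a unigram
        # A key never seen in the data passes all of its mass down (constant 1).
        weight *= alpha.get(key, 1.0)
        key = key[1:]


def train(tokens, max_order=3, discount=0.5):
    """Store discounted m-gram probabilities and one constant per observed key."""
    counts = Counter(
        tuple(tokens[i:i + m])
        for m in range(1, max_order + 1)
        for i in range(len(tokens) - m + 1)
    )
    key_total = Counter()          # total count of m-grams sharing a key
    followers = defaultdict(set)   # events observed after each key
    for gram, c in counts.items():
        if len(gram) > 1:
            key_total[gram[:-1]] += c
            followers[gram[:-1]].add(gram[-1])

    prob = {}
    for gram, c in counts.items():
        if len(gram) == 1:
            prob[gram] = c / len(tokens)  # assumption: unigrams not discounted
        else:
            # Assumption: absolute discount instead of a Turing-style estimate.
            prob[gram] = (c - discount) / key_total[gram[:-1]]

    alpha = {}
    model = (prob, alpha, max_order)
    # Shorter keys first, so lower-order likelihoods are ready when needed.
    for key in sorted(followers, key=len):
        seen = followers[key]
        freed = 1.0 - sum(prob[key + (w,)] for w in seen)
        lower_unseen = 1.0 - sum(likelihood(model, key[1:], w) for w in seen)
        alpha[key] = freed / lower_unseen if lower_unseen > 1e-12 else 0.0
    return model
```

For example, after `model = train("a b a c a b".split())`, calling `likelihood(model, ["a"], "a")` finds no stored bigram for "a a", so it multiplies the constant stored for the key "a" by the unigram probability of "a". Dividing the freed mass by the lower-order mass of the unseen events is what keeps the probabilities for each observed key summing to 1 over the vocabulary.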