Corpus management by automatic categorization into functional domains to support faceted querying

Authors: Charles E Beller, William G Dubyak, Palani Sakthi, Kristen M Summers

Abstract: Embodiments can provide a computer-implemented method in a data processing system comprising a processor and a memory, the memory comprising instructions that are executed by the processor to cause the processor to implement an enhanced corpus management system. The method comprises: identifying one or more functional domain categories; ingesting one or more incoming documents to form an open-domain corpus; for each functional domain category, identifying one or more representative documents to establish a seed sub-corpus; calculating a degree of fit score between each of the incoming documents and each of the established functional domain category seed sub-corpora; and assigning one or more of the incoming documents to one or more of the functional domain categories based upon the degree of fit score to create an enhanced corpus.
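
The steps in the abstract can be illustrated with a minimal sketch. The abstract does not say how the degree of fit score is computed, so this example assumes a cosine similarity between bag-of-words term counts of an incoming document and the pooled term counts of each seed sub-corpus. The function names, the tokenizer, the toy documents, and the 0.3 threshold are all hypothetical choices made for illustration, not details taken from the source.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep alphabetic tokens (a simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def term_vector(texts):
    """Pool raw term counts over one or more texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def assign_categories(incoming, seeds, threshold=0.3):
    """Score each incoming document against each seed sub-corpus and
    attach every category whose score meets the (assumed) threshold."""
    seed_vectors = {cat: term_vector(docs) for cat, docs in seeds.items()}
    enhanced = {}
    for doc_id, text in incoming.items():
        doc_vec = term_vector([text])
        scores = {cat: cosine(doc_vec, vec) for cat, vec in seed_vectors.items()}
        enhanced[doc_id] = {
            "text": text,
            "scores": scores,
            "categories": sorted(c for c, s in scores.items() if s >= threshold),
        }
    return enhanced


# Hypothetical functional domain categories with their seed sub-corpora.
seeds = {
    "finance": [
        "The bank raised interest rates on loans",
        "Stock markets and bond yields fell",
    ],
    "medicine": [
        "The patient received a vaccine dose",
        "Doctors treat the disease with new drugs",
    ],
}

# Hypothetical incoming documents forming the open-domain corpus.
incoming = {
    "d1": "Interest rates on bank loans fell",
    "d2": "New vaccine protects the patient from disease",
    "d3": "Sunny weather is expected tomorrow",
}

enhanced = assign_categories(incoming, seeds)
for doc_id, record in enhanced.items():
    print(doc_id, record["categories"])
```

Storing the assigned categories alongside each document is what would let a later query filter by category as a facet. A document may receive several categories or none, which matches the "one or more" wording of the abstract only loosely; how unassigned documents are handled is not stated in the source.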
