Authors: Guymon R. Hall, Kazem Taghva
Keywords: Formal concept analysis, Feature selection, Computer science, Closure (mathematics), Context (language use), Object (computer science), Database, Binary relation, Word (computer architecture), Cluster analysis, Information retrieval
Abstract: As part of information retrieval processes, words are often stemmed to a common root. The Porter Stemming Algorithm operates as a rule-based suffix-removal process. Stemming can be viewed as a way to cluster related words together according to one stem. Sometimes the algorithm includes words in a cluster that are unrelated. This experiment attempts to correct this using Formal Concept Analysis (FCA). FCA is the process of formulating formal concepts from a given formal context. A formal context consists of objects and attributes, and a binary relation that indicates the attributes possessed by each object. A formal concept is formed by computing the closure of subsets of attributes. Using the Cranfield document collection, the experiment crafted a comparison measure between words using the Google Web 1T 5-gram data set. For the resulting clusters, the results showed a varying level of success dependent upon the error threshold allowed.
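The closure step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy context, the attribute names, and the helper functions `objects_with`, `common_attributes`, `closure`, and `formal_concepts` are all hypothetical, chosen only to show how a formal concept (a set of objects paired with the attributes they all share) arises from a binary relation. The words mimic a stem cluster that mixes two unrelated senses.

```python
from itertools import combinations

# Hypothetical toy context (invented for illustration, not from the paper):
# each object (a word) maps to the set of attributes it possesses.
context = {
    "general": {"army", "rank"},
    "generals": {"army", "rank"},
    "generate": {"power"},
    "generation": {"power", "population"},
}


def objects_with(attributes, context):
    """Return every object that possesses all of the given attributes."""
    return {obj for obj, attrs in context.items() if attributes <= attrs}


def common_attributes(objects, context):
    """Return the attributes shared by all given objects.

    For an empty set of objects, every attribute is (vacuously) shared.
    """
    shared = set().union(*context.values())
    for obj in objects:
        shared &= context[obj]
    return shared


def closure(attributes, context):
    """Close an attribute set into a formal concept (extent, intent)."""
    extent = objects_with(attributes, context)
    intent = common_attributes(extent, context)
    return frozenset(extent), frozenset(intent)


def formal_concepts(context):
    """Enumerate all formal concepts by closing every attribute subset.

    This brute-force enumeration is exponential in the number of
    attributes, so it is only suitable for tiny contexts like this one.
    """
    all_attrs = sorted(set().union(*context.values()))
    concepts = set()
    for size in range(len(all_attrs) + 1):
        for subset in combinations(all_attrs, size):
            concepts.add(closure(set(subset), context))
    return concepts


if __name__ == "__main__":
    for extent, intent in sorted(formal_concepts(context), key=lambda c: -len(c[0])):
        print(sorted(extent), sorted(intent))
```

In this toy context, closing `{"army"}` yields the concept whose objects are `general` and `generals`, while closing `{"power"}` yields `generate` and `generation`, so words that share a stem fall into separate concepts. How attributes are actually selected from the Web 1T 5-gram data, and how the error threshold is applied, is not specified in the abstract and is not modeled here.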