Abstract: XML documents are becoming more and more common in various environments. In particular, enterprise-scale document management is commonly centred around XML, and desktop applications as well as online document collections are soon to follow. The growing number of XML documents increases the importance of appropriate indexing methods and search tools in keeping the information accessible. Therefore, we focus on content that is stored in XML format as we develop such indexing methods. Because XML is used for different kinds of content, ranging all the way from records of data fields to narrative full-texts, the methods for Information Retrieval are facing a new challenge in identifying which content is subject to data queries and which should be indexed for full-text search. In response to this challenge, we analyse the relation of character content and XML tags in XML documents in order to separate the full-text from data. As a result, we are able to both reduce the size of the index by 5-6% and improve the retrieval precision as we select the XML fragments to be indexed. Besides being challenging, XML comes with many unexplored opportunities which are not paid much attention in the literature. For example, authors often tag the content they want to emphasise by using a typeface that stands out. The tagged content constitutes phrases that are descriptive of the content and useful for full-text search. They are simple to detect in XML documents, but also possible to confuse with other inline-level text. Nonetheless, the search results seem to improve when the detected phrases are given additional weight in the index. Similar improvements are reported when related content is associated with the indexed full-text, including titles, captions, and references.
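
As an illustration only, the relation between character content and XML tags could be quantified as the number of text characters per element in a fragment, and emphasised phrases could be collected from inline-level elements. The following is a minimal sketch under stated assumptions, not the method evaluated in this work: the function names, the threshold of 20 characters per element, and the inline tag names `em`, `b`, and `i` are all hypothetical.

```python
import xml.etree.ElementTree as ET


def text_and_elements(fragment):
    """Return (non-whitespace text characters, element count) for a fragment."""
    chars = sum(len("".join(t.split())) for t in fragment.itertext())
    elements = sum(1 for _ in fragment.iter())  # includes the fragment root
    return chars, elements


def looks_like_full_text(fragment, min_chars_per_element=20.0):
    """Hypothetical heuristic: many characters per tag suggests narrative text.

    The threshold is an assumed value chosen for this example only.
    """
    chars, elements = text_and_elements(fragment)
    return chars / elements >= min_chars_per_element


def emphasised_phrases(fragment, inline_tags=("em", "b", "i")):
    """Collect the text of inline elements assumed to mark emphasis."""
    return [
        e.text.strip()
        for e in fragment.iter()
        if e.tag in inline_tags and e.text
    ]


# A toy document mixing a data record with a narrative section.
doc = ET.fromstring(
    "<doc>"
    "<record><id>42</id><year>2006</year><lang>en</lang></record>"
    "<section><p>XML is used for many kinds of content, and authors often "
    "<em>emphasise phrases</em> inside long narrative paragraphs.</p></section>"
    "</doc>"
)

record = doc.find("record")
section = doc.find("section")

print(looks_like_full_text(record))   # data-like fragment
print(looks_like_full_text(section))  # narrative fragment
print(emphasised_phrases(section))    # candidate phrases for extra weight
```

In this toy input the record fragment has few characters spread over several elements, whereas the section has long runs of text under few tags. A real collection would need a threshold tuned on its own data, and inline elements used for purposes other than emphasis would have to be told apart from emphasised phrases.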