Authors: Michael Stockerl, Christoph Ringlstetter, Matthias Schubert, Eirini Ntoutsi, Hans-Peter Kriegel
Keywords:
Abstract: Although we have been living in the information age for decades, paperwork is still a tedious part of everybody's life. Assistance systems that implement techniques of digitization and document understanding may offer considerable reductions of time and effort for their users. A large portion of paper documents, such as invoices, delivery receipts, or admonitions, are based on a fixed, company-specific template and therefore exhibit a high degree of similarity. In this work, we propose a template extraction method over a stream of incoming documents and a template allocation method for assigning new instances from the stream to the most suitable templates. Our method employs text augmented by layout information to represent the digital image of a document. Document similarity is assessed with respect to both the textual and the layout parts of the document; matching terms contribute according to their distance to the query terms. To be more robust against distortions due to the digitization process, the templates are not static; rather, they are maintained in an online fashion based on their assigned documents. Experiments on real data show that combining text and layout information with continuous adaptation of the templates through updates improves template identification quality over earlier proposed methods.
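
The allocation and online update described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the document representation (a mapping from each term to a normalized page position), the Gaussian distance weighting, the parameters `sigma`, `threshold`, and `rate`, and the names `Template`, `similarity`, and `assign` are all assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Template:
    # Hypothetical representation: term -> (x, y) position on a page
    # normalized to [0, 1] x [0, 1].
    positions: dict = field(default_factory=dict)
    n_docs: int = 0


def similarity(doc, template, sigma=0.1):
    """Score a document against a template, in [0, 1].

    A term shared by both contributes more the closer its position in
    the document is to its position in the template (assumed Gaussian
    weighting; sigma is an illustrative parameter).
    """
    if not doc or not template.positions:
        return 0.0
    score = 0.0
    for term, (x, y) in doc.items():
        if term in template.positions:
            tx, ty = template.positions[term]
            dist = math.hypot(x - tx, y - ty)
            score += math.exp(-(dist * dist) / (2 * sigma * sigma))
    return score / max(len(doc), len(template.positions))


def assign(doc, templates, threshold=0.5, rate=0.2):
    """Assign doc to the best template, or start a new template from it."""
    best, best_score = None, 0.0
    for template in templates:
        s = similarity(doc, template)
        if s > best_score:
            best, best_score = template, s
    if best is None or best_score < threshold:
        # No suitable template yet: the document seeds a new one.
        best = Template(positions=dict(doc), n_docs=1)
        templates.append(best)
        return best
    # Online update: shift the stored positions of shared terms toward
    # the positions observed in the newly assigned document.
    for term, (x, y) in doc.items():
        if term in best.positions:
            tx, ty = best.positions[term]
            best.positions[term] = (tx + rate * (x - tx), ty + rate * (y - ty))
    best.n_docs += 1
    return best
```

In this sketch, a slightly shifted scan of a known invoice is assigned to the existing template and nudges its term positions, while a document sharing no terms with any template starts a new one.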