Authors: Felix Naumann, Ralf Krestel
Keywords: Data cleansing, Data profiling, Metadata, World Wide Web, Information system, Computer science, Task (project management), Text mining
Abstract: The Hasso Plattner Institute (HPI) is a private computer science institute funded by the eponymous SAP co-founder. It is affiliated with the University of Potsdam in Germany and is dedicated to research and teaching, awarding B.Sc., M.Sc., and Ph.D. degrees. The Information Systems group was founded in 2006 and currently has around ten Ph.D. students and about 15 master's students actively involved in our activities. Our initial and still ongoing focus has been the area of data cleansing and duplicate detection. More recently we have become active in text mining, to extract structured information from text, and even more recently in data profiling, i.e., the task of discovering various metadata and dependencies from a given data instance.