The Information Systems Group at HPI

Authors: Felix Naumann, Ralf Krestel

DOI: 10.1145/3003665.3003678

Keywords: Data cleansing, Data profiling, Metadata, World Wide Web, Information system, Computer science, Task (project management), Text mining

Abstract: The Hasso Plattner Institute (HPI) is a private computer science institute funded by the eponymous SAP co-founder. It is affiliated with the University of Potsdam in Germany and is dedicated to research and teaching, awarding B.Sc., M.Sc., and Ph.D. degrees. The Information Systems group was founded in 2006 and currently has around ten Ph.D. students and about 15 master's students actively involved in our research activities. Our initial and still ongoing research focus has been the area of data cleansing and duplicate detection. More recently we have become active in the area of text mining to extract structured information from text, and even more recently in data profiling, i.e., the task of discovering various metadata and dependencies from a data instance.

References (22)
Ziawasch Abedjan, Lukasz Golab, Felix Naumann. Profiling relational data: a survey. Very Large Data Bases, vol. 24, pp. 557-581 (2015). DOI: 10.1007/S00778-015-0389-Y
Toni Gruetze, Gary Yao, Ralf Krestel. Learning Temporal Tagging Behaviour. The Web Conference, pp. 1333-1338 (2015). DOI: 10.1145/2740908.2741701
Mauricio A. Hernández, Salvatore J. Stolfo. Real-world Data is Dirty: Data Cleansing and The Merge/Purge Problem. Data Mining and Knowledge Discovery, vol. 2, pp. 9-37 (1998). DOI: 10.1023/A:1009761603038
Melanie Herschel, Felix Naumann, Sascha Szott, Maik Taubert. Scalable Iterative Graph Duplicate Detection. IEEE Transactions on Knowledge and Data Engineering, vol. 24, pp. 2094-2108 (2012). DOI: 10.1109/TKDE.2011.99
Ziawasch Abedjan, Toni Gruetze, Anja Jentzsch, Felix Naumann. Profiling and mining RDF data with ProLOD. International Conference on Data Engineering, pp. 1198-1201 (2014). DOI: 10.1109/ICDE.2014.6816740
Ralf Krestel, Alex Wall, Wolfgang Nejdl. Treehugger or petrolhead?: identifying bias by comparing online news articles with political speeches. The Web Conference, pp. 547-548 (2012). DOI: 10.1145/2187980.2188120
Uwe Draisbach, Felix Naumann, Sascha Szott, Oliver Wonneberg. Adaptive Windows for Duplicate Detection. 2012 IEEE 28th International Conference on Data Engineering, pp. 1073-1083 (2012). DOI: 10.1109/ICDE.2012.20
Dustin Lange, Felix Naumann. Efficient similarity search. Proceedings of the 20th ACM International Conference on Information and Knowledge Management (CIKM '11), pp. 1679-1688 (2011). DOI: 10.1145/2063576.2063819
Eugene Agichtein, Luis Gravano. Snowball: extracting relations from large plain-text collections. ACM International Conference on Digital Libraries, pp. 85-94 (2000). DOI: 10.1145/336597.336644
Melanie Weis, Felix Naumann, Ulrich Jehle, Jens Lufter, Holger Schuster. Industry-scale duplicate detection. Very Large Data Bases, vol. 1, pp. 1253-1264 (2008). DOI: 10.14778/1454159.1454165