Combining domain-specific heuristics for author name disambiguation

Authors: Alan Filipe Santana, Marcos André Gonçalves, Alberto H. F. Laender, Anderson Ferreira

DOI: 10.5555/2740769.2740799

Abstract: Author name disambiguation has been one of the hardest problems faced by digital libraries since their early days. Historically, supervised solutions have empirically outperformed those based on heuristics, but with the burden of having to rely on manually labelled training sets for the learning process. Moreover, most supervised solutions just apply some type of generic machine learning solution and do not exploit specific knowledge about the problem. In this paper, we follow a similar reasoning, but in the opposite direction. Instead of extending an existing supervised solution, we propose a set of carefully designed heuristics and similarity functions, and apply supervision only to optimize such parameters for each particular dataset. As our experiments show, the result is a very effective, efficient and practical author name disambiguation method that can be used in many different scenarios.
