Clowns, Crowds, and Clouds: A Cross-Enterprise Approach to Detecting Information Leakage Without Leaking Information

Authors: Neil Cooke, Lee Gillam

DOI: 10.1007/978-1-4471-2236-4_16

Abstract: In this paper we elaborate a near-duplicate and plagiarism detection service that combines both Crowd and Cloud computing in searching for and evaluating matching documents. We believe our approach could be used across collaborating or competing Enterprises, and against the web, without any Enterprise needing to reveal the contents of its corporate (confidential) information. The service involves a novel document fingerprinting technique which derives grammatical patterns but does not require knowledge of the grammar or rely on hash-based approaches. Our fingerprinting generates a lossy, highly compressed signature from which it is possible to generate fixed-length patterns as fingerprints or shingles. Fingerprint sizes are established by estimating the likely random hit rates resulting from the pattern size and the target search size. The service is geared towards enabling Clowns, those who may attempt to leak, or have leaked, confidential or sensitive information, or who have otherwise plagiarized, to provide a copy of the original information. Crowds validate the results emerging from systematic evaluation of the service, ensuring that modifications continue to act effectively under continuous scaling-up. We discuss the formulation and assess its efficacy with reference to an international benchmarking competition, where the system achieves top-5 performance (Precision = 0.96, Recall = 0.39).
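The abstract says fingerprint sizes are chosen by estimating random hit rates from the pattern size and the target search size. A minimal sketch of that kind of estimate follows, under assumptions that are ours and not the paper's formulation: the lossy signature is treated as a string over a small alphabet of k symbols, and patterns are assumed to be uniformly and independently distributed, so one pattern of length n is expected to match N target patterns by chance N / k^n times. All function names, the 4-symbol alphabet, the target size and the example signatures are hypothetical.

```python
import math


def expected_random_hits(alphabet_size: int, pattern_length: int, target_patterns: int) -> float:
    """Expected chance matches for one query pattern against `target_patterns`
    patterns, assuming (hypothetically) that each pattern is drawn uniformly and
    independently from the alphabet_size ** pattern_length possible values."""
    return target_patterns / alphabet_size ** pattern_length


def min_pattern_length(alphabet_size: int, target_patterns: int, max_hits: float) -> int:
    """Smallest fixed pattern length whose expected random hit count is at most max_hits."""
    if alphabet_size < 2:
        raise ValueError("alphabet_size must be at least 2")
    if max_hits <= 0:
        raise ValueError("max_hits must be positive")
    n = 1
    # Each extra symbol divides the expected random hits by alphabet_size.
    while expected_random_hits(alphabet_size, n, target_patterns) > max_hits:
        n += 1
    return n


def shingles(signature: str, n: int) -> set[str]:
    """All overlapping fixed-length patterns (shingles) of a signature string."""
    return {signature[i:i + n] for i in range(len(signature) - n + 1)}


def shared_shingles(sig_a: str, sig_b: str, n: int) -> set[str]:
    """Shingles two signatures have in common; only signatures, not source text, are compared."""
    return shingles(sig_a, n) & shingles(sig_b, n)


if __name__ == "__main__":
    # Hypothetical figures: a 4-symbol signature alphabet and a target of 10**9 patterns.
    n = min_pattern_length(alphabet_size=4, target_patterns=10**9, max_hits=0.01)
    print(n)  # 19
    print(expected_random_hits(4, n, 10**9))
    print(sorted(shared_shingles("ABACDBA", "CABACD", 3)))  # ['ABA', 'ACD', 'BAC']
```

Under this uniform model each additional symbol divides the expected chance matches by k, so the required pattern length grows only logarithmically with the size of the target search. Real text is far from uniformly distributed, so any practical system would need to calibrate such estimates against observed hit rates.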
