Authors: Manolis Stamatogiannakis, Elias Athanasopoulos, Herbert Bos, Paul Groth
DOI: 10.1145/3062176
Keywords:
Abstract: Information produced by Internet applications is inherently the result of processes that are executed locally. Think of a web server that makes use of a CGI script, or a content management system where a post was first edited using a word processor. Given the impact of these processes on the content published online, the consumer of the information may want to understand what those impacts were. For example, understanding where the text copied and pasted to make a post came from, or whether a script was updated with the latest security patches, can all influence confidence in the content. Capturing and exposing this provenance is thus important for ascertaining trust in online content. Furthermore, providers of Internet applications may wish to have access to the same information for debugging and audit purposes. For processes following a rigid structure (such as databases and workflows), systems have been developed to efficiently and accurately capture provenance data. However, capturing the provenance of unstructured processes, such as the user-interactive computing used to produce content, remains a problem to be tackled. In this article, we address such processes. Our approach, called PROV2R (PROVenance Record and Replay), is composed of two parts: (a) decoupling provenance analysis from its capture, and (b) high-fidelity provenance capture of unmodified programs. We make use of techniques originating in the reverse engineering community, namely, record and replay and taint tracking. Taint tracking fundamentally addresses the data provenance problem but is impractical to apply at runtime due to its extremely high overhead. With a number of case studies, we demonstrate how PROV2R enables high-fidelity provenance capture while keeping runtime overhead at manageable levels. In addition, we show how the captured provenance can be represented with the W3C PROV model for exposure on the Web.
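As a rough illustration of the last point only: a provenance relationship of the kind PROV2R recovers (say, a blog post derived from a copied source document through an editing session) could be expressed with the W3C PROV data model. The minimal sketch below uses the Python `prov` package; the entity and activity identifiers are hypothetical and are not taken from the article or its implementation.

```python
from prov.model import ProvDocument

# Minimal PROV document: a post generated by an editing activity
# that used (and derived content from) a source document.
# All identifiers below are illustrative placeholders.
doc = ProvDocument()
doc.add_namespace('ex', 'http://example.org/')

post = doc.entity('ex:blog-post')
source = doc.entity('ex:source-document')
edit = doc.activity('ex:editing-session')

doc.used(edit, source)            # the activity read the source document
doc.wasGeneratedBy(post, edit)    # ... and produced the post
doc.wasDerivedFrom(post, source)  # derivation that a taint analysis could establish

print(doc.serialize())            # PROV-JSON serialization for publication on the Web
```

The resulting PROV-JSON (or an equivalent PROV-O/RDF serialization) is the kind of artifact that can be exposed alongside the published content, as the abstract describes.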