Authors: Sophia Bano, Tamas Suveges, Jianguo Zhang, Stephen J. Mckenna
DOI: 10.1109/ACCESS.2018.2850284
Keywords:
Abstract: Continuous detection of social interactions from wearable sensor data streams has a range of potential applications in domains including health and care, security, and assistive technology. We contribute an annotated, multimodal data set capturing such interactions using video, audio, GPS, and inertial sensing. We present methods for automatic temporal segmentation of focused interactions using support vector machines and recurrent neural networks, with features extracted from both audio and video streams. A focused interaction occurs when co-present individuals, having mutual focus of attention, interact by first establishing face-to-face engagement and direct conversation. We describe an evaluation protocol, including framewise and extended event-based measures, and provide empirical evidence that fusion of visual face track scores with voice activity scores provides an effective combination. The methods, contributed data set, and protocol together provide a benchmark for future research on this problem. The data set is available at https://doi.org/10.15132/10000134 .
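The abstract reports that fusing per-frame visual face track scores with voice activity scores is effective. As a minimal illustrative sketch (not the authors' actual fusion method), one common way to combine two such per-frame score streams is a weighted late fusion followed by thresholding; the weight and threshold values below are assumptions for illustration only.

```python
import numpy as np

def fuse_scores(face_scores, vad_scores, weight=0.5, threshold=0.5):
    """Late-fuse per-frame face-track and voice-activity scores.

    `weight` and `threshold` are illustrative placeholders; the paper's
    actual fusion scheme and parameter values may differ.
    """
    face = np.asarray(face_scores, dtype=float)
    vad = np.asarray(vad_scores, dtype=float)
    fused = weight * face + (1.0 - weight) * vad
    # Frames whose fused score clears the threshold are labelled as
    # belonging to a focused interaction.
    return fused >= threshold

# Frames where both modalities score highly are labelled positive.
face = [0.9, 0.2, 0.8, 0.1]
vad = [0.8, 0.3, 0.9, 0.0]
print(fuse_scores(face, vad).tolist())  # → [True, False, True, False]
```

Framewise evaluation measures, as mentioned in the abstract, would then compare these per-frame boolean labels against ground-truth annotations, while event-based measures score entire detected interaction segments.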