Online Process Phase Detection Using Multimodal Deep Learning
Abstract
We present a multimodal deep-learning framework that automatically predicts the phases of the trauma resuscitation process in real time. The system first preprocesses the audio and video streams captured by a Kinect’s built-in microphone array and depth sensor. A multimodal deep network then extracts video and audio features, which are combined through a “slow fusion” model. The final decision is made from the combined features by a modified softmax classification layer. The model was trained on 20 trauma resuscitation cases (over 13 hours in total) and tested on 5 additional cases. Our results showed over 80% online detection accuracy with an F-score of 0.7, outperforming previous systems.
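The abstract does not detail the fusion architecture, so what follows is only a minimal sketch of one plausible reading of the pipeline. Each modality first passes through its own layer, the two streams are then merged and processed jointly, and a softmax produces per-phase probabilities for each incoming time window. The layer sizes, the random weights, and the names `SlowFusionSketch`, `dense`, and `init_layer` are hypothetical. The sketch uses a plain softmax because the abstract does not say how the authors' classification layer is modified, and it omits training and the Kinect preprocessing entirely.

```python
import math
import random

# Hypothetical sizes: the abstract does not specify feature or layer dimensions.
AUDIO_DIM, VIDEO_DIM, HIDDEN_DIM, NUM_PHASES = 8, 8, 6, 4


def dense(weights, bias, x):
    """Affine layer y = W x + b, with W stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, xi) for xi in x]


def softmax(logits):
    """Standard (unmodified) softmax, shifted by the max for numerical stability."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(out_dim, in_dim, rng):
    """Random small weights and zero biases (a stand-in for trained parameters)."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    return weights, [0.0] * out_dim


class SlowFusionSketch:
    """Hypothetical model: modalities stay separate for one layer, then share layers."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.audio_branch = init_layer(HIDDEN_DIM, AUDIO_DIM, rng)  # audio-only layer
        self.video_branch = init_layer(HIDDEN_DIM, VIDEO_DIM, rng)  # video-only layer
        self.fusion = init_layer(HIDDEN_DIM, 2 * HIDDEN_DIM, rng)   # joint layer
        self.classifier = init_layer(NUM_PHASES, HIDDEN_DIM, rng)   # phase logits

    def predict(self, audio_feat, video_feat):
        """Return one probability per phase for a single time window."""
        a = relu(dense(*self.audio_branch, audio_feat))
        v = relu(dense(*self.video_branch, video_feat))
        fused = relu(dense(*self.fusion, a + v))  # list concatenation joins the streams
        return softmax(dense(*self.classifier, fused))


if __name__ == "__main__":
    model = SlowFusionSketch()
    rng = random.Random(1)
    for t in range(3):  # each iteration stands in for one incoming time window
        audio = [rng.random() for _ in range(AUDIO_DIM)]
        video = [rng.random() for _ in range(VIDEO_DIM)]
        probs = model.predict(audio, video)
        print(t, probs.index(max(probs)))  # predicted phase index for this window
```

Merging after a modality-specific layer sits between early fusion (concatenating raw features) and late fusion (combining per-modality decisions). A real system would learn these weights from labeled resuscitation recordings rather than sampling them at random.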