Enhancing the Human-IoT Interaction Experience Through Deep Multimodal Sensor Fusion

Author: Billah, Md Fazlay Rabbi Masum, Computer Science - School of Engineering and Applied Science, University of Virginia (ORCID: orcid.org/0000-0002-2585-8383)
Campbell, Brad, EN-Comp Science Dept, University of Virginia

The widespread adoption of Internet of Things (IoT) devices and sensors in spaces frequented by people, particularly smart homes, office buildings, and industrial settings, is increasing the need for more intuitive and natural interaction between humans and IoT devices. Context awareness can meet this need: when IoT devices understand and adapt to the user's current context and needs, users can control them seamlessly, without issuing explicit commands or navigating dedicated applications, resulting in a more personalized experience.

To enable automatic understanding of user context, deep multimodal sensor fusion has emerged as a promising solution. This approach combines data from multiple sensors and devices with radio frequency (RF) signal attributes to build a more complete understanding of the environment, including room occupancy, the location of devices relative to the user, user identity, and gestures. However, a gap remains in accurately extracting RF signal attributes and integrating them with sensor data in deep learning models that can effectively interpret the environment.

In this dissertation, we propose several novel solutions to bridge this gap. Our strategy for enhancing the human-IoT interaction experience encompasses two perspectives: device-dependent and device-independent. In the device-dependent approach, we assume that the user is carrying a smart device when attempting to control IoT devices. To this end, we develop two systems. The first locates IoT devices in a three-dimensional environment on the screen of a smartphone, enabling interaction by touching devices on the screen. The second enables users wearing smart rings or smartwatches to point at a device and control it using gestures or voice commands. In contrast, the device-independent approach assumes that the user has no smart device with them. To accommodate this, we develop three systems that leverage distributed smart devices in the environment to cooperatively detect human presence, identity, location, and orientation. This allows personalized control of and interaction with IoT devices without any device-dependent interaction mechanism.

We evaluate our systems in a variety of settings with diverse levels of clutter, locations, and device placements. By effectively fusing RF signal attributes with raw sensor data, we achieve performance improvements over existing state-of-the-art solutions. Our multimodal fusion strategy, comprising a finely tuned feature extraction method, intermediate fusion of features, and integration of application-specific deep learning models, preserves individual feature distributions and correlations, ensures scalability, adapts to dynamic surroundings, and ultimately elevates the overall human-IoT interaction experience.
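To illustrate the intermediate-fusion idea described above, the following sketch encodes each modality separately and concatenates the intermediate embeddings before a shared task head. This is a minimal hypothetical NumPy example, not the dissertation's actual models; all dimensions, weights, and names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def encoder(x, w):
    """Toy per-modality encoder: one linear layer with ReLU."""
    return np.maximum(x @ w, 0.0)

# Hypothetical dimensions: 16 RF signal attributes, 32 raw sensor
# features, an 8-dim embedding per modality, 4 context classes.
w_rf = rng.standard_normal((16, 8))
w_sensor = rng.standard_normal((32, 8))
w_head = rng.standard_normal((16, 4))  # fused 16-dim -> 4 classes

def fuse_and_classify(rf, sensor):
    # Encode each modality on its own, so per-modality feature
    # distributions are preserved up to the fusion point.
    z_rf = encoder(rf, w_rf)
    z_sensor = encoder(sensor, w_sensor)
    # Intermediate fusion: concatenate the embeddings, then apply
    # an application-specific head on the fused representation.
    z = np.concatenate([z_rf, z_sensor], axis=-1)
    return z @ w_head

batch = 5
logits = fuse_and_classify(rng.standard_normal((batch, 16)),
                           rng.standard_normal((batch, 32)))
print(logits.shape)  # one logit vector per example in the batch
```

Fusing at this intermediate stage, rather than concatenating raw inputs (early fusion) or averaging per-modality predictions (late fusion), lets each encoder be tailored to its modality while the shared head still learns cross-modal correlations.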

Degree: PhD (Doctor of Philosophy)
Keywords: Human-IoT Interaction, Wireless Sensing, Multimodal Sensor Fusion, Machine Learning, Smart Home
Sponsoring Agency: National Science Foundation
Rights: All rights reserved (no additional license for public reuse)
Issued Date: