This thesis investigates a vision-driven framework for advancing human motion prediction and enabling seamless coordination between humans and robotic systems. As intelligent robots increasingly enter domains such as assistive healthcare, neuroprosthetics, rehabilitation, industrial collaboration, and human–computer interaction, understanding why a user intends to move—not just how—has become a critical challenge. Traditional intention-decoding pipelines rely mostly on physiological signals such as surface electromyography (sEMG), which provide rich information about muscle activation but lack awareness of the external environment where actions take place. This often limits their predictive power in real-world, dynamic scenarios where human decisions are strongly shaped by visual context.
To address this gap, the proposed project integrates a first-person-perspective camera to capture environmental and object-level information directly from the user’s viewpoint. Using state-of-the-art pre-trained computer vision models, the system will interpret objects, hand states, affordances, spatial relationships, and contextual cues to infer the user’s motion intentions from visual perception. These high-level visual representations will then be fused with sEMG signals to produce more accurate, context-aware predictions of arm and hand trajectories.
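To make the fusion step concrete, the sketch below shows one possible late-fusion design in PyTorch: pooled embeddings from a frozen pre-trained vision backbone are concatenated with features extracted from a short sEMG window and regressed onto a future joint trajectory. This is a minimal illustration under assumed dimensions and module names (e.g., `VisuoMyoelectricFusion`, an 8-channel sEMG window, a 7-DoF trajectory); it is not the project’s finalized architecture.

```python
import torch
import torch.nn as nn

class VisuoMyoelectricFusion(nn.Module):
    """Late-fusion sketch: visual context embedding + sEMG features -> future arm/hand trajectory."""

    def __init__(self, vis_dim=512, emg_channels=8, emg_window=200,
                 hidden=256, horizon=30, dof=7):
        super().__init__()
        # sEMG encoder: temporal convolution over a short window of muscle activity
        self.emg_encoder = nn.Sequential(
            nn.Conv1d(emg_channels, 64, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        # Fusion head: concatenate visual and sEMG embeddings, regress the trajectory
        self.head = nn.Sequential(
            nn.Linear(vis_dim + 64, hidden),
            nn.ReLU(),
            nn.Linear(hidden, horizon * dof),
        )
        self.horizon, self.dof = horizon, dof

    def forward(self, vis_feat, emg):
        # vis_feat: (B, vis_dim) pooled features from a frozen pre-trained vision model
        # emg:      (B, emg_channels, emg_window) filtered sEMG window
        emg_feat = self.emg_encoder(emg).squeeze(-1)      # (B, 64)
        fused = torch.cat([vis_feat, emg_feat], dim=-1)   # (B, vis_dim + 64)
        traj = self.head(fused)                           # (B, horizon * dof)
        return traj.view(-1, self.horizon, self.dof)      # (B, horizon, dof)

# Example usage with random tensors (hypothetical shapes)
model = VisuoMyoelectricFusion()
vis = torch.randn(4, 512)       # visual context embeddings for a batch of 4 frames
emg = torch.randn(4, 8, 200)    # matching 8-channel sEMG windows
pred = model(vis, emg)          # (4, 30, 7) predicted joint trajectories
```

Late fusion is only one option; the visual and sEMG streams could also be combined earlier or with attention-based weighting, which the project would need to compare empirically.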
By embedding environmental understanding into the intention-decoding pipeline, the robotic system gains the ability to anticipate user actions, adapt to dynamic surroundings, and generate smoother, safer, and more human-like coordinated movements. Such capabilities are particularly impactful for:
- Assistive and rehabilitation robotics, enabling prostheses, exoskeletons, and therapy devices to interpret user intention more intuitively.
- Human–robot collaborative workspaces, where robots must react appropriately to fast-changing human actions and object interactions.
- Augmented and mixed reality interfaces, where gesture prediction is enhanced by environmental context.
- Everyday assistive technologies, supporting users with motor impairments in performing complex tasks in natural settings.
Through this project, students will gain hands-on experience with multimodal sensor fusion, computer-vision-based environmental perception, and human–robot interaction. The resulting framework contributes toward next-generation intelligent robots that are perceptive, adaptive, and capable of acting as truly collaborative partners.