SAVANT: Shared Vision-Action Subgoal Imagination for Long-Horizon Planning; Understanding Design Choices Behind Assistive Robots: A Case Study on ROBEAR

Author:
Vitchutripop, Teeratham, School of Engineering and Applied Science, University of Virginia
Advisors:
Iqbal, Tariq, EN-SIE, University of Virginia
Forelle, MC, EN-Engineering and Society, University of Virginia
Abstract:

Significant advancements in artificial intelligence over the past decade have produced major technological breakthroughs which, if applied wisely, can aid people in tackling diverse problem spaces. While most currently deployed AI systems are grounded in cyberspace, bridging the cyber-physical gap will require intelligently leveraging AI techniques in cyber-physical systems, such as robots, that can sense and interact with the physical world. Robotic systems are poised to play a significant role in transforming a variety of industries, including healthcare, manufacturing, agriculture, and transportation. Despite this, much work remains in developing robots that are both intelligent and aligned with human values. My technical project focuses on advancing the capabilities of robots through a novel machine learning architecture for training robots to perform long-horizon manipulation tasks. My STS project, in turn, focuses on human values in robotic design, examining the explicit and latent motivations behind the design of ROBEAR, an assistive robot developed by researchers in Japan to help lift patients with limited mobility.
My technical project proposes a novel architecture for accomplishing long-horizon robot manipulation tasks. Many existing learning-based manipulation policies focus on short-horizon tasks that require interacting with only a single object or a single part of one. Tasks that require manipulating multiple objects in complex sequences, however, demand stronger planning capabilities. Recent advancements in foundation models trained on internet-scale data have demonstrated remarkable performance in text and image generation. These models capture aspects of the common-sense reasoning and imaginative faculties found in humans, both of which are crucial to our capacity to plan and perform sensorimotor tasks. My proposed architecture leverages the power of image generation models to imagine what the scene would look like once a subgoal has been accomplished. A pre-trained diffusion model is modified to use the visual observation and a subgoal text description as conditioning for generating an image of the completed subgoal; this imagined subgoal image then guides the pick-and-place actions for the robot to perform. Our model's performance was evaluated on the VIMABench simulated benchmarking environment, and ablation studies were conducted to evaluate how changing the number of inference steps affected the model's performance.
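To make the imagine-then-act pipeline concrete, below is a minimal sketch in Python. The module names (SubgoalDiffusion, ActionPredictor), tensor shapes, the toy denoising update, and the 4-dimensional pick-and-place action are illustrative assumptions standing in for the actual pre-trained diffusion model and action module, not the thesis's real implementation.

# Minimal sketch of the subgoal-imagination pipeline described above.
# All module names, shapes, and the toy update rule are illustrative
# assumptions, not the actual architecture from the thesis.
import torch
import torch.nn as nn

class SubgoalDiffusion(nn.Module):
    """Stand-in for a pre-trained diffusion model conditioned on the
    current observation and a subgoal text description."""
    def __init__(self, img_channels=3):
        super().__init__()
        # A real system would load pre-trained weights; this placeholder
        # network simply maps (noisy image, observation) to a residual.
        self.denoiser = nn.Conv2d(img_channels * 2, img_channels,
                                  kernel_size=3, padding=1)

    @torch.no_grad()
    def imagine_subgoal(self, observation, text_embedding,
                        num_inference_steps=50):
        """Iteratively denoise random noise into an image of the completed
        subgoal, conditioned on the observation (text conditioning elided
        here for brevity)."""
        x = torch.randn_like(observation)          # start from pure noise
        for _ in range(num_inference_steps):       # ablations vary this count
            residual = self.denoiser(torch.cat([x, observation], dim=1))
            x = x - residual / num_inference_steps # toy update, not a real scheduler
        return x

class ActionPredictor(nn.Module):
    """Maps (current observation, imagined subgoal image) to a
    pick-and-place action."""
    def __init__(self, img_channels=3, action_dim=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(img_channels * 2, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, action_dim),  # e.g., pick (x, y) and place (x, y)
        )

    def forward(self, observation, subgoal_image):
        return self.net(torch.cat([observation, subgoal_image], dim=1))

# Usage: imagine the subgoal image, then predict the action toward it.
obs = torch.randn(1, 3, 64, 64)   # current camera observation
text_emb = torch.randn(1, 512)    # embedding of the subgoal description
diffusion, policy = SubgoalDiffusion(), ActionPredictor()
subgoal = diffusion.imagine_subgoal(obs, text_emb, num_inference_steps=50)
action = policy(obs, subgoal)
print(action.shape)               # torch.Size([1, 4])

In an actual system, the toy update rule would be replaced by a proper diffusion noise scheduler, and the subgoal text would condition the denoiser (for example, through cross-attention) rather than being elided.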
My STS project focuses on analyzing the motivations behind the design of an assistive robot developed in Japan called ROBEAR, which was created to lift patients with limited mobility in place of care workers. Due to Japan's aging population, robots have been promoted by the Japanese government as a solution to both its declining workforce and its eldercare woes. The ROBEAR was a poster child of this roboticization movement, gaining significant media coverage upon its release in 2015. Despite this, the project was never deployed in real-world settings, remaining primarily a research tool. This project seeks to gain insight into the design choices behind ROBEAR, specifically through the lens of the Social Construction of Technology (SCOT) framework. Using SCOT, I identify and analyze the relevant social groups surrounding the ROBEAR, such as the end-users (e.g., caregivers and patients), researchers, and the Japanese public. The analysis focuses in particular on how these social groups uniquely view or define the ROBEAR and how they have influenced the design of the robot. The paper explores both the explicit and implicit features of the ROBEAR, from its striking cute bear-like design to its lack of emphasis on computation. It also illuminates the explicit and implicit motivations behind the robot's design, such as end-user constraints and Japanese popular culture. While the primary objective of the project is to shed light on this particular case study, the ultimate hope is that learning about the motivations behind these design choices can inform roboticists, scientists, engineers, and designers developing future assistive robots about the diverse social factors that contribute to determining what a robot should be like.
As in any field, robotics researchers often become siloed within their specific subdomains and rarely step back to consider the broader picture. As a researcher within the robot learning and manipulation community, my work primarily focuses on developing machine learning algorithms that enable robots to interact adeptly with objects in their environment. It is uncommon for me to zoom out and consider the motivations behind why a robot is designed a certain way or what social factors may have contributed to a robot's morphology. Conducting my technical and STS projects simultaneously has given me a deeper understanding of what it will take for robotic systems to become mainstream in human-centric environments. Beyond pure competency in accomplishing a task, robots will need to be aligned, in all aspects of their design, with the competing or shared values, pressures, and biases of various social groups. Only by engaging with these non-technical factors during development can we engineer solutions that are both meaningful and responsive to the needs of humanity.

Degree:
BS (Bachelor of Science)
Keywords:
robotics, machine learning, artificial intelligence, human-robot interaction
Notes:

School of Engineering and Applied Science

Bachelor of Science in Computer Science

Technical Advisor: Tariq Iqbal

STS Advisor: MC Forelle

Language:
English
Rights:
All rights reserved (no additional license for public reuse)
Issued Date:
2024/05/10