
Research: Robots Learn Household Tasks by Watching Humans


July 21, 2022

By Aaron Aupperlee, Carnegie Mellon University

The robot watched as Shikhar Bahl opened the refrigerator door. It recorded his movements, the swing of the door, the location of the fridge and more, analyzing this data and readying itself to mimic what Bahl had done.

It failed at first, missing the handle completely at times, grabbing it in the wrong spot or pulling it incorrectly. But after a few hours of practice, the robot succeeded and opened the door.

“Imitation is a great way to learn,” said Bahl, a Ph.D. student at the Robotics Institute (RI) in Carnegie Mellon University’s School of Computer Science. “Having robots actually learn from directly watching humans remains an unsolved problem in the field, but this work takes a significant step in enabling that ability.”

Bahl worked with Deepak Pathak and Abhinav Gupta, both faculty members in the RI, to develop a new learning method for robots called WHIRL, short for In-the-Wild Human Imitating Robot Learning. WHIRL is an efficient algorithm for one-shot visual imitation. It can learn directly from human-interaction videos and generalize that information to new tasks, making robots well-suited to learning household chores. People constantly perform various tasks in their homes. With WHIRL, a robot can observe those tasks and gather the video data it needs to eventually determine how to complete the job itself.

The team added a camera and their software to an off-the-shelf robot, and it learned how to do more than 20 tasks — from opening and closing appliances, cabinet doors and drawers to putting a lid on a pot, pushing in a chair and even taking a garbage bag out of the bin. Each time, the robot watched a human complete the task once and then went about practicing and learning to accomplish the task on its own. The team presented their research this month at the Robotics: Science and Systems conference in New York.

“This work presents a way to bring robots into the home,” said Pathak, an assistant professor in the RI and a member of the team. “Instead of waiting for robots to be programmed or trained to successfully complete different tasks before deploying them into people’s homes, this technology allows us to deploy the robots and have them learn how to complete tasks, all the while adapting to their environments and improving solely by watching.”




Current methods for teaching a robot a task typically rely on imitation or reinforcement learning. In imitation learning, humans manually operate a robot to teach it how to complete a task. This process must be done several times for a single task before the robot learns. In reinforcement learning, the robot is typically trained on millions of examples in simulation and then asked to adapt that training to the real world.

Both learning models work well when teaching a robot a single task in a structured environment, but they are difficult to scale and deploy. WHIRL can learn from any video of a human doing a task. It is easily scalable, not confined to one specific task and can operate in realistic home environments. The team is even working on a version of WHIRL trained by watching videos of human interaction from YouTube and Flickr.

Progress in computer vision made the work possible. Using models trained on internet data, computers can now understand and model movement in 3D. The team used these models to interpret human movement, which in turn made it easier to train WHIRL.

With WHIRL, a robot can accomplish tasks in the environments where they naturally occur. The appliances, doors, drawers, lids, chairs and garbage bag were not modified or manipulated to suit the robot. The robot’s first several attempts at a task ended in failure, but once it had a few successes, it quickly latched on to how to accomplish the task and mastered it. While the robot may not accomplish the task with the same movements as a human, that’s not the goal. Humans and robots have different parts, and they move differently. What matters is that the end result is the same. The door is opened. The switch is turned off. The faucet is turned on.
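
The practice pattern described above (early failures, then rapid improvement, with success judged by the end result rather than by matching the human's motion) can be illustrated with a toy loop. This is a minimal sketch and not WHIRL's actual algorithm, which the article does not detail. It uses a simple sample-and-refit scheme, similar in spirit to the cross-entropy method, chosen here purely for illustration. The `door_angle` function, the single "pull" parameter, and the 60-degree goal are all hypothetical stand-ins.

```python
import random
import statistics


def door_angle(pull):
    """Hypothetical stand-in for the real world: maps one action
    parameter (how far the robot pulls) to the resulting door angle."""
    return 45.0 * pull


def practice(goal, outcome_of, rounds=30, attempts=20, keep=5, seed=0):
    """Toy practice loop: try many actions, keep the ones whose OUTCOME
    is closest to the demonstrated outcome, and refit around them."""
    rng = random.Random(seed)
    mean, spread = 0.0, 1.0  # initial guess: the robot barely pulls at all
    # Track how far the current best guess lands from the goal.
    history = [abs(outcome_of(mean) - goal)]
    for _ in range(rounds):
        samples = [rng.gauss(mean, spread) for _ in range(attempts)]
        # Score by end result only, not by similarity to human motion.
        samples.sort(key=lambda action: abs(outcome_of(action) - goal))
        best = samples[:keep]
        mean = statistics.fmean(best)
        # Keep a small minimum spread so the robot continues to explore.
        spread = max(statistics.stdev(best), 0.05)
        history.append(abs(outcome_of(mean) - goal))
    return mean, history


# Assumed goal: the human demonstration left the door open at 60 degrees.
learned_pull, errors = practice(goal=60.0, outcome_of=door_angle)
print(f"error before practice: {errors[0]:.1f} degrees")
print(f"error after practice:  {errors[-1]:.2f} degrees")
```

The only feedback the loop uses is the gap between the robot's outcome and the demonstrated outcome, which mirrors the point that the door ending up open matters more than how the arm moved to open it.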

“To scale robotics in the wild, the data must be reliable and stable, and the robots should become better in their environment by practicing on their own,” Pathak said.

Editor’s Note: This article was first published at the CMU News site.
