October 11, 2021
The Toyota Research Institute (TRI) has announced the acceptance of six research papers in the field of machine learning at the International Conference on Computer Vision (ICCV). The research advances understanding across various tasks crucial for robotic perception, including semantic segmentation, 3D object detection, and multi-object tracking.
TRI said that over the last six years its researchers have made significant strides in robotics, automated driving and materials science, largely due to machine learning, the application of computer algorithms that constantly improve with experience and data.
“Machine learning is the foundation of our research,” said Gill Pratt, CEO of TRI. “We are working to create scientific breakthroughs in the discipline of machine learning itself and then apply those breakthroughs to accelerate discoveries in robotics, automated driving, and battery testing and development.”
As ICCV began, TRI shared six papers demonstrating its research in machine learning, including geometric deep learning for 3D vision, self-supervised learning, and simulation-to-real, or “sim-to-real,” transfer.
“Within the field of machine learning, scalable supervision is our focus,” said Adrien Gaidon, head of TRI’s machine learning team. “It is impossible to manually label everything you need at Toyota’s scale, yet this is the state-of-the-art approach, especially for deep learning and computer vision. Thankfully, we can leverage Toyota’s domain expertise in vehicles, robots or batteries to invent alternative forms of scalable supervision, whether via simulation or self-supervised learning from raw data. This approach can boost performance in a wide array of tasks important for automated cars to be safer everywhere anytime, robots to learn faster and battery development to speed up lengthy testing cycles.”
Key findings within the six papers accepted at ICCV include:
- Geometric self-supervised learning significantly improves sim-to-real transfer for scene understanding. The resulting unsupervised domain adaptation algorithm enables recognizing real-world categories without requiring any expensive manual real-world labels.
- TRI’s research on multi-object tracking reveals that synthetic data could endow machines with fundamental human cognitive abilities, like object permanence, that are historically hard for machine learning models but second nature for humans. This development increases the robustness of computer vision algorithms, making them more aligned with people’s visual common sense.
- TRI’s research on pseudo-lidar shows that large-scale self-supervised pre-training considerably boosts the performance of image-based 3D object detectors. The proposed geometric pre-training enables training powerful 3D deep learning models from limited 3D labels, which are expensive or sometimes impossible to get from images alone.
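
For readers unfamiliar with the term, "pseudo-lidar" generally refers to turning a per-pixel depth estimate from a camera image into a 3D point cloud that a lidar-style detector can consume. The sketch below illustrates only that generic back-projection step with a pinhole camera model; it is not TRI's implementation. The function name `depth_to_pseudo_lidar`, the intrinsics `fx`, `fy`, `cx`, `cy`, the `max_depth` cutoff, and the toy depth values are all assumptions made for illustration. In practice the depth map would come from a learned depth network rather than being hard-coded.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def depth_to_pseudo_lidar(
    depth: Sequence[Sequence[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
    max_depth: float = 80.0,  # hypothetical range cutoff, in meters
) -> List[Point]:
    """Back-project a dense depth map into 3D points in the camera frame.

    Uses the pinhole model: x = (u - cx) * z / fx, y = (v - cy) * z / fy.
    Pixels with non-positive depth or depth beyond max_depth are skipped.
    """
    points: List[Point] = []
    for v, row in enumerate(depth):      # v: pixel row
        for u, z in enumerate(row):      # u: pixel column, z: depth
            if z <= 0.0 or z > max_depth:
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Toy 2x3 depth map (meters); 0.0 marks a missing estimate.
depth = [
    [10.0, 10.0, 0.0],
    [20.0, 20.0, 100.0],
]
cloud = depth_to_pseudo_lidar(depth, fx=2.0, fy=2.0, cx=1.0, cy=0.5)
for point in cloud:
    print(point)
```

The resulting list of `(x, y, z)` points is what a 3D detector would take as input in place of a real lidar sweep, which is why better depth estimates from pre-training can translate into better image-based 3D detection.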