Google’s DeepMind Is Training Robots With Video and Language Models

February 8, 2024

In 2024, Google’s DeepMind Robotics researchers are among the many teams exploring how generative AI and large foundation models can be applied to robotics, in areas ranging from robot learning to product design. There is a great deal of anticipation about what DeepMind’s approach to training robots could make possible.

Today, the team is highlighting research aimed at giving robots a better understanding of what humans expect from them. Instead of just repeating the same task over and over, robots need to be able to recognize and react to changes in their environment or mission parameters. This kind of adaptability would allow robots to be used in more dynamic settings, such as a factory, a hospital, or even a home.

The DeepMind team designed AutoRT to harness large foundation models for several different ends. For example, the system uses a visual language model (VLM) for better situational awareness. AutoRT can also manage a fleet of robots working in tandem, each using its cameras to map out its environment and identify the objects in it.

The robots then carry out tasks suggested by a large language model (LLM). LLMs are widely seen as the key to letting robots understand more natural-language commands, reducing the need for hard-coded skills. AutoRT has been tested extensively over the past seven months and can orchestrate up to 20 robots at once, and 52 different devices in total.
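The division of labor described above (a VLM describes what each robot sees, an LLM suggests a task for that scene) can be sketched roughly as follows. This is an illustrative sketch only, not DeepMind's code: `assign_tasks`, `fake_vlm`, and `fake_llm` are hypothetical names, the two "models" are stand-in functions, and the 20-robot cap simply mirrors the figure quoted above.

```python
from typing import Callable, Dict, List


def assign_tasks(
    camera_frames: Dict[str, str],
    describe_scene: Callable[[str], str],
    propose_tasks: Callable[[str], List[str]],
    max_robots: int = 20,
) -> Dict[str, str]:
    """Map each robot id to one suggested task for its current view."""
    assignments: Dict[str, str] = {}
    # Handle at most `max_robots` robots in one pass (assumed cap).
    for robot_id in sorted(camera_frames)[:max_robots]:
        scene = describe_scene(camera_frames[robot_id])  # VLM step
        tasks = propose_tasks(scene)                      # LLM step
        if tasks:
            assignments[robot_id] = tasks[0]
    return assignments


# Hypothetical stand-ins for the real models, for illustration only.
def fake_vlm(frame: str) -> str:
    return f"a table with {frame}"


def fake_llm(scene: str) -> List[str]:
    return ["wipe the table with the sponge"] if "sponge" in scene else []


plan = assign_tasks(
    {"robot_a": "a sponge", "robot_b": "nothing on it"},
    fake_vlm,
    fake_llm,
)
```

Here `robot_a` receives a task and `robot_b` is left idle, because the stand-in language model proposes nothing for an empty table. A real system would replace the two stand-ins with calls to actual models and add safety checks before any task reaches a robot.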

In testing, DeepMind has collected some 77,000 trials covering more than 6,000 tasks. The team has also developed RT-Trajectory, which uses video input to teach robots. Many teams are exploring YouTube videos as a way to train robots at scale, but RT-Trajectory adds a layer of its own: a two-dimensional sketch of the arm in action, overlaid on the video.

These trajectories, represented as RGB images, give the model practical visual cues as it learns robot-control policies. DeepMind reports that RT-Trajectory training achieved more than double the success rate of RT-2 training: 63% across 41 tasks, compared with 29%. The team emphasizes that RT-Trajectory takes advantage of abundant robot-motion data that is currently going unused.
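To make the idea of a trajectory "represented as an RGB image" concrete, here is a minimal sketch that draws a 2D path onto a frame. It is an assumption-laden illustration, not RT-Trajectory's actual method: the helper names (`blank_frame`, `overlay_trajectory`), the waypoint format, and the plain straight-line drawing are all invented for this example.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]


def blank_frame(width: int, height: int, color: Pixel = (0, 0, 0)) -> Image:
    """Create a height x width RGB frame filled with one color."""
    return [[color for _ in range(width)] for _ in range(height)]


def overlay_trajectory(
    frame: Image,
    waypoints: List[Tuple[int, int]],
    color: Pixel = (255, 0, 0),
) -> Image:
    """Return a copy of `frame` with straight segments drawn between
    consecutive (x, y) waypoints; points outside the frame are skipped."""
    out = [row[:] for row in frame]
    height, width = len(out), len(out[0])
    for (x0, y0), (x1, y1) in zip(waypoints, waypoints[1:]):
        steps = max(abs(x1 - x0), abs(y1 - y0), 1)
        for i in range(steps + 1):
            # Linear interpolation between the two waypoints.
            x = round(x0 + (x1 - x0) * i / steps)
            y = round(y0 + (y1 - y0) * i / steps)
            if 0 <= x < width and 0 <= y < height:
                out[y][x] = color
    return out


# A hypothetical arm path: move right, then down.
frame = blank_frame(8, 6)
sketch = overlay_trajectory(frame, [(1, 1), (5, 1), (5, 4)])
```

The result is still an ordinary RGB image, so in principle it can be fed to a vision model alongside, or in place of, the original frame. The original `frame` is left unmodified.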

RT-Trajectory is another step toward building robots that can move efficiently and accurately in unfamiliar situations, while also unlocking knowledge from existing datasets.
