Google LLC today debuted VideoBERT, an artificial intelligence system that can watch part of a video and extrapolate what will happen in the next few seconds, much as a human would.
Equipping a computer with the ability to understand and draw conclusions from a visual scene requires an incredibly sophisticated algorithm. For Google’s researchers, however, the challenge wasn’t building the algorithm but finding enough data with which to train it. Machine learning models must ingest enormous amounts of information to master even basic concepts, and that information typically must be prepared by hand.
That wasn’t feasible for VideoBERT, since teaching the model how to predict future events required more sample videos than Google’s researchers could have assembled by hand. They would additionally have had to write descriptions for each frame of each clip just so the AI could follow what was happening. So the team came up with an alternative: freely available instructional videos.
In a video that shows how to cook an omelette or fill a tire, the person demonstrating the task will often explain each step as they perform it. The researchers used that narration as a substitute for the frame-by-frame descriptions they would otherwise have had to write for the AI. The team compiled over a million clips spanning categories such as cooking and gardening, then fed them to VideoBERT to teach the model how to trace the progress of common activities.
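The core idea — using a video's own timestamped narration as weak, automatically generated frame descriptions — can be sketched in a few lines. This is an illustrative simplification, not Google's actual pipeline; the function name and the caption format (start time, end time, text) are assumptions for the example:

```python
# Illustrative sketch: pair timestamped narration captions with video frame
# times to get weak frame-level descriptions without manual labeling.
# (Hypothetical helper, not part of VideoBERT's published code.)

def align_captions_to_frames(captions, frame_times):
    """Assign to each frame time the caption whose [start, end) span covers it.

    captions: list of (start_sec, end_sec, text) tuples, sorted by start time.
    frame_times: list of frame timestamps in seconds.
    Returns a list of (frame_time, text_or_None) pairs.
    """
    pairs = []
    for t in frame_times:
        label = None  # frames with no spoken narration stay unlabeled
        for start, end, text in captions:
            if start <= t < end:
                label = text
                break
        pairs.append((t, label))
    return pairs

# Hypothetical example: a short cooking clip with two narrated steps.
captions = [
    (0.0, 4.0, "crack two eggs into the bowl"),
    (4.0, 9.0, "whisk until smooth"),
]
print(align_captions_to_frames(captions, [1.0, 5.0, 10.0]))
```

Labels produced this way are noisy — the narrator may describe a step before or after performing it — which is exactly why this approach needs the very large volume of clips the team collected.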
After the training, the model was set loose on a collection of cooking videos it had never seen before. When presented with a video fragment showing a bowl of flour and cocoa powder, VideoBERT astutely predicted that the ingredients would be placed in an oven and become a brownie or cupcake. The researchers also managed to harness the algorithm’s observation skills to extract a recipe from a video in which a chef explained how to cook a steak.
The methods Google developed to train VideoBERT could eventually find use in far more serious applications. Self-driving cars, for instance, might become safer if they gained the ability to accurately predict where nearby vehicles will be a few seconds into the future. Such foresight can also be a big asset for drones and industrial robots that operate in close proximity to human workers.