

Artificial Intelligence

An AI Robot Learns Recipes by Watching Human Chefs

A robot chef used AI machine learning and cooking videos to learn recipes.


Your salad may be prepared by a robot chef one day. A new University of Cambridge study published in IEEE Access shows how artificial intelligence (AI) computer vision empowers a robotic chef to identify and learn new recipes by watching videos of human chefs.

“Robotic chefs are a promising technology that can bring sizeable health and economic benefits when deployed ubiquitously,” wrote first author Grzegorz Sochacki, along with University of Cambridge research colleagues Arsen Abdulali, Narges Khadem Hosseini, and Fumiya Iida, a professor of robotics at the Department of Engineering and head of the Bio-Inspired Robotics Laboratory.

The researchers sought to discover whether a robot chef could learn recipes by observation, as humans do. “Implementing a robotic chef is a complicated task, that requires the robot to be competent in many fields of robotics like manipulation, sensing, feedback, decision-making and perception,” they wrote.

The key to enabling the robot chef to learn like humans was to empower the AI algorithm to identify the ingredients and the actions performed by the human chef. The University of Cambridge team used OpenPose, a neural network for real-time multi-person human pose detection, and an object detection AI model called YOLO (You Only Look Once), an artificial neural network that processes each image in real time in a single evaluation. YOLO is open-source software that was introduced in 2015. The base model of YOLO is capable of performing object detection at 45 frames per second.

For this proof-of-concept robot lab experiment, the team decided to focus on salads since many of the ingredients are identifiable by YOLO algorithms and these types of dishes are relatively straightforward to automate.

The University of Cambridge team created videos of humans preparing eight salad recipes made from five ingredients: orange, banana, broccoli, carrot, and apple. The robot chef watched these video demonstrations with its camera. The robot’s AI computer vision software analyzes frames of each video to detect objects such as utensils and ingredients, as well as the poses of the human chefs. By analyzing the correlations between the chef’s right hand and the objects, the AI predicts which objects are being used and what actions are being performed.

“A high correlation is an indication of lengthy handling of an item and therefore is an indication of a certain action,” the researchers wrote.
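The idea of linking the hand to the object it handles can be illustrated with a minimal Python sketch. It is not the researchers' code: the article does not say which correlation measure the team used, so the Pearson correlation, the function names `pearson` and `handled_object`, and the pixel coordinates below are all assumptions made for illustration. An object whose position rises and falls with the hand's position scores high, while an object left on the counter scores zero.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # an object that never moves is treated as uncorrelated
    return cov / math.sqrt(var_x * var_y)


def handled_object(hand_track, object_tracks):
    """Score each object by how closely its (x, y) track follows the hand.

    hand_track: list of (x, y) hand positions, one per video frame.
    object_tracks: dict mapping an object name to its list of (x, y) positions.
    Returns the best-scoring object name and the full score table.
    """
    hand_x = [p[0] for p in hand_track]
    hand_y = [p[1] for p in hand_track]
    scores = {}
    for name, track in object_tracks.items():
        corr_x = pearson(hand_x, [p[0] for p in track])
        corr_y = pearson(hand_y, [p[1] for p in track])
        scores[name] = (corr_x + corr_y) / 2  # average over both image axes
    best = max(scores, key=scores.get)
    return best, scores


# Made-up pixel coordinates: the carrot moves with the hand, the apple stays put.
hand = [(10, 10), (20, 15), (30, 20), (40, 25)]
objects = {
    "carrot": [(12, 11), (22, 16), (31, 21), (41, 26)],
    "apple": [(100, 50), (100, 50), (100, 50), (100, 50)],
}
best, scores = handled_object(hand, objects)
print(best, scores)
```

In a real system the tracks would come from the pose and object detectors, and a threshold on the score would decide whether any object is being handled at all.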

The robot’s observations of the human chef’s demonstration are converted into binary states, which are then filtered using a hidden Markov model (HMM) to remove noise, false positives, and false negatives. In statistics, an HMM is a type of graphical model that is commonly used to represent probability distributions over sequences of observations. It builds on the Markov model, named after Russian mathematician Andrey Andreyevich Markov (1856-1922), a stochastic method used to model randomly changing systems that have the Markov property, in which future states depend only on the present state and not on the past. In an HMM, the underlying variables that generate the observed data are called “hidden states.” The model is specified by transition probabilities (the probability of moving from one hidden state to another) and emission probabilities (the probability of observing a given output in a given hidden state).
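To make the filtering step concrete, here is a minimal Python sketch of how a two-state hidden Markov model can clean up a noisy stream of binary detections. It uses the standard Viterbi algorithm to find the most likely sequence of hidden states. The article does not give the researchers' model parameters or decoding method, so the function name `viterbi_binary` and the probabilities `stay` and `correct` are assumptions chosen only for illustration.

```python
import math


def viterbi_binary(observations, stay=0.9, correct=0.8):
    """Most likely hidden 0/1 sequence behind noisy 0/1 observations.

    stay:    assumed probability that the hidden state is unchanged between frames
             (a transition probability).
    correct: assumed probability that a detection matches the hidden state
             (an emission probability).
    """
    states = (0, 1)

    def log_trans(prev, cur):
        return math.log(stay if prev == cur else 1 - stay)

    def log_emit(state, obs):
        return math.log(correct if state == obs else 1 - correct)

    # Start from a uniform prior over the two hidden states.
    score = {s: math.log(0.5) + log_emit(s, observations[0]) for s in states}
    backpointers = []

    for obs in observations[1:]:
        new_score, pointers = {}, {}
        for s in states:
            # Best previous state from which to arrive in state s.
            prev = max(states, key=lambda p: score[p] + log_trans(p, s))
            new_score[s] = score[prev] + log_trans(prev, s) + log_emit(s, obs)
            pointers[s] = prev
        score = new_score
        backpointers.append(pointers)

    # Walk backwards from the best final state to recover the full path.
    state = max(states, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# 1 = "object detected in hand" in that frame, 0 = not detected (made-up data).
noisy = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1]
print(viterbi_binary(noisy))
```

With these assumed probabilities, an isolated one-frame false positive or false negative is cheaper to explain as a detection error than as two real state changes, so the decoded sequence ignores it and keeps only sustained changes.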

The robot chef observed 16 video demonstrations of human chefs and, the researchers reported, “The algorithm correctly recognizes known recipes in 93% of the demonstrations and successfully learned new recipes when shown, using off-the-shelf neural networks for computer vision.

“We show that videos and demonstrations are viable sources of data for robotic chef programming when extended to massive publicly available data sources like YouTube.”

Copyright © 2023 Cami Rosso All rights reserved.
