Learning on real robots from experience and simple user feedback

Pablo Quintía Vidal, Roberto Iglesias Rodríguez, Miguel Ángel Rodríguez González, Carlos Vázquez Regueiro


In this article we describe a novel algorithm that allows fast and continuous learning on a physical robot working in a real environment. The learning process is never halted, so new knowledge gained from robot-environment interactions can be incorporated into the controller at any time. Our algorithm lets a human observer control the reward given to the robot, thus avoiding the burden of defining a reward function. Despite this highly non-deterministic reinforcement, the experimental results described in this paper show that the learning processes achieve fast robot adaptation to the diversity of situations the robot encounters while moving in several environments.
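The core idea, an online value-learning loop whose reward comes from a human observer rather than a hand-coded reward function, can be sketched with tabular Q-learning. This is an illustrative assumption, not the paper's actual controller: the `human_feedback` stub, the tiny state space, and all parameters are invented for the example.

```python
import random

def human_feedback(state, action):
    # Stand-in for the human observer: a stub that rewards the action
    # matching the state's parity (purely illustrative).
    return 1.0 if action == state % 2 else -1.0

def interactive_q_learning(n_states=4, n_actions=2, steps=200,
                           alpha=0.3, gamma=0.5, epsilon=0.1, seed=0):
    """Tabular Q-learning driven by human reward signals.
    Learning never stops: the table is updated after every interaction."""
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    state = 0
    for _ in range(steps):
        # epsilon-greedy action selection
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)
        else:
            action = max(range(n_actions), key=lambda a: q[state][a])
        reward = human_feedback(state, action)  # reward given by the observer
        next_state = rng.randrange(n_states)    # stand-in for robot dynamics
        # standard Q-learning update, applied online at every step
        q[state][action] += alpha * (reward + gamma * max(q[next_state])
                                     - q[state][action])
        state = next_state
    return q

q = interactive_q_learning()
# After training, the greedy action in each state matches the stub
# teacher's preference.
```

Because the update runs after every single interaction, feedback can be noisy or sparse and learning still continues indefinitely, which mirrors the never-stopped learning the abstract describes.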


Autonomous robots; Reinforcement learning


R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.

M. A. Bozarth, Pleasure: The politics and the reality, Springer Netherlands, pp. 5–14, 1994.

E. L. Thorndike, Animal Intelligence, Hafner, Darien, CT, 1911.

A. L. Thomaz and C. Breazeal, Teachable robots: Understanding human teaching behavior to build more effective robot learners, Artificial Intelligence, 172:716–737, 2008.

A. L. Thomaz, G. Hoffman and C. Breazeal, Reinforcement learning with human teachers: Understanding how people want to teach robots, in Proceedings of the 15th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), 2006.

L. P. Kaelbling, M. L. Littman and A. W. Moore, Reinforcement learning: A survey, Journal of Artificial Intelligence Research, 4:237–285, 1996.

G. A. Carpenter, S. Grossberg and D. B. Rosen, Fuzzy ART: Fast stable learning and categorization of analog patterns by an adaptive resonance system, Neural Networks, 4:759–771, 1991.

Pablo Quintía, Roberto Iglesias, Carlos V. Regueiro and Miguel A. Rodríguez, Simultaneous learning of perception and action in mobile robots, Robotics and Autonomous Systems, 58:1306–1315, 2010.

W. Bradley Knox and Peter Stone, Interactively shaping agents via human reinforcement: The TAMER framework, in Proceedings of the Fifth International Conference on Knowledge Capture, California, USA, September 2009.

W. Bradley Knox and Peter Stone, Reinforcement learning with human feedback in mountain car, in AAAI Spring 2011 Symposium on Bridging the Gaps in Human-Agent Collaboration, 2011.

Andrew Y. Ng and Stuart Russell, Algorithms for inverse reinforcement learning, in Proceedings of the Seventeenth International Conference on Machine Learning, 2000.

W. Maass, T. Natschläger and H. Markram, Real-time computing without stable states: A new framework for neural computation based on perturbations, Neural Computation, 14(11):2531–2560, 2002.

Brenna D. Argall, Sonia Chernova, Manuela Veloso and Brett Browning, A survey of robot learning from demonstration, Robotics and Autonomous Systems, 57(5):469–483, 2009.

Andrea Lockerd Thomaz, Guy Hoffman and Cynthia Breazeal, Real-time interactive reinforcement learning for robots, in AAAI Workshop on Human Comprehensible Machine Learning, 2005.

T. Kollar and N. Roy, Using reinforcement learning to improve exploration trajectories for error minimization, in IEEE International Conference on Robotics and Automation, pp. 3338–3343, May 2006.

DOI: https://doi.org/10.14198/JoPha.2013.7.1.08