
Learning social affordance grammar from videos: Transferring human interactions to human-robot interactions

  • University of California at Los Angeles
  • Fudan University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

35 Scopus citations

Abstract

In this paper, we present a general framework for learning social affordance grammar as a spatiotemporal AND-OR graph (ST-AOG) from RGB-D videos of human interactions, and transfer the grammar to humanoids to enable real-time motion inference for human-robot interaction (HRI). Based on Gibbs sampling, our weakly supervised grammar learning can automatically construct a hierarchical representation of an interaction with long-term joint sub-tasks of both agents and short-term atomic actions of individual agents. Using a new RGB-D video dataset with rich instances of human interactions, our experiments in Baxter simulation, human evaluation, and a real Baxter test demonstrate that the model learned from limited training data successfully generates human-like behaviors in unseen scenarios and outperforms both baselines.

Original language: English
Title of host publication: ICRA 2017 - IEEE International Conference on Robotics and Automation
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1669-1676
Number of pages: 8
ISBN (Electronic): 9781509046331
State: Published - Jul 21 2017
Event: 2017 IEEE International Conference on Robotics and Automation, ICRA 2017 - Singapore, Singapore
Duration: May 29 2017 to Jun 3 2017

Publication series

Name: Proceedings - IEEE International Conference on Robotics and Automation
ISSN (Print): 1050-4729

Conference

Conference: 2017 IEEE International Conference on Robotics and Automation, ICRA 2017
Country/Territory: Singapore
City: Singapore
Period: 05/29/17 to 06/3/17
