Learning Sequences of Policies by using an Intrinsically Motivated Learner and a Task Hierarchy

Nicolas Duminy; Alexandre Manoury; Sao Mai Nguyen; Cédric Buche; Dominique Duhaut

Conference paper, Year: 2018



Nicolas Duminy

Role: Author
PersonId: 173384
IdHAL: nicolas-duminy
ORCID: 0000-0002-8360-6635

Lab-STICC_UBS_CID_IHSEV

Université de Bretagne Sud

Alexandre Manoury

Role: Author

Lab-STICC_IMTA_CID_IHSEV

Département Informatique

Sao Mai Nguyen

Role: Author
PersonId: 10486
IdHAL: sao-mai-nguyen
ORCID: 0000-0003-0929-0019
IdRef: 177417285

Lab-STICC_IMTA_CID_IHSEV

Département Informatique

Cédric Buche

Role: Author
PersonId: 10654
IdHAL: cedric-buche
IdRef: 096138106

Lab-STICC_ENIB_CID_IHSEV

Dominique Duhaut

Role: Author
PersonId: 873927
IdHAL: dominique-duhaut

Lab-STICC_UBS_CID_IHSEV

Université de Bretagne Sud

Abstract

Our goal is to propose an algorithm that lets robots learn sequences of actions, also called policies, in order to achieve complex tasks. In this paper we consider multiple, hierarchical tasks of varying difficulty. To tackle this high-dimensional learning problem, we propose a new algorithm, Socially Guided Intrinsic Motivation for Sequence of Actions through Hierarchical Tasks (SGIM-SAHT), which is based on intrinsic motivation and uses different learning strategies. We then present two implementations of this algorithm that address the challenge in different ways: Socially Guided Intrinsic Motivation with Procedure Babbling (SGIM-PB), which relies on a "procedures" framework, and Continual Hierarchical Intrinsically Motivated Exploration (CHIME), which relies on planning and on learning a dynamic representation of the environment. We compare the two implementations and show, through two experiments, how efficiently they learn sequences of actions and dynamically adapt to their environment. We also discuss the benefits of implementing a full, unified version of SGIM-SAHT that combines the features of both implementations.

Domains

Machine Learning [cs.LG], Robotics [cs.RO], Artificial Intelligence [cs.AI]

Main file

icdl-epirob-2018_CameraReady.pdf (476.69 KB)

Origin: Files produced by the author(s)

Contributor: Alexandre Manoury

https://hal.science/hal-01887073

Submitted on: Wednesday, October 3, 2018, 15:48:23

Last modified on: Wednesday, February 7, 2024, 08:57:27

Long-term archiving on: Friday, January 4, 2019, 14:49:07

Dates and versions

hal-01887073, version 1 (03-10-2018)

Identifiers

HAL Id: hal-01887073, version 1

Cite

Nicolas Duminy, Alexandre Manoury, Sao Mai Nguyen, Cédric Buche, Dominique Duhaut. Learning Sequences of Policies by using an Intrinsically Motivated Learner and a Task Hierarchy. Workshop on Continual Unsupervised Sensorimotor Learning, ICDL-EpiRob 2018, Sep 2018, Tokyo, Japan. ⟨hal-01887073⟩


Collections

UNIV-BREST INSTITUT-TELECOM CNRS LAB-STICC_UBO UBS LAB-STICC_UBS ENIB LAB-STICC_ENIB LAB-STICC IMT-ATLANTIQUE LAB-STICC_UBS_2

153 views

77 downloads
