Robust Maximum Entropy Behavior Cloning

Research Authors
Mostafa Hussein, Brendan Crowe, Marek Petrik, Momotaz Begum
Research Year
2021
Research Journal
arXiv preprint arXiv:2101.01251
Research Website
https://arxiv.org/abs/2101.01251
Research Abstract
Imitation learning (IL) algorithms use expert demonstrations to learn a specific task. Most existing approaches assume that all expert demonstrations are reliable and trustworthy, but what if the given dataset contains some adversarial demonstrations? Such demonstrations can lead to poor decision-making performance. We propose a novel general framework that learns a policy directly from demonstrations while autonomously detecting the adversarial demonstrations and excluding them from the dataset. The framework is also sample- and time-efficient and does not require a simulator. To model such adversarial demonstrations, we propose a min-max problem that leverages the entropy of the model to assign a weight to each demonstration. This allows us to learn the behavior using only the correct demonstrations or a mixture of correct demonstrations.