Article · Open access · Peer-reviewed

Regret bounds for sleeping experts and bandits

2010; Springer Science+Business Media; Volume: 80; Issue: 2-3; Language: English

DOI

10.1007/s10994-010-5178-7

ISSN

1573-0565

Authors

Robert Kleinberg, Alexandru Niculescu-Mizil, Yogeshwer Sharma

Topic(s)

Machine Learning and Algorithms

Abstract

We study on-line decision problems where the set of actions available to the decision algorithm varies over time. With a few notable exceptions, such problems have remained largely unaddressed in the literature, despite their applicability to a large number of practical problems. Departing from previous work on this “Sleeping Experts” problem, we compare algorithms against the payoff obtained by the best ordering of the actions, which is a natural benchmark for this type of problem. We study both the full-information (best expert) and partial-information (multi-armed bandit) settings and consider both stochastic and adversarial reward models. For all settings we give algorithms achieving (almost) information-theoretically optimal regret bounds (up to a constant or a sub-logarithmic factor) with respect to the best-ordering benchmark.
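To make the best-ordering benchmark concrete, here is a minimal Python sketch (not from the paper): for every fixed ranking of the actions, the policy plays the highest-ranked action that is awake in each round, and the benchmark is the best such total payoff. Regret is measured against that benchmark. The toy availability sets, reward values, and the naive baseline policy below are illustrative assumptions, and the brute force over permutations is only feasible for a handful of actions.

```python
import itertools
import random

def best_ordering_reward(rounds, n_actions):
    """Brute-force the best-ordering benchmark: for each fixed ranking of the
    actions, play the highest-ranked available action in every round and sum
    its reward; return the best total over all rankings.
    Exponential in n_actions -- an illustration, not the paper's algorithm."""
    best = float("-inf")
    for order in itertools.permutations(range(n_actions)):
        rank = {action: i for i, action in enumerate(order)}
        total = 0.0
        for available, rewards in rounds:
            chosen = min(available, key=rank.__getitem__)  # highest-ranked awake action
            total += rewards[chosen]
        best = max(best, total)
    return best

# Toy instance: 4 actions, 50 rounds, random availability sets and rewards.
random.seed(0)
n, T = 4, 50
rounds = []
for _ in range(T):
    available = random.sample(range(n), k=random.randint(1, n))
    rewards = [random.random() for _ in range(n)]
    rounds.append((available, rewards))

# A naive baseline policy: always play the lowest-indexed available action.
baseline = sum(r[min(avail)] for avail, r in rounds)
benchmark = best_ordering_reward(rounds, n)
print(f"best-ordering benchmark: {benchmark:.2f}")
print(f"naive policy reward:     {baseline:.2f}")
print(f"regret of naive policy:  {benchmark - baseline:.2f}")
```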
