Open-access article, peer-reviewed

Random Forests for Regression as a Weighted Sum of $k$-Potential Nearest Neighbors

2019; Institute of Electrical and Electronics Engineers; Volume: 7; Language: English

10.1109/access.2019.2900755

ISSN

2169-3536

Authors

Pablo Fernández-González, Concha Bielza, Pedro Larrañaga

Topic(s)

Advanced Statistical Methods and Models

Abstract

In this paper, we tackle the problem of random forests for regression expressed as weighted sums of datapoints. We study the theoretical behavior of $k$-potential nearest neighbors ($k$-PNNs) under bagging and obtain an upper bound on the weight of a datapoint for random forests with any type of splitting criterion, provided that we use unpruned trees that stop growing only when there are $k$ or fewer datapoints at their leaves. Moreover, we use this bound, together with the concept of b-terms (i.e., bootstrap terms) introduced in this paper, to derive the explicit expression of the datapoint weights under random $k$-PNN selection, a datapoint selection strategy that we also introduce, and to build a framework for deriving other bagged estimators by a similar procedure. Finally, we derive from our framework the explicit expression of the weights of a regression estimate equivalent to a random forest regression estimate with the random splitting criterion, and we demonstrate this equivalence both theoretically and practically.
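The abstract views a random forest regression prediction as a weighted sum of training datapoints. The snippet below is a minimal sketch of that viewpoint, not the authors' derivation: it recovers per-datapoint weights from a fitted scikit-learn RandomForestRegressor. It assumes bootstrap=False so the weights can be read off directly from shared leaf membership; the paper's analysis concerns the bagged (bootstrap) case, where the weight bound and the b-terms account for resampling. The function name rf_weights and all parameter values are illustrative choices, and min_samples_leaf only approximates the "k or fewer datapoints per leaf" stopping rule.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Toy data; values are illustrative, not from the paper.
X, y = make_regression(n_samples=200, n_features=5, noise=0.5, random_state=0)

# bootstrap=False keeps the weight recovery simple: every tree sees the full
# training set, so each leaf's value is the plain mean of the targets in it.
rf = RandomForestRegressor(n_estimators=100, min_samples_leaf=5,
                           bootstrap=False, random_state=0).fit(X, y)

def rf_weights(forest, X_train, x_query):
    # Returns w such that the forest prediction at x_query equals w @ y_train.
    w = np.zeros(X_train.shape[0])
    for tree in forest.estimators_:
        train_leaves = tree.apply(X_train)                # leaf id of each training point
        query_leaf = tree.apply(x_query.reshape(1, -1))[0]
        in_leaf = train_leaves == query_leaf              # points sharing the query's leaf
        w += in_leaf / in_leaf.sum()                      # each tree averages its co-leaf targets
    return w / len(forest.estimators_)                    # the forest averages over trees

x0 = X[0]
w = rf_weights(rf, X, x0)
print(np.allclose(w @ y, rf.predict(x0.reshape(1, -1))[0]))  # True: prediction is a weighted sum

In the bagged setting studied in the paper, the uniform per-tree weights above would instead be scaled by how often each datapoint enters a tree's bootstrap sample, which is what the b-terms and the k-PNN bound formalize.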
