Random Forests for Regression as a Weighted Sum of $k$-Potential Nearest Neighbors
2019; Institute of Electrical and Electronics Engineers; Volume: 7; Language: English
DOI: 10.1109/access.2019.2900755
ISSN: 2169-3536
Authors: Pablo Fernández-González, Concha Bielza, Pedro Larrañaga
Topic(s): Advanced Statistical Methods and Models
Abstract: In this paper, we tackle the problem of random forests for regression expressed as weighted sums of datapoints. We study the theoretical behavior of $k$-potential nearest neighbors ($k$-PNNs) under bagging and obtain an upper bound on the weight of a datapoint for random forests with any type of splitting criterion, provided that we use unpruned trees that stop growing only when there are $k$ or fewer datapoints at their leaves. Moreover, we combine this bound with the concept of b-terms (i.e., bootstrap terms), introduced in this paper, to derive the explicit expression of the weights of datapoints under random $k$-PNN selection, a datapoint selection strategy that we also introduce, and to build a framework for deriving other bagged estimators by a similar procedure. Finally, we use this framework to derive the explicit expression of the weights of a regression estimate equivalent to a random forest regression estimate with the random splitting criterion, and we demonstrate its equivalence both theoretically and empirically.