Peer-reviewed article

A general, prediction error‐based criterion for selecting model complexity for high‐dimensional survival models

2010; Wiley; Volume: 29; Issue: 7-8; Language: English

10.1002/sim.3765

ISSN

1097-0258

Authors

Christine Porzelius, Martin Schumacher, Harald Binder

Topic(s)

RNA Research and Splicing

Abstract

When fitting predictive survival models to high‐dimensional data, an adequate criterion for selecting model complexity is needed to avoid overfitting. The complexity parameter is typically selected by the predictive partial log‐likelihood (PLL) estimated via cross‐validation. As an alternative criterion, we propose a relative version of the integrated prediction error curve (IPEC), which can be stably estimated via bootstrap resampling. The IPEC has the advantage of being applicable for models and fitting techniques where the PLL is not available. To investigate the performance of this new criterion, a simulation study is carried out, mimicking microarray survival data. Additionally, model selection by predictive PLL, estimated via bootstrap resampling instead of cross‐validation, is examined. It is seen that this mostly results in similar prediction performance of the selected models, compared to estimates based on cross‐validation. Model selection by bootstrap estimates of the IPEC performs about as well as selection by cross‐validation estimates of the PLL. Therefore, it is expected to be a reasonable alternative in cases where there is no PLL. Similar results are seen in the analysis of a microarray survival data set from patients with diffuse large‐B‐cell lymphoma. Copyright © 2010 John Wiley & Sons, Ltd.
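
The selection idea described in the abstract can be illustrated with a minimal Python sketch; it is not the authors' implementation. It assumes that prediction error curves (for example, Brier scores over time) have already been estimated for each candidate complexity value and for a covariate-free reference model, and it takes the "relative version" to be the ratio of the two integrated curves. That definition, the function names, and the toy numbers are all assumptions made for illustration. Censoring weights and bootstrap resampling are omitted.

```python
def integrate_curve(times, errors):
    """Trapezoidal integral of a prediction error curve over a time grid."""
    if len(times) != len(errors) or len(times) < 2:
        raise ValueError("times and errors must have equal length >= 2")
    total = 0.0
    for i in range(1, len(times)):
        width = times[i] - times[i - 1]
        total += 0.5 * (errors[i] + errors[i - 1]) * width
    return total


def relative_ipec(times, model_errors, null_errors):
    """Integrated error of a model divided by that of a reference model.

    Assumption: 'relative' is taken here as a simple ratio; values
    below 1 mean the model improves on the reference.
    """
    return integrate_curve(times, model_errors) / integrate_curve(times, null_errors)


def select_complexity(times, curves_by_complexity, null_errors):
    """Return the complexity value with the smallest relative IPEC."""
    scores = {
        complexity: relative_ipec(times, errors, null_errors)
        for complexity, errors in curves_by_complexity.items()
    }
    best = min(scores, key=scores.get)
    return best, scores


# Toy, made-up curves: keys stand for a hypothetical complexity
# parameter (e.g. number of boosting steps); values are error
# estimates at each time point.
times = [0, 1, 2, 3, 4]
null_errors = [0.0, 0.20, 0.25, 0.25, 0.20]
curves_by_complexity = {
    10: [0.0, 0.18, 0.22, 0.23, 0.19],
    50: [0.0, 0.15, 0.18, 0.20, 0.17],
    200: [0.0, 0.19, 0.26, 0.27, 0.22],
}

best, scores = select_complexity(times, curves_by_complexity, null_errors)
print(best, scores)
```

In this toy input the most complex candidate scores worse than the reference, which is the overfitting pattern the criterion is meant to detect. In practice the curves would come from resampling-based estimates rather than fixed lists.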
