Multiple Linear Regression Models in Outlier Detection

2012; Volume: 2; Issue: 2; Language: English

10.7815/ijorcs.22.2012.018

ISSN

2249-8265

Authors

S.M.A. Khaleelur Rahman, M. Mohamed Sathik, K. Senthamarai Kannan

Topic(s)

Imbalanced Data Classification Techniques

Abstract

Identifying anomalous values in real-world databases is important both for improving the quality of the original data and for reducing the impact of anomalous values on the process of knowledge discovery in databases. Such values also give the data analyst useful information for discovering patterns, and isolating them allows them to be separated and analyzed. The analysis of outliers and influential points is an important step in regression diagnostics. In this paper, our aim is to detect the points that are very different from the other points: they do not seem to belong to the same population as the rest of the data and behave differently. Removing these influential points would lead to a different model. The distinction between such points is not always obvious, so several indicators are used to identify and analyze outliers. Existing methods of outlier detection rely on manual inspection of graphically represented data. Here we present a new approach that automates the process of detecting and isolating outliers. The impact of anomalous values on the dataset is established using two indicators, DFFITS and Cook's D. The process is based on modeling the human perception of exceptional values using multiple linear regression analysis.
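The abstract names DFFITS and Cook's D but does not give the paper's formulas or decision cutoffs. The following is a minimal NumPy sketch, assuming an ordinary least squares fit with an intercept and the textbook definitions: with residual e_i, leverage h_ii, p fitted coefficients and residual mean square s^2, Cook's D_i = e_i^2 h_ii / (p s^2 (1 - h_ii)^2), and DFFITS_i = t_i sqrt(h_ii / (1 - h_ii)), where t_i is the externally studentized residual. The helper names `regression_influence` and `flag_outliers` are hypothetical, and the cutoffs |DFFITS| > 2 sqrt(p/n) and Cook's D > 4/n are common rules of thumb assumed here, not values taken from the paper.

```python
import numpy as np


def regression_influence(X, y):
    """Leverage, DFFITS and Cook's D for an OLS fit of y on X (intercept added).

    Hypothetical helper; uses textbook OLS influence formulas.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Design matrix with an intercept column; p counts all fitted coefficients.
    A = np.column_stack([np.ones(n), X])
    p = A.shape[1]

    # Ordinary least squares fit and raw residuals.
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta

    # Leverages h_ii: with thin QR, the hat matrix is H = Q Q^T,
    # so h_ii is the sum of squares of row i of Q.
    Q, _ = np.linalg.qr(A)
    h = np.sum(Q**2, axis=1)

    # Residual mean square s^2 on n - p degrees of freedom.
    sse = resid @ resid
    s2 = sse / (n - p)

    # Cook's distance for each observation.
    cooks_d = resid**2 * h / (p * s2 * (1 - h) ** 2)

    # Leave-one-out variance s_(i)^2 via SSE_(i) = SSE - e_i^2 / (1 - h_ii),
    # then externally studentized residuals t_i.
    s2_loo = (sse - resid**2 / (1 - h)) / (n - p - 1)
    t = resid / np.sqrt(s2_loo * (1 - h))

    dffits = t * np.sqrt(h / (1 - h))
    return h, dffits, cooks_d, p


def flag_outliers(X, y):
    """Indices exceeding assumed rule-of-thumb cutoffs for each indicator."""
    h, dffits, cooks_d, p = regression_influence(X, y)
    n = len(h)
    dffits_cut = 2 * np.sqrt(p / n)  # size-adjusted DFFITS cutoff (assumed)
    cooks_cut = 4 / n                # common Cook's D heuristic (assumed)
    return {
        "dffits": np.flatnonzero(np.abs(dffits) > dffits_cut),
        "cooks_d": np.flatnonzero(cooks_d > cooks_cut),
    }


if __name__ == "__main__":
    # Synthetic data with one planted gross outlier at index 5.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(30, 2))
    y = 1 + 2 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=30)
    y[5] += 5.0
    print(flag_outliers(X, y))
```

Computing leverages from a QR factorization avoids forming the explicit inverse of A^T A, and the closed-form leave-one-out variance avoids refitting the model n times. The two indicators can disagree on borderline points, which is one reason the abstract reports both.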
