Paper #309

Title:
Principal curves and principal oriented points
Author:
Pedro Delicado
Date:
July 1998
Abstract:
Principal curves were defined by Hastie and Stuetzle (JASA, 1989) as smooth curves passing through the middle of a multidimensional data set. They are nonlinear generalizations of the first principal component, a characterization of which is the basis for the principal curves definition. In this paper we propose an alternative approach based on a different property of principal components. Consider a point in the space where a multivariate normal distribution is defined and, for each hyperplane containing that point, compute the total variance of the normal distribution conditional on lying in that hyperplane. Now choose the hyperplane that minimizes this conditional total variance and find the corresponding conditional mean. The first principal component of the original distribution passes through this conditional mean and is orthogonal to that hyperplane. This property is easily generalized to data sets with nonlinear structure. Repeating the search from different starting points yields many points analogous to conditional means. We call them principal oriented points. A one-dimensional curve that runs through the set of these points is called a principal curve of oriented points. Successive principal curves are defined recursively from a generalization of the total variance.
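
The property of the multivariate normal described in the abstract can be checked numerically. The following is a minimal sketch, not the paper's algorithm: it assumes a two-dimensional normal with illustrative values for `mu`, `sigma` and the starting point `x0`, and it replaces any optimization by a brute-force scan over hyperplane normals. The function names `conditional_moments` and `best_hyperplane_2d` are hypothetical. It relies on the standard formulas for a Gaussian conditioned on a linear constraint b'X = b'x0.

```python
import numpy as np

def conditional_moments(mu, sigma, x0, b):
    """Mean and covariance of N(mu, sigma) conditioned on b'(X - x0) = 0."""
    b = b / np.linalg.norm(b)
    sb = sigma @ b                      # Cov(X, b'X)
    var_b = b @ sb                      # Var(b'X)
    mean = mu + sb * (b @ (x0 - mu)) / var_b
    cov = sigma - np.outer(sb, sb) / var_b
    return mean, cov

def best_hyperplane_2d(mu, sigma, x0, n_angles=3600):
    """Scan unit normals b(t) = (cos t, sin t) and keep the hyperplane
    through x0 with the smallest conditional total variance (trace)."""
    best = None
    for t in np.linspace(0.0, np.pi, n_angles, endpoint=False):
        b = np.array([np.cos(t), np.sin(t)])
        mean, cov = conditional_moments(mu, sigma, x0, b)
        total_var = np.trace(cov)
        if best is None or total_var < best[0]:
            best = (total_var, b, mean)
    return best

# Illustrative values (assumptions, not taken from the paper).
mu = np.array([1.0, -2.0])
sigma = np.array([[4.0, 1.5],
                  [1.5, 1.0]])
x0 = np.array([3.0, 0.5])

total_var, b_star, m_star = best_hyperplane_2d(mu, sigma, x0)

# First principal component: eigenvector of the largest eigenvalue.
eigvals, eigvecs = np.linalg.eigh(sigma)   # eigenvalues in ascending order
v1 = eigvecs[:, -1]

# |cos| of the angle between the optimal normal and the first PC direction.
alignment = abs(b_star @ v1)

# Distance from the conditional mean to the line mu + s * v1.
d = m_star - mu
off_line = np.linalg.norm(d - (d @ v1) * v1)

print("alignment with first PC:", alignment)
print("distance of conditional mean from first PC line:", off_line)
```

Up to the resolution of the angle grid, the minimizing normal should coincide with the first principal direction, and the conditional mean should lie on the line through `mu` along that direction, which is the sense in which the first principal component passes through the conditional mean and is orthogonal to the chosen hyperplane.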
Keywords:
Fixed points, generalized total variance, nonlinear multivariate analysis, principal components, smoothing techniques
JEL codes:
C10, C14
Area of Research:
Statistics, Econometrics and Quantitative Methods
Published in:
Journal of Multivariate Analysis, 77, 84-116, 2001
With the title:
Another look at principal curves and surfaces
