Matrix Approximation and Projective Clustering via Iterative Sampling

Unknown author (2005-03-29)

We present two new results for the problem of approximating a given real m by n matrix A by a rank-k matrix D, where k < min{m, n}, so as to minimize ||A-D||_F^2. It is known that by sampling O(k/eps) rows of the matrix, one can find a low-rank approximation with additive error eps||A||_F^2. Our first result shows that with adaptive sampling in t rounds and O(k/eps) samples in each round, the additive error drops exponentially as eps^t; the computation time is nearly linear in the number of nonzero entries. This demonstrates that multiple passes can be highly beneficial for a natural (and widely studied) algorithmic problem. Our second result is that there exists a subset of O(k^2/eps) rows such that their span contains a rank-k approximation with multiplicative (1+eps) error (i.e., the sum of squares distance has a small "core-set" whose span determines a good approximation). This existence theorem leads to a PTAS for the following projective clustering problem: Given a set of points P in R^d, and integers k,j, find a set of j subspaces F_1,...,F_j, each of dimension at most k, that minimize \sum_{p \in P} \min_i d(p,F_i)^2.
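The multi-round sampling idea in the first result can be illustrated with a minimal NumPy sketch. It is not the paper's algorithm as stated: the abstract gives no pseudocode, so the sampling rule used here (rows drawn with probability proportional to their squared residual norm after projecting onto the span of rows already chosen) is an assumption, and the function name `adaptive_row_sample` and its parameters `s` (samples per round), `t` (rounds), and `seed` are hypothetical. The sketch also uses dense SVDs, so it does not reflect the nearly linear running time claimed above.

```python
import numpy as np


def adaptive_row_sample(A, k, s, t, seed=None):
    """Sketch: rank-k approximation of A from rows sampled over t rounds.

    Assumption (not taken from the abstract): in each round, rows are drawn
    with probability proportional to the squared norm of their residual
    after projection onto the span of the rows sampled so far.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    total_energy = np.sum(A ** 2)
    chosen = set()
    Q = np.zeros((n, 0))  # orthonormal basis (columns) for the sampled span
    E = A.copy()          # residual of A outside the sampled span

    for _ in range(t):
        weights = np.sum(E ** 2, axis=1)
        total = weights.sum()
        if total <= 1e-24 * total_energy:
            break  # sampled rows already span (numerically) all of A
        idx = rng.choice(m, size=s, replace=True, p=weights / total)
        chosen.update(int(i) for i in idx)

        # Orthonormal basis for the span of all rows sampled so far,
        # dropping numerically dependent directions.
        S = A[sorted(chosen)]
        _, sig, Vt = np.linalg.svd(S, full_matrices=False)
        rank = int(np.sum(sig > 1e-10 * sig[0]))
        Q = Vt[:rank].T
        E = A - (A @ Q) @ Q.T

    if Q.shape[1] == 0:
        return np.zeros_like(A), []

    # Best rank-k approximation of A whose rows lie in span(Q):
    # project A onto the span, then truncate the SVD of the projection.
    U, sig, Vt = np.linalg.svd(A @ Q, full_matrices=False)
    r = min(k, len(sig))
    D = (U[:, :r] * sig[:r]) @ Vt[:r] @ Q.T
    return D, sorted(chosen)
```

For example, `D, rows = adaptive_row_sample(A, k=3, s=10, t=3, seed=0)` returns a matrix `D` of rank at most 3 together with the indices of the sampled rows. Comparing `np.linalg.norm(A - D)**2` for increasing `t` gives an informal way to observe how the error behaves as rounds are added.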