Multiscale Geometric Methods for Data Sets I: Multiscale SVD, Noise and Curvature

dc.date.accessioned: 2012-09-10T18:00:08Z
dc.date.accessioned: 2018-11-26T22:26:53Z
dc.date.available: 2012-09-10T18:00:08Z
dc.date.available: 2018-11-26T22:26:53Z
dc.date.issued: 2012-09-08
dc.identifier.uri: http://hdl.handle.net/1721.1/72597
dc.identifier.uri: http://repository.aust.edu.ng/xmlui/handle/1721.1/72597
dc.description.abstract (en_US): Large data sets are often modeled as being noisy samples from probability distributions in R^D, with D large. It has been noticed that oftentimes the support M of these probability distributions seems to be well-approximated by low-dimensional sets, perhaps even by manifolds. We shall consider sets that are locally well approximated by k-dimensional planes, with k << D, with k-dimensional manifolds isometrically embedded in R^D being a special case. Samples from this distribution are furthermore corrupted by D-dimensional noise. Certain tools from multiscale geometric measure theory and harmonic analysis seem well-suited to be adapted to the study of samples from such probability distributions, in order to yield quantitative geometric information about them. In this paper we introduce and study multiscale covariance matrices, i.e. covariances corresponding to the distribution restricted to a ball of radius r, with a fixed center and varying r, and under rather general geometric assumptions we study how their empirical, noisy counterparts behave. We prove that in the range of scales where these covariance matrices are most informative, the empirical, noisy covariances are close to their expected, noiseless counterparts. In fact, this is true as soon as the number of samples in the balls where the covariance matrices are computed is linear in the intrinsic dimension of M. As an application, we present an algorithm for estimating the intrinsic dimension of M.
dc.format.extent (en_US): 59 p.
dc.subject (en_US): machine learning
dc.subject (en_US): high dimensional data
dc.title (en_US): Multiscale Geometric Methods for Data Sets I: Multiscale SVD, Noise and Curvature
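
Illustrative note (not part of the deposited record): the abstract describes covariance matrices of the data restricted to a ball of radius r around a fixed center, with r varying. The following is a minimal Python sketch of that idea, assuming NumPy is available. The function name multiscale_singular_values, the toy data (a flat k-dimensional patch in R^D plus Gaussian noise of standard deviation sigma), and the choice of radii are all assumptions made for illustration; this is not the algorithm presented in the paper.

```python
import numpy as np


def multiscale_singular_values(X, center, radii):
    """Singular values of local covariances at several scales.

    For each radius r, take the points of X inside the ball of radius r
    around `center`, form their empirical covariance, and return the
    square roots of its eigenvalues in decreasing order.
    Rows for balls with fewer than two points are left as NaN.
    (Hypothetical helper, named here for illustration only.)
    """
    X = np.asarray(X, dtype=float)
    dists = np.linalg.norm(X - center, axis=1)
    out = np.full((len(radii), X.shape[1]), np.nan)
    for i, r in enumerate(radii):
        local = X[dists <= r]
        if local.shape[0] < 2:
            continue
        cov = np.cov(local, rowvar=False)        # D x D empirical covariance
        eigvals = np.linalg.eigvalsh(cov)[::-1]  # sorted largest first
        out[i] = np.sqrt(np.clip(eigvals, 0.0, None))
    return out


# Toy data (assumed setup): a flat 2-dimensional patch in R^10 plus noise.
rng = np.random.default_rng(0)
n, k, D, sigma = 2000, 2, 10, 0.01
X = np.zeros((n, D))
X[:, :k] = rng.uniform(-1.0, 1.0, size=(n, k))
X += sigma * rng.normal(size=(n, D))

radii = np.array([0.25, 0.5, 1.0])
sv = multiscale_singular_values(X, np.zeros(D), radii)
```

In this flat toy example, the first k columns of sv grow with r while the remaining columns stay near the noise level sigma. Counting the singular values that separate from that noise floor is one naive way to read off a dimension; the paper's own estimator and its guarantees are given in the report itself.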


Files in this item

File: MIT-CSAIL-TR-2012-029.pdf
Size: 1.909 Mb
Format: application/pdf
