Human Document Classification Using Bags of Words

dc.date.accessioned: 2006-08-14T12:29:13Z
dc.date.accessioned: 2018-11-24T10:24:58Z
dc.date.available: 2006-08-14T12:29:13Z
dc.date.available: 2018-11-24T10:24:58Z
dc.date.issued: 2006-08-09
dc.identifier.uri: http://hdl.handle.net/1721.1/33789
dc.identifier.uri: http://repository.aust.edu.ng/xmlui/handle/1721.1/33789
dc.description.abstract: Humans are remarkably adept at classifying text documents into categories. For instance, while reading a news story, we are rapidly able to assess whether it belongs to the domain of finance, politics or sports. Automating this task would have applications for content-based search or filtering of digital documents. To this end, it is interesting to investigate the nature of information humans use to classify documents. Here we report experimental results suggesting that this information might, in fact, be quite simple. Using a paradigm of progressive revealing, we determined classification performance as a function of number of words. We found that subjects are able to achieve similar classification accuracy with or without syntactic information across a range of passage sizes. These results have implications for models of human text-understanding and also allow us to estimate what level of performance we can expect, in principle, from a system without requiring a prior step of complex natural language processing.
dc.format.extent: 7 p.
dc.format.extent: 1617611 bytes
dc.format.extent: 134084 bytes
dc.language.iso: en_US
dc.subject: text classification
dc.title: Human Document Classification Using Bags of Words


Files in this item

Files                        Size      Format
MIT-CSAIL-TR-2006-054.pdf    134.0Kb   application/pdf
MIT-CSAIL-TR-2006-054.ps     1.617Mb   application/postscript
