Human Document Classification Using Bags of Words

dc.date.accessioned	2006-08-14T12:29:13Z
dc.date.accessioned	2018-11-24T10:24:58Z
dc.date.available	2006-08-14T12:29:13Z
dc.date.available	2018-11-24T10:24:58Z
dc.date.issued	2006-08-09
dc.identifier.uri	http://hdl.handle.net/1721.1/33789
dc.identifier.uri	http://repository.aust.edu.ng/xmlui/handle/1721.1/33789
dc.description.abstract	Humans are remarkably adept at classifying text documents into categories. For instance, while reading a news story, we are rapidly able to assess whether it belongs to the domain of finance, politics or sports. Automating this task would have applications for content-based search or filtering of digital documents. To this end, it is interesting to investigate the nature of information humans use to classify documents. Here we report experimental results suggesting that this information might, in fact, be quite simple. Using a paradigm of progressive revealing, we determined classification performance as a function of number of words. We found that subjects are able to achieve similar classification accuracy with or without syntactic information across a range of passage sizes. These results have implications for models of human text-understanding and also allow us to estimate what level of performance we can expect, in principle, from a system without requiring a prior step of complex natural language processing.
dc.format.extent	7 p.
dc.format.extent	1617611 bytes
dc.format.extent	134084 bytes
dc.language.iso	en_US
dc.subject	text classification
dc.title	Human Document Classification Using Bags of Words
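
As a rough illustration of the idea in the abstract, the sketch below reveals the first n words of a passage either in their original order or shuffled, so that both conditions carry the same bag of words but only one keeps syntactic information. This is not the report's experimental procedure; the function names, the whitespace tokenization, and the sample sentence are all assumptions made for illustration.

```python
import random
from collections import Counter


def reveal(passage, n, keep_order=True, seed=0):
    """Return the first n words of a passage (hypothetical helper).

    With keep_order=False the revealed words are shuffled, which removes
    word order (and so most syntactic cues) but keeps the same words.
    """
    words = passage.split()[:n]          # naive whitespace tokenization (assumption)
    if not keep_order:
        random.Random(seed).shuffle(words)
    return words


def bag_of_words(words):
    """Count case-folded words, ignoring the order in which they appear."""
    return Counter(w.lower() for w in words)


# Sample sentence invented for this sketch, not taken from the report.
passage = "Shares fell sharply after the central bank raised interest rates"

for n in (3, 6, 10):
    ordered = reveal(passage, n)
    scrambled = reveal(passage, n, keep_order=False)
    # Both conditions expose an identical bag of words at every passage size.
    assert bag_of_words(ordered) == bag_of_words(scrambled)
    print(n, ordered, scrambled)
```

Because the two conditions differ only in word order, any gap in classification accuracy between them at a given passage size would be attributable to syntactic information.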

Files in this item

Files	Size	Format
MIT-CSAIL-TR-2006-054.pdf	134.0Kb	application/pdf
MIT-CSAIL-TR-2006-054.ps	1.617Mb	application/postscript

This item appears in the following Collection(s)

Computer Science and Artificial Intelligence Lab (CSAIL) [2625]
