Improving Multi-class Text Classification with Naive Bayes

dc.date.accessioned	2004-10-20T20:28:16Z
dc.date.accessioned	2018-11-24T10:22:58Z
dc.date.available	2004-10-20T20:28:16Z
dc.date.available	2018-11-24T10:22:58Z
dc.date.issued	2001-09-01	en_US
dc.identifier.uri	http://hdl.handle.net/1721.1/7074
dc.identifier.uri	http://repository.aust.edu.ng/xmlui/handle/1721.1/7074
dc.description.abstract	There are numerous text documents available in electronic form. More and more are becoming available every day. Such documents represent a massive amount of information that is easily accessible. Seeking value in this huge collection requires organization; much of the work of organizing documents can be automated through text classification. The accuracy of such systems and our understanding of them greatly influence their usefulness. In this paper, we seek 1) to advance the understanding of commonly used text classification techniques, and 2) through that understanding, improve the tools that are available for text classification. We begin by clarifying the assumptions made in the derivation of Naive Bayes, noting basic properties and proposing ways for its extension and improvement. Next, we investigate the quality of Naive Bayes parameter estimates and their impact on classification. Our analysis leads to a theorem which gives an explanation for the improvements that can be found in multiclass classification with Naive Bayes using Error-Correcting Output Codes. We use experimental evidence on two commonly-used data sets to exhibit an application of the theorem. Finally, we show fundamental flaws in a commonly-used feature selection algorithm and develop a statistics-based framework for text feature selection. Greater understanding of Naive Bayes and the properties of text allows us to make better use of it in text classification.	en_US
dc.format.extent	49 p.	en_US
dc.format.extent	2017370 bytes
dc.format.extent	687421 bytes
dc.language.iso	en_US
dc.subject	AI	en_US
dc.subject	naive bayes	en_US
dc.subject	text	en_US
dc.subject	classification	en_US
dc.subject	feature selection	en_US
dc.title	Improving Multi-class Text Classification with Naive Bayes	en_US

Files in this item

Files	Size	Format	View
AITR-2001-004.pdf	687.4Kb	application/pdf	View/Open
AITR-2001-004.ps	2.017Mb	application/postscript	View/Open

This item appears in the following Collection(s)

Computer Science and Artificial Intelligence Lab (CSAIL) (2625)
