Lexical Chains and Sliding Locality Windows in Content-based Text Similarity Detection

dc.date.accessioned: 2005-12-22T02:29:37Z
dc.date.accessioned: 2018-11-24T10:24:30Z
dc.date.available: 2005-12-22T02:29:37Z
dc.date.available: 2018-11-24T10:24:30Z
dc.date.issued: 2005-05-19
dc.identifier.uri: http://hdl.handle.net/1721.1/30546
dc.identifier.uri: http://repository.aust.edu.ng/xmlui/handle/1721.1/30546
dc.description.abstract: We present a system to determine content similarity of documents. More specifically, our goal is to identify book chapters that are translations of the same original chapter; this task requires identification of not only the different topics in the documents but also the particular flow of these topics. We experiment with different representations employing n-grams of lexical chains and test these representations on a corpus of approximately 1000 chapters gathered from books with multiple parallel translations. Our representations include the cosine similarity of attribute vectors of n-grams of lexical chains, the cosine similarity of tf*idf-weighted keywords, and the cosine similarity of unweighted lexical chains (unigrams of lexical chains) as well as multiplicative combinations of the similarity measures produced by these approaches. Our results identify fourgrams of unordered lexical chains as a particularly useful representation for text similarity evaluation.
dc.format.extent: 9 p.
dc.format.extent: 17827888 bytes
dc.format.extent: 7011726 bytes
dc.language.iso: en_US
dc.subject: AI
dc.subject: Natural Language Processing
dc.subject: N-grams
dc.subject: Text Similarity
dc.subject: Lexical Chains
dc.title: Lexical Chains and Sliding Locality Windows in Content-based Text Similarity Detection
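
The abstract's core comparison, cosine similarity over n-grams of lexical chains, can be illustrated with a minimal Python sketch. This is not the report's implementation. It assumes each chapter has already been reduced to a sequence of lexical-chain identifiers, and it assumes that an "unordered" n-gram means the order of chains inside the window is ignored. The names `chain_ngrams` and `cosine`, and the toy chain labels, are hypothetical.

```python
from collections import Counter
from math import sqrt


def chain_ngrams(chain_ids, n, ordered=True):
    """Count n-grams found by sliding a window of size n over chain IDs.

    Assumption: with ordered=False, the IDs inside each window are sorted,
    so windows holding the same chains in a different order are merged.
    """
    grams = Counter()
    for i in range(len(chain_ids) - n + 1):
        window = chain_ids[i:i + n]
        key = tuple(window) if ordered else tuple(sorted(window))
        grams[key] += 1
    return grams


def cosine(a, b):
    """Cosine similarity of two sparse count vectors stored as Counters."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Toy chain sequences for two hypothetical chapters.
chapter_a = ["sea", "ship", "storm", "ship", "sea"]
chapter_b = ["ship", "sea", "storm", "sea", "ship"]

similarity = cosine(
    chain_ngrams(chapter_a, 2, ordered=False),
    chain_ngrams(chapter_b, 2, ordered=False),
)
print(similarity)
```

Setting `n=4` with `ordered=False` corresponds to the fourgram representation the abstract singles out, under the same assumptions. Building the lexical chains themselves is outside the scope of this sketch.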


Files in this item

File                        Size      Format
MIT-CSAIL-TR-2005-034.pdf   7.011Mb   application/pdf
MIT-CSAIL-TR-2005-034.ps    17.82Mb   application/postscript
