
Mixed-Language Arabic-English Information Retrieval

dc.contributor.advisor: Suleman, Hussein [en_ZA]
dc.contributor.author: Mustafa, Ali Mohammed [en_ZA]
dc.date.accessioned: 2014-08-13T19:31:35Z
dc.date.accessioned: 2018-11-26T13:52:56Z
dc.date.available: 2014-08-13T19:31:35Z
dc.date.available: 2018-11-26T13:52:56Z
dc.date.issued: 2013 [en_ZA]
dc.identifier.uri: http://hdl.handle.net/11427/6421
dc.identifier.uri: http://repository.aust.edu.ng/xmlui/handle/11427/6421
dc.description: Includes abstract. [en_ZA]
dc.description.abstract: This thesis addresses the problem of mixed-language querying in cross-language information retrieval (CLIR). It proposes mixed-language (language-aware) approaches in which mixed queries are used to retrieve the most relevant documents, regardless of their language. Achieving this first requires suppressing the problems that the mixed-language feature causes in both queries and documents, which bias the final ranked list. To that end, a cross-lingual re-weighting model was developed. In this model, the term frequency, document frequency and document length components for mixed queries are estimated and adjusted independently of language, while the model still accounts for features unique to mixed-language queries and documents, such as co-occurring terms in two different languages. Furthermore, in mixed queries, non-technical terms (mostly non-English) are likely to be overweighted and to skew the impact of technical terms (mostly English), because the latter have high document frequencies, and thus low weights, in their corresponding collection (mostly the English collection). This phenomenon is caused by the dominance of English in scientific domains. Accordingly, the thesis also proposes a re-weighted Inverse Document Frequency (IDF) to moderate the effect of overweighted terms in mixed queries. [en_ZA]
dc.language.iso: eng [en_ZA]
dc.subject.other: Computer Science [en_ZA]
dc.title: Mixed-Language Arabic-English Information Retrieval [en_ZA]
dc.type: Thesis [en_ZA]
dc.type.qualificationlevel: Doctoral [en_ZA]
dc.type.qualificationname: PhD [en_ZA]
dc.publisher.institution: University of Cape Town
dc.publisher.faculty: Faculty of Science [en_ZA]
dc.publisher.department: Department of Computer Science [en_ZA]
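
The abstract describes a skew in mixed queries: technical English terms receive low IDF weights because they are frequent in the English collection, while non-technical terms in the other language can receive much higher weights. The sketch below only illustrates that effect with a standard smoothed IDF. The collection sizes, document frequencies, and the capped_idf helper are all invented for illustration; the capping step is an assumed stand-in for "moderation" and is not the re-weighting formula proposed in the thesis.

```python
import math


def idf(doc_freq: int, num_docs: int) -> float:
    """Smoothed inverse document frequency: log(1 + N / df)."""
    return math.log(1 + num_docs / doc_freq)


def capped_idf(raw_idf: float, ceiling: float) -> float:
    """Hypothetical moderation step: clip an IDF weight at a ceiling."""
    return min(raw_idf, ceiling)


# Invented statistics, for illustration only.
english_docs, arabic_docs = 100_000, 10_000

# A technical English term that appears in many English documents.
idf_technical = idf(doc_freq=20_000, num_docs=english_docs)
# A non-technical Arabic term that is rare in the Arabic collection.
idf_general = idf(doc_freq=50, num_docs=arabic_docs)

# The non-technical term dominates the mixed query before moderation.
skew_before = idf_general / idf_technical

# Clip the non-technical weight at an arbitrary multiple of the technical one.
ceiling = 1.5 * idf_technical
skew_after = capped_idf(idf_general, ceiling) / idf_technical

print(f"technical IDF: {idf_technical:.2f}")
print(f"general IDF:   {idf_general:.2f}")
print(f"skew before: {skew_before:.2f}, after: {skew_after:.2f}")
```

With these made-up numbers the non-technical term carries several times the weight of the technical term until its weight is clipped. The 1.5 multiplier is arbitrary, and any real moderation scheme would need to be taken from the thesis itself.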


Files in this item

File                                       Size      Format
thesis_sci_2013_ali_mohammed_mustafa.pdf   3.218 Mb  application/pdf
