
Semi-automatic matching of semi-structured data updates

dc.contributor.advisor: Berman, Sonia [en_ZA]
dc.contributor.author: Forshaw, Gareth William [en_ZA]
dc.date.accessioned: 2015-05-27T04:11:15Z
dc.date.accessioned: 2018-11-26T13:53:36Z
dc.date.available: 2015-05-27T04:11:15Z
dc.date.available: 2018-11-26T13:53:36Z
dc.date.issued: 2014 [en_ZA]
dc.identifier.uri: http://hdl.handle.net/11427/12930
dc.identifier.uri: http://repository.aust.edu.ng/xmlui/handle/11427/12930
dc.description: Includes bibliographical references. [en_ZA]
dc.description.abstract: Data matching, also referred to as data linkage or field matching, is a technique used to combine multiple data sources into one data set. Data matching is used for data integration in a number of sectors and industries, from politics and health care to scientific applications. The motivation for this study was the observation of the day-to-day struggles of a large non-governmental organisation (NGO) in managing their membership database. With a membership base of close to 2.4 million, the challenges they face with regard to the capturing and processing of the semi-structured membership updates are monumental. Updates arrive from the field in a multitude of formats, often incomplete and unstructured, and expert knowledge is geographically localised. These issues are compounded by an extremely complex organisational hierarchy and a general lack of data validation processes. An online system was proposed for pre-processing input and then matching it against the membership database. Termed the Data Pre-Processing and Matching System (DPPMS), it allows for single or bulk updates. Based on the success of the DPPMS with the NGO’s membership database, it was subsequently used for pre-processing and data matching of semi-structured patient and financial customer data. Using the semi-automated DPPMS rather than a clerical data matching system, true positive matches increased by 21% while false negative matches decreased by 20%. The Recall, Precision and F-Measure values all improved and the risk of false positives diminished. The DPPMS was unable to match approximately 8% of provided records; this was largely due to human error during initial data capture. While the DPPMS greatly diminished the reliance on experts, their role remained pivotal during the final stage of the process. [en_ZA]
dc.language.iso: eng [en_ZA]
dc.subject.other: Information Technology [en_ZA]
dc.title: Semi-automatic matching of semi-structured data updates [en_ZA]
dc.type.qualificationlevel: Masters [en_ZA]
dc.type.qualificationname: MSc [en_ZA]
dc.publisher.institution: University of Cape Town
dc.publisher.faculty: Faculty of Science [en_ZA]
dc.publisher.department: Department of Computer Science [en_ZA]


Files in this item

File: thesis_sci_2014_forshaw_gw.pdf
Size: 1.458Mb
Format: application/pdf

