Journal article
International Journal on Natural Language Computing, 2019
APA
Malema, G., Okgetheng, B., Motlhanka, M., & Rammidi, G. (2019). Auto Correction of Setswana Real-word Errors. International Journal on Natural Language Computing.
Chicago/Turabian
Malema, G., Boago Okgetheng, Moffat Motlhanka, and Goaletsa Rammidi. “Auto Correction of Setswana Real-Word Errors.” International Journal on Natural Language Computing (2019).
MLA
Malema, G., et al. “Auto Correction of Setswana Real-Word Errors.” International Journal on Natural Language Computing, 2019.
BibTeX
@article{g2019a,
title = {Auto Correction of Setswana Real-word Errors},
year = {2019},
journal = {International Journal on Natural Language Computing},
author = {Malema, G. and Okgetheng, Boago and Motlhanka, Moffat and Rammidi, Goaletsa}
}
Spell checkers are used to detect and, where possible, correct spelling errors. Errors are classified as non-word errors and real-word errors. Detecting and correcting real-word errors requires considering the context of the sentence. Setswana has several commonly used words that are often misspelled by either separating or merging them, and these misspellings result in real-word errors. In this paper we propose contextual rules that examine neighboring words to determine whether the correct form is written as two separate words or merged as one word. For some words the rules require the part-of-speech category of neighboring words to be determined, whereas others depend on specific neighboring words or on position in the sentence. The implemented rules are consistent, with an 88% success rate. Our tool looks only at neighboring words and does not consider the context of the whole sentence; hence, our rules fail for words that require the context of the whole sentence to disambiguate correctly. This module can be incorporated into a spell checker to detect and correct real-word errors for some words, helping users determine the correct orthography of those words.
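The kind of neighbor-based merge/split rule described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the tokens `a`, `b`, and `ab`, the toy part-of-speech lexicon `TOY_POS`, and the names `Rule` and `correct` are hypothetical placeholders chosen for illustration, and the single rule shown (merge when the next word is a verb) is an assumed example rather than an actual Setswana rule from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical toy lexicon; a real system would use a Setswana POS tagger.
TOY_POS: Dict[str, str] = {"run": "VERB", "dog": "NOUN"}


@dataclass
class Rule:
    merged: str                      # the one-word spelling
    split: Tuple[str, ...]           # the separated spelling
    # Decides, from the previous and next neighbor tokens only,
    # whether the merged spelling is the expected one.
    prefer_merged: Callable[[Optional[str], Optional[str]], bool]


def correct(tokens: List[str], rules: List[Rule]) -> List[str]:
    """Rewrite each merged/split candidate to the form its rule prefers."""
    out: List[str] = []
    i = 0
    while i < len(tokens):
        for rule in rules:
            n = len(rule.split)
            if tokens[i] == rule.merged:
                consumed = 1
            elif tuple(tokens[i:i + n]) == rule.split:
                consumed = n
            else:
                continue
            prev = out[-1] if out else None
            end = i + consumed
            nxt = tokens[end] if end < len(tokens) else None
            if rule.prefer_merged(prev, nxt):
                out.append(rule.merged)
            else:
                out.extend(rule.split)
            i = end
            break
        else:
            # No rule matched at this position; keep the token unchanged.
            out.append(tokens[i])
            i += 1
    return out


# Assumed example rule: write "ab" as one word only before a verb.
RULES = [
    Rule(
        merged="ab",
        split=("a", "b"),
        prefer_merged=lambda prev, nxt: nxt is not None
        and TOY_POS.get(nxt) == "VERB",
    )
]

print(correct(["a", "b", "run"], RULES))  # ['ab', 'run']
print(correct(["ab", "dog"], RULES))      # ['a', 'b', 'dog']
```

Because each rule sees only the immediately adjacent tokens, the sketch shares the limitation noted in the abstract: cases that need the context of the whole sentence cannot be resolved this way.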