A research team from Maseno University’s Departments of Computer Science and of Kiswahili and Other African Languages, including the Principal Investigator, has been awarded research funds by the Lacuna Fund for a project titled KenCorpus: Kenyan Languages Corpus. The project is a collaboration with investigators from the University of Nairobi and Africa Nazarene University. The researchers will create datasets for Kiswahili, Dholuo, and selected Luluhyia dialects for eventual annotation, so that these languages, which are classified as underserved, gain a stronger presence among the languages that can be used in machine learning.
This project is expected to influence the methodologies used for rapidly assembling corpora for under-resourced languages, and to shed light on how to prepare and annotate speech and text for use in multilingual communities. Emerging technology firms interested in human language technology solutions also stand to benefit: the project’s pioneer prototypes could inspire commercial systems, motivating co-funding and possible cooperation on future projects.
The aims of this research project include, among others: collecting naturally occurring texts in Kiswahili, Dholuo and Luhyia; collecting speech data for the same three languages; translating the Dholuo and Luhyia texts into Kiswahili for machine translation; collecting question-and-answer pairs for the Kiswahili texts for machine comprehension; and annotating the Kiswahili, Dholuo and Luhyia texts with part-of-speech tags.