MzansiLM: A Language Model for All 11 South African Official Languages

University of Cape Town researchers released MzansiLM, a 125M-parameter open model trained on a 3.8-billion-token corpus covering all 11 official South African languages. It reaches 20.65 BLEU on isiXhosa text generation, a score competitive with models ten times its size.
artificial-intelligence
Author

Kabui, Charles

Published

2026-05-28

Keywords

mzansilm, south-african-languages, low-resource-nlp, multilingual-ai, decoder-only