UCT NLP
University of Cape Town Natural Language Processing Group
We perform research in natural language processing and machine learning. Research topics include:
- Methods for text generation and language model training for low-resource languages
- Approaches for modelling linguistic structure in text
- Creating NLP datasets and models for South African languages
Principal Investigator
Dr. Jan Buys
Current postgraduate students
- Francois Meyer (PhD)
- Sello Ralethe (PhD)
- Nomonde Khalo (PhD)
- Claytone Sikasote (PhD)
- Natalie Bianca Alexander (MSc)
- Khalid N. Elmadani (MSc)
Past students
- Neil Sinclair (MSc)
- Shane Acton (MSc)
- Victoria Pedlar (MSc)
- Yassin Nurmahomed (MSc)
- Maxwell Mojapelo (MPhil)
Publications
-
A Systematic Analysis of Subwords and Cross-Lingual Transfer in Multilingual Translation.
Francois Meyer and Jan Buys.
NAACL Findings 2024.
-
NGLUEni: Benchmarking and Adapting Pretrained Language Models for Nguni Languages.
Francois Meyer, Haiyue Song, Abhisek Chakrabarty, Jan Buys, Raj Dabre and Hideki Tanaka.
LREC-COLING 2024. [Data]
-
Triples-to-isiXhosa (T2X): Addressing the Challenges of Low-Resource Agglutinative Data-to-Text Generation.
Francois Meyer and Jan Buys.
LREC-COLING 2024. [Data]
-
Neural Machine Translation between Low-Resource Languages with Synthetic Pivoting.
Khalid N. Elmadani and Jan Buys.
LREC-COLING 2024.
-
Subword Segmental Machine Translation: Unifying Segmentation and Target Sentence Generation.
Francois Meyer and Jan Buys.
ACL Findings 2023. [Code]
-
Data Augmentation for Low Resource Neural Machine Translation for Sotho-Tswana Languages.
Maxwell Mojapelo and Jan Buys.
SACAIR 2023.
-
Policy-based Reinforcement Learning for Generalisation in Interactive Text-based Environments.
Edan Toledo, Jan Buys and Jonathan Shock.
EACL 2023. [Code]
-
Subword Segmental Language Modelling for Nguni Languages.
Francois Meyer and Jan Buys.
EMNLP Findings 2022. [Code]
-
University of Cape Town’s WMT22 System: Multilingual Machine Translation for Southern African Languages.
Khalid N. Elmadani, Francois Meyer and Jan Buys.
WMT 2022. [Model]
-
Self-Supervised Text Style Transfer with Rationale Prediction and Pretrained Transformers.
Neil Sinclair and Jan Buys.
SACAIR 2022 (CCIS). [Version of Record]
-
From GNNs to Sparse Transformers: Graph-based architectures for Multi-hop Question Answering.
Shane Acton and Jan Buys.
SACAIR 2022 (CCIS). [Version of Record]
-
Generic Overgeneralization in Pre-trained Language Models.
Sello Ralethe and Jan Buys.
COLING 2022. [Data]
-
A Sequence Modelling Approach to Question Answering in Text-Based Games.
Greg Furman, Edan Toledo, Jonathan Shock and Jan Buys.
Wordplay 2022. [Code]
-
Canonical and Surface Morphological Segmentation for Nguni Languages.
Tumi Moeng, Sheldon Reay, Aaron Daniels and Jan Buys.
SACAIR 2021 (CCIS). [Version of Record] [Code]
-
Low-Resource Language Modelling of South African Languages.
Stuart Mesham, Luc Hayward, Jared Shapiro and Jan Buys.
SACAIR 2021. [Code]
-
RepGraph: Visualising and Analysing Meaning Representation Graphs.
Jaron Cohen, Roy Cohen, Edan Toledo and Jan Buys.
EMNLP 2021 System Demonstrations. [Demo] [Code]