The Bitter Aloe Project develops customized machine learning (ML) models to perform two tasks on archival materials produced by South Africa’s Truth and Reconciliation Commission (TRC): (1) named entity recognition (NER), a process that automates the recognition of persons, places, organizations, and other user-defined entities, and (2) word embedding, which reveals deep conceptual patterns within a corpus by rendering words and sentences as comparable numerical vectors. At present, our models can correctly recognize the names of individual victims and political organizations, as well as the locations and dates of human rights violations and the types of violence employed by perpetrators. Users can access this data using two tools hosted on this website: the TRC v7 Dashboard, a customizable GIS data dashboard, and our Co-occurrence Network Graph, which maps the correlations between individual names and organizations across tens of thousands of incident descriptions. In addition, we are prototyping a sentence embedding application that allows users to identify deep conceptual clusters and patterns of experience within those same incident descriptions. In the spring of 2022, we will debut a suite of YouTube tutorials that provide instruction on the basic functions of each tool.
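To give a rough sense of what a co-occurrence network involves, the sketch below counts how often pairs of entities appear in the same incident description. It is a minimal illustration under stated assumptions, not the project's actual pipeline: the `incidents` data, the placeholder entity names, and the `cooccurrence_counts` helper are all hypothetical, and the sketch assumes NER has already reduced each description to a set of recognized names.

```python
from collections import Counter
from itertools import combinations

# Hypothetical NER output: one set of recognized entities per incident
# description. The names are placeholders, not records from the archive.
incidents = [
    {"Person A", "Organization X"},
    {"Person A", "Organization X", "Person B"},
    {"Person B", "Organization Y"},
]


def cooccurrence_counts(entity_sets):
    """Count how many descriptions mention each pair of entities together."""
    counts = Counter()
    for entities in entity_sets:
        # Sorting makes each pair order-independent, so (A, B) and (B, A)
        # are counted as the same edge.
        for pair in combinations(sorted(entities), 2):
            counts[pair] += 1
    return counts


# Each key is an edge in the network; each value is that edge's weight.
edges = cooccurrence_counts(incidents)
for (left, right), weight in edges.most_common():
    print(f"{left} -- {right}: {weight}")
```

Each counted pair can be read as an edge in a graph whose nodes are people and organizations, with the count serving as the edge weight.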
The TRC was an ambitious attempt to create ‘as complete a picture as possible’ of human rights violations (HRVs) committed by all parties during the latter half of the apartheid era, defined as the years between 1960 and 1994. This effort was part of a broader genealogy of late 20th-century truth commissions, which sought to stabilize post-conflict societies on a foundation of archival truth produced through the exhaustive collection of substantial evidence and countless testimonies. What made South Africa’s TRC uniquely successful among truth commissions was its emphasis on a model of restorative justice, which was facilitated by a relatively novel amnesty process. This process permitted perpetrators to apply for amnesty from criminal prosecution on the condition that their actions met the criteria for politically motivated violence and that they provided full and truthful accounts of those actions. The amnesty process was paralleled by public victims’ hearings, which offered victims a forum for telling their stories, often for the first time.
The TRC, like truth commissions held elsewhere, had to balance its responsibility to document the totality of past abuses against the task of creating an archive that is legible, navigable, and extensible. Its built-in bias toward comprehensive collection is not unique; it is a common feature of most truth commissions, which collectively interpret post-Holocaust notions of ‘never forgetting’ as leaving no testimony behind. This impulse is understandable, but in this case and others it had the unintended effect of submerging the broad contours of mass political violence under an archival high tide of segmented narratives. Paradoxically, this obscured the very totality of abuses, and the networks of social connections between victims and perpetrators, that the commission sought to document, disseminate, and preserve.
Although the TRC was not immune from criticism, its model nonetheless succeeded in producing a massive archive of material that holds the potential to provide an unparalleled window into political violence committed by all sides in a thirty-five-year conflict at a nationwide scale. The tools we have developed unlock this potential by bringing the archive in line with new definitions of public access that rely on machine learning to extract extensible, structured datasets. Beyond the domain of human rights in South Africa, our methods are also broadly applicable to any large corpus that features multilingual testimony. The ultimate aim of our work is to better inform public debates about the past and to open new avenues in scholarship on violence during the apartheid era.