Language defines the world and how we see it. It is never neutral, and it can be a powerful tool for generating inequality. Since the end of 2020, the Goethe-Institut has been working through the project Artificially Correct to make language use more inclusive by both questioning and using AI. By developing an AI-based tool together with experts, and by building a network of translators, activists, developers and scientists, the project wants to help minimize biases in texts and strengthen a conscious approach to language.
The Goethe-Institut is present in 98 countries, so language and translation play a crucial role in its everyday working life. Not only are the websites at least bilingual, but communication with partners, artists, students and visitors also happens in many different languages. Translation and cooperation with translators and interpreters are often crucial. But while working on topics like identity and anti-racism, we realized that we need to educate both ourselves and our audiences about how we use language and how we can avoid biases in both our mother tongues and foreign languages. And, based on these questions, how can we identify, tackle and minimize biases in texts, translations and translation tools?
First, we saw the need to raise awareness of biases in language. In cooperation with the Berlin-based macht.sprache, a project to foster politically sensitive translation between English and German, we published articles about terms related to race, identity, gender and sexuality that require sensitivity in translation. Additionally, we organized events to give translators the opportunity to work together on their use of language, since translators, too, are not and cannot always be aware of their biases. They appreciate the help of colleagues and experts in finding or creating suitable terms in their own languages.
Widely used translation tools are of no help here, either: when language is biased, translation tools are too, since they are trained on data sets that contain human bias.
Artificially Correct therefore set its focus on AI-based translation tools and the biases their translations generate. But instead of condemning translation tools that are based on artificial intelligence, the project aims to investigate the potential of AI to minimize bias. To this end, we initiated an online hackathon in 2021 to find solutions that help make AI-based language tools more inclusive. The results of the hackathon, in which 12 teams from all over the world brought together different kinds of expertise, were stunning. And it was amazing to see how many people are willing to spend precious time tackling issues of inequality. The two winning projects, selected by a jury, have since developed their ideas and solutions further:
The DeBiasByUs team aims to raise awareness of gender bias in machine translation and to create a database of such cases. To do so, it is building a public platform that provides information on the topic, on the impact of biases on society, and on the latest research. On the website, users can submit biased sentences and either receive suggested alternatives or add their own unbiased version. In addition to the platform, a plug-in linked to the website might be developed in the future.
The second winning team uses a Word2Vec solution with the aim of fighting bias, sexism and racism. They have trained a machine learning model that scans texts and highlights words that might be biased. Based on this model, they have created a web portal where users can enter texts in English and see potentially biased words highlighted. The flagged words form the basis for a data set intended to help minimize bias in translation tools.
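For readers who want a feel for how word vectors can be used to flag words, here is a minimal Python sketch. It is not the team's model, whose details are not described here. It only illustrates one common idea: measuring how strongly a word's vector leans toward one side of a word pair such as "he" and "she". The vectors are hand-made toy values rather than trained Word2Vec embeddings, and all names (TOY_VECTORS, flag_words, highlight, the 0.5 threshold) are hypothetical choices made for this illustration.

```python
import math
import re

# Toy 3-dimensional "embeddings", invented for illustration only.
# A real system would load vectors trained on a large text corpus.
TOY_VECTORS = {
    "he": (1.0, 0.1, 0.0),
    "she": (-1.0, 0.1, 0.0),
    "nurse": (-0.7, 0.6, 0.2),
    "engineer": (0.8, 0.5, 0.1),
    "teacher": (-0.1, 0.9, 0.2),
    "person": (0.0, 0.8, 0.3),
}

WORD_PATTERN = re.compile(r"[A-Za-z']+")


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def gender_direction(vectors):
    """Assumed bias axis: the difference between the 'he' and 'she' vectors."""
    return tuple(h - s for h, s in zip(vectors["he"], vectors["she"]))


def flag_words(text, vectors=TOY_VECTORS, threshold=0.5):
    """Return words whose vector leans strongly toward either end of the axis."""
    direction = gender_direction(vectors)
    flagged = []
    for word in WORD_PATTERN.findall(text):
        key = word.lower()
        # Skip unknown words and the two words that define the axis itself.
        if key in ("he", "she") or key not in vectors:
            continue
        if abs(cosine(vectors[key], direction)) >= threshold:
            flagged.append(word)
    return flagged


def highlight(text, vectors=TOY_VECTORS, threshold=0.5):
    """Wrap flagged words in double square brackets."""
    flagged = {w.lower() for w in flag_words(text, vectors, threshold)}

    def mark(match):
        word = match.group(0)
        return f"[[{word}]]" if word.lower() in flagged else word

    return WORD_PATTERN.sub(mark, text)


if __name__ == "__main__":
    print(highlight("The nurse asked the engineer and the teacher."))
```

With these toy values, "nurse" and "engineer" are marked while "teacher" is not. A flag of this kind only means that a word deserves a second look by a human reader. It does not prove that a sentence is biased.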
Both projects will be developed further, and the results will be updated and linked on the project website.
AI can be a useful tool for inclusion if it is used with the right intention, as many of the hackathon teams and their ideas have shown. Yet, as one of the project leaders, I have also realized that the deeper you dive into the topic, the better you see the variety of challenges still ahead. The journey to any kind of “unbiased” machine learning is long, since there are many issues that need to be tackled in the near future. Questions such as which data is used for machine learning, or who provides the data, are increasingly coming to the fore in international discussions and projects. However, most of the data used for machine learning comes from the global North and therefore represents only a small part of the global reality. And while most of the people working on the development of AI are still white men, the raw materials, for example, are sourced in poorer countries and under bad working conditions. Diversity and equality in AI are a big issue that everyone working in the field needs to be made aware of. Projects like Artificially Correct can hopefully contribute to solving this issue, since equality also starts with a conscious and considerate use of language and of the language tools that are available. If you want to dive deeper into the topic of bias, language and artificial intelligence, you can consult Artificially Correct’s online dossier, which compiles different perspectives on how artificial intelligence affects language and text production, and on why discriminatory language must already be addressed in schools.
All images on this page are by EL BOUM / Goethe-Institut (Copyright: Goethe-Institut).