Digitising Sabah’s various dialects


Taufik says a two-day event was held in 2023 to teach members of the community how to create entries for the Mendriq dialect on Wiktionary. — ZAHIRULNUKMAN/Wikimedia Commons

THERE are an estimated 137 languages spoken by various ethnic groups in Malaysia. According to a report published last year, 80% of these languages are used by minority groups.

While Malaysia is rich in cultural diversity, this linguistic wealth is often under-represented. Taufik Rosman, treasurer of Wikimedia Community User Group Malaysia (WCUGM), shares that it has been a personal goal of his to bring more Malaysian languages to the digital sphere. He is concerned that if some ethnic languages practised by various communities in Malaysia fail to establish a presence online, they may fade away permanently over time.

“Once these languages and more go online, we believe they have a chance to be around for a longer time,” he says in an interview with LifestyleTech.


Taufik proudly shared that on May 28, Wikipedia Kadazandusun successfully graduated from the Wikimedia Incubator – a platform for developing and testing new language versions of Wiki projects – and was launched with over 900 articles.

Taufik explains that the project took a long time to launch because it needed more volunteers or contributors to consistently create new articles.

“Previously, we only had around three active contributors. Now, we have more contributors from Sabah, where some represent regions that don’t get much attention online,” he says.

Jurina (standing) is assisting volunteers at the Institute of Teacher Education Kent Campus in Tuaran, Sabah with creating new entries for the Kadazandusun Wiki.

Meet the contributors

In Tuaran, Sabah, a group of teaching students from the Institute of Teacher Education Kent Campus founded the Kent Wiki Club and started making active contributions for various dialects spoken by the Kadazandusun community on Wiktionary.

“We formed the club and started looking for members to make more contributions in the Kadazandusun dialect to Wiktionary. Eventually, we started translating articles from English and Malay to create entries for Wikipedia Kadazandusun,” says Kadazandusun Language teacher trainee Jurina Jonimin, 22.

Club chairperson Bluster Jainon, 21, says members of the club were driven by the motto “Okon nopo ko yati, isai po?” which translates to “If not us, then who else?”.

“The club was formed in 2022 to spread knowledge about our culture and language,” Bluster says, adding that the main dialect used for Wikipedia Kadazandusun is Dusun Bundu-Liwan.

Bluster says members of the club were driven by the motto ‘If not us, then who else?’ – in their efforts to preserve dialects. — Photos: BLUSTER JAINON

After inaugurating the club with WCUGM, Bluster says the motto changed to ‘Boros nopo nga guas toilaan (Language is the root of knowledge)’ to reflect the club’s aim to spread knowledge through sharing about languages on Wikipedia.

WCUGM Project Coordinator Farouk Azim Abd Rahman proudly reveals that Wikipedia Kadazandusun made history by being the first Wikipedia edition to publish articles on several topics.

“For example, an article on a movie called Sinakagon started out in Wikipedia Kadazandusun first. It was regarded as the first movie truly in Kadazandusun,” he says.

In Kota Belud, Sabah, Bajau-sama Wikipedia project lead Mohd Syafiq Yahya says he started uploading content to Wikikamus and Wikicommons during the pandemic to preserve the language. The content covers anything related to Bajau-sama culture, such as the meanings of words, various types of food, and traditional clothing.

Mohd Syafiq believes the Wikimedia project is a good approach to preserving the language and has started uploading Bajau-Sama cultural content to Wikikamus and Wikicommons. — MOHD SYAFIQ YAHYA

“Based on how I see our lives now, I fear that our culture is being forgotten by the youth. By doing this project for Wikimedia, perhaps this is the only possible way for the language to survive,” he says.

After WCUGM learned about the possibility of the Mendriq language facing extinction, the group initiated an engagement project with the Orang Asli community in Gua Musang, Kelantan.

Taufik says they organised a two-day event last year to show community members how to add Mendriq words to Wiktionary by translating them from Malay.

Taufik proudly announced that on May 28, Wikipedia Kadazandusun graduated from the Wikimedia Incubator with more than 900 articles. — DON WONG/Wikimedia Foundation

“Members of the community managed to add more than 100 words for Mendriq. Before this, there were no such resources for the dialect online. Now it’s freely available for anyone to use,” he says.

Similar efforts were made for the Kensiu language spoken by the Negrito tribe, a community of about 300 people in Kampung Lubuk Legong, Kedah.

“There is an assumption that people in rural areas are not interested in technology, but you will be surprised as they all have their own smartphones,” Taufik says.

Meeting the mark

Farouk Azim explains that several important requirements had to be met before a new language could be approved for launch as an official Wikipedia page.

“They have targets to ensure the communities are focused on continuously improving the website, adding more articles, and maybe even growing the number of contributors,” he says.

Farouk aims to inspire more Malaysians to volunteer and contribute to broadening digital knowledge about their communities. — DON WONG/Wikimedia Foundation

The requirements include three months of regular editing, with each member making about 11 edits per month. Farouk Azim adds that the articles must also be high quality, with citations or references to reliable sources.

“They also need to be able to contact a language expert to verify the entries to ensure that the texts are based on actual terms spoken in the language and not just gibberish,” he says.

He adds that there are linguists and community experts that members often contact to verify their entries.

“Translating an existing Wikipedia article is not just limited to the words on the subject, but they also have to ensure that technical terms on the website such as ‘login’ are accurate as well,” he says.

When WCUGM sets out for community engagement, Farouk Azim says they have to make sure that the community involved has access to the Internet with their own devices.

“We have to research whether they have Internet capabilities because, for now, we do not have capabilities to aid communities that have no Internet at all,” he says, adding that there are plans to seek help from other organisations.

Farouk Azim announces that soon, more Wikipedia pages in local languages will be launched, including Bajau-sama (spoken by ethnic groups in Sabah), Iban (the largest ethnic group in Sarawak with various dialects), and Semai (used by the Orang Asli communities in Pahang and Perak).

He is also hopeful that one day the content of ethnic local languages in Wikipedia will be enough to train artificial intelligence (AI) models.

“Of course, our contributors can’t use AI tools for the translation work simply because there’s not enough data to train the existing models.

“This is why we’re committed to bringing more of these languages to Wikipedia, with the goal of eventually gathering enough data to train AI tools. We view our efforts as the beginning of a much larger journey, rather than the end,” he says.
