Faces for cookware: data collection industry flourishes as China pursues AI ambitions


  • TECH
  • Friday, 28 Jun 2019

Jia Yahui, a 29-year-old employee of Qian Ji Data Co, labels vehicles in an image on a computer screen to produce training data for artificial intelligence (AI) and machine learning technology, in Jia county, Henan province, China, March 20, 2019. REUTERS/Irene Wang

PINGDINGSHAN, China: In a village in central China's Henan province, amid barking dogs and wandering chickens, villagers gather along a dirt road to trade images of their faces for kettles, pots and tea cups.

At the front of the line, a woman stands in front of a camera zip-tied to a tripod. She holds a photograph of her own head, with the eyes and nose cut out, in front of her face and slowly turns from side to side.

Villagers waiting their turn take a numbered ticket. Some of them say it's the third or fourth time they've come to do this sort of work.

The project, run out of a sleepy village courtyard house adorned with posters of former Chinese leader Mao Zedong, is collecting material that could train AI software to distinguish between real facial features and still images.

Staff members from Qian Ji Data Co photograph villagers for a facial data collection project that will supply training data for artificial intelligence (AI) and machine learning technology, in Jia county, Henan province, China, March 20, 2019. REUTERS/Cate Cadell

"The largest projects have tens of thousands of people, all of whom live in this area," said Liu Yangfeng, CEO at Qianji Data Co Ltd, which collects and labels data for several of China's largest tech firms and is based in the nearby city of Pingdingshan.

"We are creating more data sets to serve more AI algorithm companies, so they can serve the development of artificial intelligence in China," said Liu, declining to disclose his clients.

The boom in demand for data to train AI algorithms is feeding a new global industry that gathers information such as photos and videos, which are then labelled to tell the machines what they are seeing.

Companies involved in data labelling, or data annotation as it is also called, include crowdsourcing platforms such as Amazon.com's Mechanical Turk, which offer users small amounts of money in return for simple tasks; outsourcing firms such as India's Wipro Ltd; and professional labellers like Qianji.

Liu Yangfeng, CEO of Qian Ji Data Co, poses at his company in Jia county, Henan province, China, March 20, 2019. REUTERS/Irene Wang

Cognilytica, a US research firm specialising in AI, estimates the global market for machine-learning related data annotation grew 66% to US$500mil (RM2.07bil) in 2018 and is set to more than double by 2023. Some industry insiders say, however, that much of the work done is not disclosed, making accurate estimates difficult.

Weak privacy laws, cheap labour

China has emerged as a key hub for data collection and labelling thanks to insatiable demand from a burgeoning artificial intelligence sector backed by the ruling Communist Party, which sees AI as an engine of economic growth and a tool for social control.

A plethora of firms have invested heavily in an area of AI known as machine learning, which is at the core of facial recognition technology and other systems based on finding patterns in data.

These include tech giants Alibaba Group Holding Ltd, Tencent Holdings Ltd and Baidu Inc, as well as younger companies such as AI specialist SenseTime Group Ltd and speech recognition firm Iflytek Co Ltd.

The result has been a proliferation of AI products and services in China, from facial recognition-based payment systems to automated surveillance and even AI-animated state media news anchors. Chinese consumers mostly see these technologies as novel and futuristic, despite concerns raised by some over more invasive applications.

Weak data privacy laws and cheap labour have also been a competitive advantage for China as it races to become a global leader in AI. The Henan villagers were happy to trade several sessions in front of a camera for a tea cup, or several hours for a stove-top pot.

Overseas customers

Beijing-based BasicFinder, a leading data labelling firm with locations across Hebei, Shandong and Shanxi provinces, boasts a robust mix of domestic and overseas clients.

During a recent visit to its Beijing offices, some staff were labelling images of sleepy people that will be used by an autonomous driving project to identify drivers who might be falling asleep at the wheel.

Others were labelling British documents from the 1800s for a Western online ancestry service, marking fields for dates, names and genders on birth and death certificates.

According to BasicFinder chief executive Du Lin, hiring trained labellers in China is cheaper than using Western crowdsourcing marketplaces.

Employees label different items on computer screens to produce training data for artificial intelligence (AI) and machine learning technology at Qian Ji Data Co in Jia county, Henan province, China, March 20, 2019. REUTERS/Irene Wang

A Princeton University project related to autonomous driving initially put a task on Amazon's Mechanical Turk, but as the task became more complicated, people began making mistakes, and BasicFinder was brought in to help correct the results, said Du.

In that project, one trained BasicFinder labeller was able to do the work of three crowdsourced labellers, he added.

"Gradually they saw they were paying less for labelling from us, so they hired us to label all the works from the very beginning," said Du.

Princeton declined to comment.

For labelling employees, the reasons for joining China's data industry are straightforward. The work, though sometimes tedious, is an upgrade on other jobs available to young workers who want to return home to small Chinese cities and villages.

Labellers at Qianji make roughly 100 yuan (RM60.26) a day marking data points on photographs of people, surveillance footage and street images.

The work is usually simple, according to the employees, though some overseas content poses a challenge.

"One time we thought we were classifying Europe-style cooker machines that have a washer attached," said Jia Yahui, a labeller at Qianji. "Later we were told it's actually two separate things, a stove and a dishwasher."

The labelling work brings some of the employment benefits of the tech sector to rural areas, but those benefits may prove short-lived if AI improves enough to perform many of the tasks labellers do.

"We think this industry will still exist in three to five years. It may not be a long-term career – we can only think of the five-year plan for now," said Qianji CEO Liu. – Reuters
