I would like to share some facts with the public about the limitations and problems of the government's recently announced HIDE system.
First of all, MySejahtera is an application that can only collect the check-in time, the location, and the number of people at the location. I am not sure whether GPS data is collected, but even if it is, GPS is unreliable indoors, especially in shopping malls with hundreds of shops.
The risk factors for spreading Covid-19 are: the time spent with the index case, distance from the index case, mask usage, air quality, infectivity of individual virus strains, and many more.
None of this information is recorded by MySejahtera. These elements are all significant statistical confounding factors.
A statistically sound predictive model must incorporate as many known risk factors as possible into the data analysis. The data must also be validated thoroughly and be free from statistical bias.
At the moment, it seems that shopping malls have the highest risk of Covid-19, but this is likely a sampling bias, as malls have a high rate of compliance with MySejahtera check-ins. Workplaces, homes, hospitals and prisons will 'appear' to have lower risks, but that is simply because MySejahtera usage there is lower.
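The reporting effect described above can be illustrated with a small sketch. All the numbers here are hypothetical and chosen only for illustration; none of them come from MySejahtera or HIDE. The sketch assumes three kinds of venue with exactly the same number of true infections, differing only in how often visitors check in, and assumes an infection can be attributed to a venue only when a check-in exists.

```python
# Hypothetical figures for illustration only (not real MySejahtera/HIDE data).
# Each venue has the SAME number of true infections; only check-in rates differ.
venues = {
    # name: (true_infections, check_in_rate)
    "mall": (100, 0.90),
    "workplace": (100, 0.30),
    "home": (100, 0.05),
}


def recorded_cases(true_infections, check_in_rate):
    """Expected number of infections that can be linked to a venue,
    assuming a case is only attributable when a check-in was recorded."""
    return true_infections * check_in_rate


recorded = {name: recorded_cases(n, r) for name, (n, r) in venues.items()}
total = sum(recorded.values())

# Share of all recorded cases that each venue appears to account for.
apparent_share = {name: count / total for name, count in recorded.items()}

for name in venues:
    print(f"{name}: recorded={recorded[name]:.0f}, "
          f"apparent share={apparent_share[name]:.0%}")
```

Under these assumptions the mall accounts for most of the recorded cases even though the true risk is identical everywhere, which is the pattern one would expect if check-in compliance, rather than actual transmission, drives the ranking.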
Predictive modelling is not new and has been used for decades in the field of medicine. Even the best predictive systems have intrinsic flaws. The predictive score cannot be the sole factor in making a medical decision.
For example, if the system were to predict that a sick patient had a 70% chance of dying after surgery, would it be ethical for a doctor to refuse to operate on the patient? What if the surgery could give him an extra 10 years of life? Would it be fair to deny him a 30% chance of long-term survival?
Even if we assume that the data collected is perfect, consider the case of a supermarket. Supermarkets are essential services, and a sudden closure of stores may have the unpredictable side effect of people flocking to other shops or markets to buy groceries. An infected individual can also cause an outbreak in the pasar malam.
So did the new insight from the predictive data actually help to reduce the infection rates?
There is also the issue of data confidentiality. By releasing the names of individual businesses without their consent, has the Personal Data Protection Act (PDPA) been violated? Under the state of Emergency, the government can legally do so, but is this fair and ethical towards the affected premises?
Big data is a powerful tool that can be wielded by the government to make better decisions. The HIDE system, using data from MySejahtera, is a very flawed application of otherwise good data. The best data is only as good as the people analysing and interpreting it.
A little knowledge is a dangerous thing.