As anti-racism protests continue across the US, people are turning their Instagram posts pitch black today to stand in solidarity with those fighting against racial injustice. It is a strong reminder that no one should take racial harmony for granted and that diversity should always be a top priority.
Last April, the newly established Institute for Human-Centered Artificial Intelligence (HAI) at Stanford University was questioned for its lack of Black faculty members, even though the institute asserted that “the creators and designers of AI must be broadly representative of humanity”. As AIMed reported earlier, we have probably already witnessed the consequences of biased AI algorithms: software that assigns Black individuals a higher score for the likelihood of committing future crimes, an AI beauty-contest judge that tended to rate White contestants as prettier, and a mortgage algorithm that exhibits racial discrimination.
Meredith Broussard, data journalist and Assistant Professor at New York University, writes in her book “Artificial Unintelligence” that “algorithms are designed by people and we are biased”. Broussard calls this technochauvinism: the overpraising of technology at the expense of human judgment. So, how badly will AI perform when some data is deliberately removed?
AI will lose its predictive capability
A new study led by a group of researchers in Argentina found that when female samples were deliberately removed from, or considerably underrepresented in, the training data used to develop an AI system, the algorithm could no longer accurately predict or diagnose the medical conditions it was intended for. The same result was found when male samples were excluded or underrepresented. The findings were published last week in the Proceedings of the National Academy of Sciences of the United States of America (PNAS).
The researchers examined three open-source machine learning models that analyze chest X-ray images to detect the presence or absence of 14 medical conditions, ranging from pneumonia and hernias to an enlarged heart. These models – DenseNet-121, ResNet, and Inception-v3 – are widely used by the research community but have yet to be deployed clinically. The researchers trained them on two datasets maintained by the National Institutes of Health (NIH) and Stanford University.
Each of these datasets contains tens of thousands of chest X-ray images, with male and female patients represented roughly equally. To illustrate the significance of bias in AI, the researchers skewed a subset of the chest X-ray images: they trained the models on images from 100% female or 100% male patients, 75% female and 25% male patients, 25% female and 75% male patients, or a 50/50 split.
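The study's skewed-subset design can be sketched in a few lines of Python. This is a minimal illustration, not the study's actual code: the record format, function name, and toy dataset are assumptions made for the example.

```python
import random

def gender_skewed_subset(records, female_ratio, total, seed=0):
    """Sample a fixed-size training subset with a given female/male ratio.

    records: list of (image_id, sex) tuples, where sex is "F" or "M".
    female_ratio: fraction of the subset drawn from female patients.
    total: subset size, kept constant across splits as in the study.
    """
    rng = random.Random(seed)
    females = [r for r in records if r[1] == "F"]
    males = [r for r in records if r[1] == "M"]
    n_female = round(total * female_ratio)
    subset = rng.sample(females, n_female) + rng.sample(males, total - n_female)
    rng.shuffle(subset)
    return subset

# Toy dataset: 100 female and 100 male records.
records = [(f"img{i}", "F") for i in range(100)] + \
          [(f"img{i}", "M") for i in range(100, 200)]

# The four splits described above: 100/0, 75/25, 25/75, and 50/50.
for ratio in (1.0, 0.75, 0.25, 0.5):
    subset = gender_skewed_subset(records, ratio, total=80)
    n_f = sum(1 for _, s in subset if s == "F")
    print(f"female_ratio={ratio}: {n_f} female, {len(subset) - n_f} male")
```

Keeping `total` fixed while varying `female_ratio` matters: it isolates the effect of representation from the effect of sample size.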
Caution on medical conditions that mainly affect one gender
Keeping the sample size the same for each set of training data, the researchers tested each model on images from either male or female patients. Regardless of the medical condition, the models performed poorly when tested on patients whose gender was absent or underrepresented in the training data. Similarly, overrepresenting a particular gender in training also hurt the algorithms’ performance.
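The key step in this evaluation is stratifying test results by gender rather than reporting a single overall score. A minimal sketch of that idea, with an assumed helper name and toy data rather than the study's own pipeline:

```python
def accuracy_by_gender(predictions, labels, sexes):
    """Compute accuracy separately for female and male test patients.

    predictions, labels: parallel lists of 0/1 values (predicted and
    true presence of a condition); sexes: parallel list of "F"/"M".
    """
    totals = {"F": 0, "M": 0}
    correct = {"F": 0, "M": 0}
    for pred, label, sex in zip(predictions, labels, sexes):
        totals[sex] += 1
        correct[sex] += pred == label
    # Report per-group accuracy only for groups present in the test set.
    return {sex: correct[sex] / totals[sex] for sex in totals if totals[sex]}

# Toy example: a model that does well on male patients but poorly on
# female ones, as might happen after training on mostly male images.
preds  = [1, 0, 1, 1, 0, 0, 1, 0]
labels = [0, 1, 1, 1, 0, 0, 1, 0]
sexes  = ["F", "F", "F", "F", "M", "M", "M", "M"]
print(accuracy_by_gender(preds, labels, sexes))  # {'F': 0.5, 'M': 1.0}
```

An aggregate accuracy over this toy test set would be 0.75 and would hide the gap entirely; the stratified view is what exposes the underrepresented group's worse performance.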
Nevertheless, the researchers did not answer why algorithms trained only or mostly on male patients’ data performed the worst when tested on women. Some suggested physiological differences as a possibility, or that images from female patients tend to be taken earlier or later in the disease progression.
For conditions that predominantly affect one gender, such as autism, which is diagnosed more often in boys than in girls, or breast cancer, for which the data are nearly all from female patients, researchers may have to be extra careful, because their algorithms cannot help but be trained on skewed data. Even so, some developers believe that as long as they are aware of the imbalance and can communicate it to fellow clinicians, the lack of representativeness is excusable under such extreme circumstances.