Machine learning (ML) has emerged as a popular and effective AI technique. This isn’t surprising, given its remarkable ability to sift through humongous volumes of data and find patterns that are simply not obvious to humans. Even more remarkable is its ability to keep learning from new data over time and to deliver increasingly accurate insights and predictions.
Of course, no matter how good your algorithm is, the quality of its insights ultimately depends on the quality of the data fed into it. Inaccurate or skewed data can prove to be an Achilles’ heel, producing errors that are hard to spot. A well-known example is Amazon’s ambitious project to automate the search for top talent. The company built computer programs to review job applicants’ résumés and rate them on a scale of one to five. Surprisingly, the company found that the tool seemed to be biased against female candidates for technical roles. Investigation showed that the primary cause was the data the tool learned from: the model was trained on patterns in résumés submitted over the preceding decade, when the industry was overwhelmingly male-dominated. From this data, the tool taught itself to favor male candidates over their female counterparts. As a result, it penalized résumés containing the word ‘women’s’ and downgraded candidates who had graduated from all-women’s colleges.
Given the many biases that exist in society based on factors such as gender, race and nationality, among others, there is a real risk that machine learning algorithms will reinforce them. Since machine learning models base their decisions solely on the data available to them, they are likely to have blind spots that can lead to numerous glitches.
Therefore, it is important to evaluate the data, check it for inconsistencies, and test for sample biases and edge cases. While there is no single formula for eliminating bias, the following approaches can help:
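A test for sample bias can start as simply as comparing how often each group appears in the training data with its share of a reference population. The following is a minimal Python sketch, assuming you have one group label per training record and a reference distribution you trust (for example, the makeup of the applicant pool); the function name `representation_gaps` and the 0.05 tolerance are hypothetical choices for illustration.

```python
from collections import Counter


def representation_gaps(groups, reference, tolerance=0.05):
    """Flag groups whose share of the training data differs from a reference share.

    groups:    iterable of group labels, one per training record.
    reference: dict mapping group label -> expected share (assumed to sum to ~1).
    tolerance: hypothetical threshold on the absolute difference in shares.
    Returns {group: (observed share, expected share)} for flagged groups only.
    Groups present in the data but missing from `reference` are ignored.
    """
    counts = Counter(groups)
    total = sum(counts.values())
    flagged = {}
    for group, expected in reference.items():
        observed = counts.get(group, 0) / total if total else 0.0
        if abs(observed - expected) > tolerance:
            flagged[group] = (round(observed, 3), expected)
    return flagged


# Toy data: 80 records labelled "male", 20 labelled "female".
training_groups = ["male"] * 80 + ["female"] * 20
# Assumed 50/50 reference split, for illustration only.
print(representation_gaps(training_groups, {"male": 0.5, "female": 0.5}))
# {'male': (0.8, 0.5), 'female': (0.2, 0.5)}
```

A clean result only means each group is present in roughly the expected proportion; it says nothing about whether the labels or outcomes recorded for those groups are themselves biased.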
Different AI models are designed to solve different problems. Understanding exactly how a given model learns makes it possible to check that the underlying data is not introducing inadvertent biases. Human decision-making often takes into account vulnerabilities and other qualitative factors that cannot easily be captured in a data set. Accounting for these in ML models is an important first step.
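One way to act on this is to look for proxies: features that are not protected attributes themselves but are distributed very differently across groups, much as certain words were in the Amazon example above. The sketch below is a hypothetical Python helper, `proxy_gaps`, that assumes binary (0/1) features and a protected attribute with two values; the feature names in the toy data are invented for illustration.

```python
def proxy_gaps(records, features, protected, group_a, group_b):
    """Return P(feature = 1 | group_a) - P(feature = 1 | group_b) for each feature.

    records:   list of dicts holding 0/1 feature values plus a protected attribute.
    features:  names of the binary features to check.
    protected: key of the protected attribute (e.g. a gender field).
    A large absolute gap means the feature is unevenly spread across the two
    groups, so a model could use it as a stand-in for the protected attribute
    even if that attribute is dropped from the training data.
    """

    def rate(feature, group):
        # Share of records in `group` where the feature is present (0.0 if none).
        rows = [r for r in records if r[protected] == group]
        if not rows:
            return 0.0
        return sum(r[feature] for r in rows) / len(rows)

    return {f: round(rate(f, group_a) - rate(f, group_b), 3) for f in features}


# Hypothetical toy résumé features; names and values are for illustration only.
records = [
    {"gender": "F", "womens_club": 1, "knows_python": 1},
    {"gender": "F", "womens_club": 1, "knows_python": 0},
    {"gender": "M", "womens_club": 0, "knows_python": 1},
    {"gender": "M", "womens_club": 0, "knows_python": 0},
]
print(proxy_gaps(records, ["womens_club", "knows_python"], "gender", "F", "M"))
# {'womens_club': 1.0, 'knows_python': 0.0}
```

Dropping the protected attribute from the training data does not remove bias if proxies like `womens_club` remain, which is why a check of this kind complements an understanding of how the model learns.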
Data selection often boils down to choosing data sets that are truly representative and also unbiased. While this is not an easy balance to strike, it is necessary to avoid biases like the one in the Amazon example above. You need to be especially careful when segmenting the data for research purposes: given the wide variety of parameters a machine learning algorithm may consider, it is crucial to ensure that your segmentation does not introduce unintended biases. Remember, too, that the algorithm does not understand that correlation does not necessarily imply causation. For example, if the data set is too small, random noise can produce spurious correlations that the model picks up and magnifies, introducing errors into its findings.
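The small-data problem can be illustrated with a simulation. In this Python sketch (the helper names are hypothetical), every feature is pure random noise, so any correlation between a feature and the target is spurious by construction; the function reports the strongest such correlation found among 50 noise features for a given number of rows.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def strongest_noise_correlation(n_rows, n_features=50, seed=0):
    """Largest |r| between a random target and independent random features.

    Every feature is generated independently of the target, so any
    correlation found here is noise that a model could mistake for a pattern.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_rows)]
    best = 0.0
    for _ in range(n_features):
        noise = [rng.gauss(0, 1) for _ in range(n_rows)]
        best = max(best, abs(pearson(noise, target)))
    return best


for n in (10, 100, 10_000):
    print(n, round(strongest_noise_correlation(n), 2))
```

With only 10 rows, some noise feature will typically look strongly correlated with the target, while with 10,000 rows the strongest spurious correlation shrinks close to zero. A model trained on the small sample has no way of knowing that the apparent pattern is noise.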
An important question is how to monitor the quality of the output to ensure that it is not discriminatory. One approach is to actively check for biases as they appear. For instance, if people of a certain nationality or community seem to be advancing up the corporate ladder more slowly than other groups, even though all other parameters appear equal, it is worth examining whether bias is at play. This is especially important if you rely on a machine learning algorithm to review employee performance. Monitoring the model’s output on real-world data is therefore essential.
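A basic version of this check compares outcome rates across groups. The Python sketch below (hypothetical helper names and toy data) computes each group’s promotion rate and flags groups whose rate falls below 80% of the highest group’s rate, a threshold borrowed from the “four-fifths” rule of thumb used in US employment-selection guidelines.

```python
from collections import defaultdict


def selection_rates(outcomes):
    """Share of positive outcomes (e.g. promotions) per group.

    outcomes: iterable of (group, promoted) pairs, where promoted is True/False.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, promoted in outcomes:
        totals[group] += 1
        positives[group] += int(promoted)
    return {g: positives[g] / totals[g] for g in totals}


def adverse_impact_flags(outcomes, threshold=0.8):
    """Groups whose selection rate is below `threshold` times the highest rate.

    The 0.8 default mirrors the four-fifths rule of thumb; it is a screening
    heuristic and does not check whether other factors explain the gap.
    """
    rates = selection_rates(outcomes)
    best = max(rates.values(), default=0)
    if best == 0:
        return {}  # no data or nobody selected: there is no ratio to compare
    return {g: round(r / best, 3) for g, r in rates.items() if r / best < threshold}


# Toy data: 10 of 20 people in group A promoted versus 3 of 20 in group B.
outcomes = [("A", i < 10) for i in range(20)] + [("B", i < 3) for i in range(20)]
print(selection_rates(outcomes))       # {'A': 0.5, 'B': 0.15}
print(adverse_impact_flags(outcomes))  # {'B': 0.3}
```

A flag here is a prompt to investigate, not proof of bias: the sketch does not control for the other parameters the paragraph mentions, so a follow-up analysis would need to compare like with like.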
While most machine learning biases emerge unintentionally rather than through malice, organizations have several measures available to minimize or eliminate them. As AI and machine learning become increasingly widespread, governments are likely to introduce a slew of regulations and compliance guidelines. These will compel organizations to take corrective measures to ensure that biases don’t creep into their ML-based decisions.
For all we know, the prospect of legal action may prove to be the main driver of proactive efforts to curb machine learning bias. However, responsible organizations would do best to address these biases on their own, and the best time to do that is now.
Lymbyc, the world's first virtual analyst, has proven its mettle in the industry. Whether it is the "Most Innovative Data Science Product" award from Aegis or a place among "the top 10 emerging Analytics startups in India to watch out for in 2018" by Analytics India Magazine, Lymbyc is turning heads and making headlines.