At this moment in history it’s impossible not to see the problems that arise from human bias. Now magnify that by compute and you start to get a sense of just how dangerous human bias via machine learning can be. The damage can be twofold:
- Influence. “If the AI said so, it must be true.” People trust the outputs of AI, so if human bias is missed in training, it can compound the problem by spreading to more people;
- Automation. Sometimes AI models are plugged into a programmatic function, which could lead to the automation of bias.
But there is potentially a silver machine-learned lining. Because AI can help expose truth inside messy data sets, it’s possible for algorithms to help us better understand bias we haven’t already isolated, and spot ethically questionable ripples in human data so we can check ourselves. Exposing human data to algorithms exposes bias, and if we are considering the outputs rationally, we can use machine learning’s aptitude for spotting anomalies.
But the machines can’t do it on their own. Even unsupervised learning is semi-supervised, as it requires data scientists to choose the training data that goes into the models. If a human is the chooser, bias can be present. How the heck do we tackle such a bias beast? We will attempt to pick it apart.
The landscape of ethical concerns with AI
Bad examples abound. Consider the finding from Carnegie Mellon that women were shown significantly fewer online ads for high-paying jobs than men were. Or recall the sad case of Tay, Microsoft’s teen slang Twitter bot that had to be taken down after producing racist posts.
In the near future, such mistakes could result in hefty fines or compliance investigations, a conversation that’s already occurring in the U.K. parliament. All mathematicians and machine learning engineers should consider bias to some degree, but that degree varies from instance to instance. A small company with limited resources will often be forgiven for accidental bias as long as the algorithmic vulnerability is fixed quickly; a Fortune 500 company, which presumably has the resources to ensure an unbiased algorithm, will be held to a tighter standard.
Of course, an algorithm that recommends novelty T-shirts does not need nearly as much oversight as an algorithm that decides what dose of radiation to give to a cancer patient. It’s these high-stakes decisions that will become the most pronounced when legal liability enters the discussion.
It’s important for builders and business leaders to establish a process for monitoring the ethical behavior of their AI systems.
Three keys to managing bias when building AI
There are signs of existing self-correction in the AI industry: Researchers are looking at ways to reduce bias and strengthen ethics in rule-based artificial systems by taking human biases into account, for example.
These are good practices to follow; it’s important to be thinking proactively about ethics regardless of the regulatory environment. Let’s take a look at several points to keep in mind as you work on your AI.
1. Choose the right learning model for the problem.
There’s a reason all AI models are unique: Each problem requires a different solution and provides varying data resources. There’s no single model to follow that will avoid bias, but there are parameters that can inform your team as it’s building.
For example, supervised and unsupervised learning models have their respective pros and cons. Unsupervised models that cluster or do dimensionality reduction can learn bias from their data set. If belonging to group A highly correlates to behavior B, the model can mix up the two. And while supervised models allow for more control over bias in data selection, that control can introduce human bias into the process.
Non-bias through ignorance (excluding sensitive information from the model) may seem like a workable solution, but it still has vulnerabilities. In college admissions, sorting applicants by ACT scores is standard, while taking their ZIP code into account might seem discriminatory. But because test scores might be affected by the preparatory resources in a given area, including the ZIP code in the model could actually decrease bias.
You have to require your data scientists to identify the best model for a given situation. Sit down and talk them through the different strategies they can take when building a model. Troubleshoot ideas before committing to them. It’s better to find and fix vulnerabilities now — even if it means taking longer — than to have regulators find them later on.
2. Choose a representative training data set.
Your data scientists may do much of the legwork, but it’s up to everyone participating in an AI project to actively guard against bias in data selection. There’s a fine line you have to walk. Making sure the training data is diverse and includes different groups is essential, but segmentation in the model can be problematic unless the real data is similarly segmented.
It’s inadvisable — both computationally and in terms of public relations — to have different models for different groups. When there is insufficient data for one group, you could possibly use weighting to increase its importance in training, but this should be done with extreme caution. It can lead to unexpected new biases.
For example, if you have only 40 people from Cincinnati in a data set and you try to force the model to consider their trends, you might need to use a large weight multiplier. Your model would then have a higher risk of picking up on random noise as trends — you could end up with results like “people named Brian have criminal histories.” This is why you need to be careful with weights, especially large ones.
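One rough way to see why a large weight multiplier is risky is Kish’s effective sample size, (sum of weights)² / (sum of squared weights), which estimates how many equally weighted rows a weighted data set is “worth.” The sketch below is only an illustration: the total of 10,000 rows and the 100x multiplier are assumed numbers, not figures from any real project.

```python
def effective_sample_size(weights):
    """Kish's effective sample size: (sum of w)^2 / (sum of w^2)."""
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    return total * total / total_sq

# Hypothetical data set: 9,960 ordinary rows plus the 40 Cincinnati rows.
uniform = [1.0] * 10_000
upweighted = [1.0] * 9_960 + [100.0] * 40  # the 100x multiplier is an assumption

n_uniform = effective_sample_size(uniform)
n_upweighted = effective_sample_size(upweighted)
print(round(n_uniform), round(n_upweighted))
```

Under these assumptions the effective sample size falls from 10,000 to roughly 475, so quirks among the 40 heavily weighted people carry far more influence than their numbers justify.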
3. Monitor performance using real data.
No company is knowingly creating biased AI, of course; all these discriminatory models probably worked as expected in controlled environments. Unfortunately, regulators (and the public) don’t typically take best intentions into account when assigning liability for ethical violations. That’s why you should be simulating real-world applications as much as possible when building algorithms.
It’s unwise, for example, to use test groups on algorithms already in production. Instead, run your statistical methods against real data whenever possible. Ask the data team to check simple test questions like “Do tall people default on AI-approved loans more than short people?” If they do, determine why.
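A test question like the tall-versus-short one can start as a simple comparison of default rates by group. The snippet below is a minimal sketch on made-up records; the `loans` list, its group labels and the function name are hypothetical, and a real check would use production data and a proper significance test before drawing conclusions.

```python
from collections import defaultdict

def default_rate_by_group(loans):
    """Return the share of loans that defaulted, per group."""
    counts = defaultdict(int)
    defaults = defaultdict(int)
    for group, defaulted in loans:
        counts[group] += 1
        defaults[group] += int(defaulted)
    return {group: defaults[group] / counts[group] for group in counts}

# Hypothetical AI-approved loans: (height group, did the borrower default?)
loans = [
    ("tall", False), ("tall", False), ("tall", True), ("tall", False),
    ("short", True), ("short", False), ("short", True), ("short", False),
]

rates = default_rate_by_group(loans)
gap = abs(rates["tall"] - rates["short"])
print(rates, gap)
```

A large gap does not prove the model is biased on its own, but it flags a place where the team should dig into why the difference exists.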
When you’re examining data, you could be looking for two types of equality: equality of outcome and equality of opportunity. If you’re working on AI for approving loans, result equality (equality of outcome) would mean that people from all cities get loans at the same rates; opportunity equality would mean that people who would have repaid the loan if given the chance are approved at the same rates regardless of city. Without the latter, the former could still hide unfairness if one city has a culture that makes defaulting on loans common.
Result equality is easier to prove, but it also means you’ll knowingly accept potentially skewed data. While it’s harder to prove opportunity equality, it is at least valid morally. It’s often practically impossible to ensure both types of equality, but oversight and real-world testing of your models should give you the best shot.
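The two kinds of equality can be written as two per-city rates: the approval rate among all applicants (result equality) and the approval rate among applicants who would have repaid (opportunity equality). The sketch below uses invented records and hypothetical function names. It also assumes the `would_repay` flag is known for everyone, which in practice it is not for rejected applicants; that is part of why opportunity equality is harder to prove.

```python
from collections import defaultdict

def approval_rate(records):
    """Result equality: share of all applicants approved, per city."""
    totals, approved = defaultdict(int), defaultdict(int)
    for city, was_approved, _would_repay in records:
        totals[city] += 1
        approved[city] += int(was_approved)
    return {city: approved[city] / totals[city] for city in totals}

def approval_rate_among_repayers(records):
    """Opportunity equality: share approved among those who would repay."""
    totals, approved = defaultdict(int), defaultdict(int)
    for city, was_approved, would_repay in records:
        if would_repay:
            totals[city] += 1
            approved[city] += int(was_approved)
    return {city: approved[city] / totals[city] for city in totals}

# Hypothetical applicants: (city, approved?, would have repaid?)
records = [
    ("A", True, True), ("A", True, True), ("A", False, True), ("A", False, True),
    ("B", True, True), ("B", True, True), ("B", False, False), ("B", False, False),
]

outcome = approval_rate(records)
opportunity = approval_rate_among_repayers(records)
print(outcome, opportunity)
```

In this toy data both cities are approved at the same overall rate, yet reliable borrowers in city A are approved only half as often as reliable borrowers in city B, so equal results mask unequal opportunity.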
Eventually, these ethical AI principles will be enforced by legal penalties. If New York City’s early attempts at regulating algorithms are any indication, those laws will likely involve government access to the development process, as well as stringent monitoring of the real-world consequences of AI. The good news is that by using proper modeling principles, bias can be greatly reduced or eliminated, and those working on AI can help expose accepted biases, create a more ethical understanding of tricky problems and stay on the right side of the law, whatever it ends up being.
Source: Artificial Intelligence