In the first part of this blog, we looked at the lack of transparency in how AI makes decisions in important areas affecting people’s lives. We considered job selection, for instance, and asked a series of questions ending with a simple, “Why wasn’t I chosen?” What did the AI “look at” that caused it not to select me in comparison to others in the candidate pool? These questions are important, but they are difficult, if not impossible, to answer without knowing what factors and weightings the AI models applied to each candidate. In this second part, we’ll look at one methodology Vettd has designed for approaching the transparency issue.
Transparency in data privacy involves openness: a willingness to share with users all aspects of how their personal data is used. This includes openness about what is being collected, why it is being collected, how it is being analyzed, and how the AI algorithms reached their decisions (that is, which parameters drove the decisions the algorithms made).
At Vettd, we looked at this issue and designed an approach that provides transparency throughout, so that the question of how the AI models made their selections can be answered at any point in the process. Like “privacy by design,” we approached the issue with the intention of designing transparency into the process at every step. In essence, the “black box” is removed so that the factors and weightings can be examined at any given point and made explainable.
Take the example of a service platform that contextually analyzes data and uses AI-built neural networks to prioritize candidates for a particular job opportunity. Instead of the analysis happening opaquely in a black box, the choices made at each key step, along with the associated parameters of the algorithms, learned models, and training and decision-support processes, are exposed so that the prioritization outcomes are reproducible and explainable.
To prioritize candidates, talent data needs to be analyzed and ranked. The talent data could be, for example, hundreds or thousands of resumes submitted for a job. The objective is to use AI to go through the talent data and identify and rank the highest-potential candidates for a position (or positions). To provide transparency, we look at the process in three sections: the talent data, the algorithms, and the output or decisions.
The main component of the talent data section is a set of contextual algorithms that preprocess the data, with two main purposes: removing bias “influencers” and normalizing natural language across documents. Removing bias influencers reduces or eliminates language that could unduly influence the algorithms in areas such as race and gender, so that the algorithms do not make decisions shaped by these factors. Bias influencers removed could include names, associations, organizations, sports participation, or even educational institutions.

The second purpose of preprocessing is normalizing natural language across documents. The contextual algorithms examine the data from all the input documents, look for differences in language that actually have the same meaning, and rewrite the descriptions so they essentially state the same thing. A simple example: one person might list that they have a JD, another that they have a juris doctorate, and another that they received a law degree. Gender bias can be addressed similarly. Where men and women write resumes from different vantage points, these differences can be normalized so they do not translate into weighting differences in the algorithm section. This helps ensure that each person receives equal weighting by the algorithms and that differences in phrasing do not give one gender an advantage over the other.
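To make this concrete, here is a minimal sketch of what such preprocessing might look like. The field names, synonym map, and `preprocess` function are illustrative assumptions for this post, not Vettd's actual implementation:

```python
import re

# Hypothetical synonym map: phrases with the same meaning are rewritten
# to one canonical form so downstream algorithms see identical language.
CANONICAL_TERMS = {
    r"\bjuris doctorate\b": "JD",
    r"\blaw degree\b": "JD",
}

# Hypothetical bias-influencer fields stripped before any analysis.
BIAS_FIELDS = {"name", "associations", "sports"}

def preprocess(record: dict) -> dict:
    """Remove bias influencers and normalize language in one resume record."""
    cleaned = {k: v for k, v in record.items() if k not in BIAS_FIELDS}
    text = cleaned.get("text", "")
    for pattern, canonical in CANONICAL_TERMS.items():
        text = re.sub(pattern, canonical, text, flags=re.IGNORECASE)
    cleaned["text"] = text
    return cleaned

resume = {"name": "A. Candidate", "text": "Received a law degree in 2015."}
print(preprocess(resume))  # {'text': 'Received a JD in 2015.'}
```

Because every removal and rewrite is an explicit, recorded transformation rather than a hidden step, each one can later be surfaced to the user.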
The final preprocessing step is to identify the most relevant contextual language in the talent data documents. The contextual AI models analyze the data and identify which language components are most relevant to carry forward into the analysis process. Critical to transparency, all of the preprocessing steps are visible: the user can request to see which language factors were chosen as relevant, which bias influencers were removed, and what normalization took place during this input stage. All of this may be made visible to the user through a variety of means, such as reports or online interactions.
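The post doesn't specify how the contextual models pick relevant language, but one standard, fully inspectable technique is TF-IDF scoring: terms that are frequent in one document yet rare across the candidate pool are treated as that document's most distinguishing language. The sketch below is an assumption offered for illustration, not Vettd's method:

```python
import math
from collections import Counter

def tfidf_terms(documents: list[str], top_n: int = 3) -> list[list[str]]:
    """For each document, return its top-scoring distinguishing terms."""
    doc_tokens = [doc.lower().split() for doc in documents]
    doc_freq = Counter()                      # how many docs contain each term
    for tokens in doc_tokens:
        doc_freq.update(set(tokens))
    n = len(documents)
    results = []
    for tokens in doc_tokens:
        tf = Counter(tokens)                  # term frequency within one doc
        scores = {t: tf[t] * math.log(n / doc_freq[t]) for t in tf}
        results.append(sorted(scores, key=scores.get, reverse=True)[:top_n])
    return results

documents = [
    "python sql sql etl",
    "java web web services",
    "python web analytics",
]
print(tfidf_terms(documents)[0])  # ['sql', 'etl', 'python']
```

The transparency point is that whatever selection method is used, the chosen terms are surfaced to the user rather than consumed silently by the model.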
The algorithm section carries the processed data from the talent data section through the neural networks that make up the AI models and on to the output. The architecture is defined to create statistical machine learning algorithms based on artificial neural networks. The weights of a neural network are determined during a training phase, but they are not static and can be modified during the execution phase.
Underlying all of the processes is a metadata management system that provides transparency into all facets of the architecture. Metadata is used to track the state of every key element so that the exact algorithms, weighting factors, and data can all be recreated as needed. The neural networks are time-stamped for traceability and accountability. Even though the neural networks may evolve over time, the metadata and timestamps allow the user to see the neural networks that were run against the data at a given point in time. While in most AI platforms the neural networks or algorithms may be completely opaque, here the user can trace back from the decisions to the algorithms that drove those decisions at a given time in the past. Rather than an AI model in a black box, there is a transparent view of the training data and algorithms used in any given AI model, which can be shown to the user through a variety of means such as visualization reports or online interactions.
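One way to picture this kind of metadata trail is a timestamped snapshot recorded for every analysis run, fingerprinting the model weights and referencing the training data so a past decision can be traced to the exact model state. The class and field names here are hypothetical, chosen only to illustrate the idea:

```python
import hashlib
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ModelSnapshot:
    """Metadata recorded for each analysis run so it can be traced later."""
    model_name: str
    weights_digest: str     # fingerprint of the exact network weights used
    training_data_id: str   # reference to the training set behind the model
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

def snapshot(model_name: str, weights: dict, training_data_id: str) -> ModelSnapshot:
    """Fingerprint the weights and record when this model version ran."""
    digest = hashlib.sha256(
        json.dumps(weights, sort_keys=True).encode()
    ).hexdigest()
    return ModelSnapshot(model_name, digest, training_data_id)

audit_log = []
audit_log.append(snapshot("ranker-v1", {"layer1": [0.2, 0.7]}, "train-2019-01"))
# Later, a decision can be traced back to the exact weights and data:
print(audit_log[-1].weights_digest[:12], audit_log[-1].timestamp)
```

Even as the network evolves, the log pins each past decision to the weights that produced it.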
In this example, the output is a list of individuals, along with the factors that were relevant and the rankings the algorithms assigned to those contextual factors to generate each relevance score. The entire listing is completely transparent. The user can request to see all of the talent data output and the contextual choices that drove the comparative rankings. This may be shown to the user through a variety of means such as reports or online interactions. For privacy purposes, the data of other users may be withheld or obfuscated to protect their identities and personal information. Based on the outcomes, adjustments can be made back in the preprocessing section to correct for biases, provide further impact analysis, or set up “what if” scenarios to further analyze outcomes.
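A transparent ranking means each score decomposes into visible per-factor contributions rather than arriving as a bare number. The weights, factor names, and candidate IDs below are invented for illustration:

```python
def score_candidate(factors: dict, weights: dict) -> tuple:
    """Return a total relevance score plus its per-factor contributions."""
    contributions = {f: factors.get(f, 0.0) * w for f, w in weights.items()}
    return sum(contributions.values()), contributions

# Hypothetical factor weights, exposed to the user rather than hidden.
weights = {"python": 0.5, "sql": 0.3, "leadership": 0.2}
candidates = {
    "cand-001": {"python": 1.0, "sql": 1.0},
    "cand-002": {"python": 1.0, "leadership": 1.0},
}

ranked = sorted(
    ((score_candidate(f, weights), cid) for cid, f in candidates.items()),
    key=lambda item: item[0][0],   # sort by total score, highest first
    reverse=True,
)
for (total, contribs), cid in ranked:
    print(f"{cid}: {total:.2f}  factors={contribs}")
```

Because every contribution is itself reportable, a user asking “why was I ranked below someone else?” can be shown exactly which factors and weightings produced the gap.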
Instead of a black box with inputs and outputs and no understanding of how a decision was made, the basis for the AI's choices can be shown at each point in the decision-making process. The platform captures the neural network model and architecture used in each analysis, so users can see how ongoing neural network development is affecting decisions that impact them. The audit trail can be analyzed for biases, or used to adjust the neural network as needed with new feedback or additional factors. At every stage, the user can be given the reasoning behind the choices the AI made. The end result is a transparent outcome: all steps are visible to users, and the openness of the AI processes provides both accountability and explainability for the decisions made.
With the black box removed, transparency can provide the user with the rationale used behind the decisions. One may still debate whether the right factors or weighting were used by the AI models in a particular selection process. But, isn’t this where the debate should be?
Vettd filed a patent on this approach earlier this year, and we remain committed to designing, developing, and enhancing AI models that promote transparency.