Explainable Artificial Intelligence (XAI): Overcoming the Challenges with evoML

“Investor Sues After an AI’s Automated Trades Cost Him $20 Million.”

The headline that shook the financial sector in 2017.

A trading dispute between Tyndaris Investments (a company that launched a robot hedge fund) and a Hong Kong real estate investor led to an outburst of controversy and litigation. Worrying questions started to surface:

  1. What decision did the AI make and why did it make that decision?
  2. Who’s responsible for the problem that arose?
  3. Was the AI not aware of the possible issues? If not, why?
  4. Did the AI make the same mistakes as human investors? If not, what were the reasons, e.g., gender, race, etc.?

Only a year earlier, AI had been creeping into the world of finance – from helping with fast, real-time market data analysis to autonomously executing efficient trades.

Now, with these questions (and more) lurking at the edge of nearly every industry dabbling in the technology – from transportation to healthcare to law enforcement – they are changing the way companies embed AI into their products.

In this blog, we will give an overview of explainable AI (XAI) and focus on how TurinTech’s evoML platform can be used to overcome the challenges that XAI poses.

Model explainability

XAI is a set of processes and methods that allows humans to comprehend and trust the results and output created by machine learning algorithms. There are several ways to categorise machine learning models, but the one that best frames XAI is the well-known white-box vs black-box distinction.

  • White-box models, such as linear regression, logistic regression and decision trees, offer less predictive capacity than black-box models. However, they are significantly easier to explain and interpret.
  • Black-box models, such as neural networks, gradient boosting models or complicated ensembles, often score highly on objective metrics such as accuracy and F1, but the inner workings of these models are exceptionally difficult to understand due to their complex architecture.

Figure 1. evoML tackles the trade-off between model complexity and explainability

When it comes to understanding AI decisions, three keywords repeatedly crop up: explainability, interpretability and causality. Explainability and interpretability are often used interchangeably, though with a subtle difference – explainability focuses more on “What does the model tell me?”, whereas interpretability focuses more on “How does the model work?”. Causality is separate from the other two but interlinked – it focuses on “What caused my model to behave like this?”.

Pre-modelling explainability

evoML provides an easy and scalable way to create and deploy optimised ML pipelines, automating the whole data science process via evolutionary learning. To demonstrate how evoML handles XAI, we will conduct our research on financial market data. The aim is to predict whether each example has a higher adjusted close price than the previous one, i.e., a binary classification time-series task focusing on signal prediction. If the target label is 1, the example has a higher adjusted close price than the previous example; if 0, it does not.
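This labelling scheme can be sketched in a few lines of pandas; the column name `adj_close` is hypothetical and stands in for the dataset’s adjusted close price:

```python
import pandas as pd

# "adj_close" is a hypothetical column name standing in for the dataset's
# adjusted close price.
df = pd.DataFrame({"adj_close": [100.0, 101.5, 101.0, 102.3, 102.3]})

# Label each row 1 if its adjusted close is higher than the previous row's.
df["target"] = (df["adj_close"] > df["adj_close"].shift(1)).astype(int)

# The first row has no previous observation, so drop it.
df = df.iloc[1:]
print(df["target"].tolist())  # [1, 0, 1, 0]
```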

Figure 2. Original financial market dataset.

At the start of the ML process, evoML applies statistical analysis to the dataset via an interactive interface, which comprises:

  1. Distribution analysis of columns and rows: This helps detect skewness and anomalies to combat potential data leakage (that could lead to incorrect model evaluation) and bias issues (that could lead to discrimination against certain groups of training examples).
  2. Data imputation: This helps detect missing data to combat issues of model compatibility with the dataset (that could lead to failed deployed models).

This process helps users understand the dataset that will be used to develop models.
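As a rough illustration of these two checks (evoML performs them automatically and far more thoroughly), here is a minimal pandas sketch with made-up columns:

```python
import pandas as pd

# Made-up columns for illustration: "open" contains an outlier, "close" a gap.
df = pd.DataFrame({
    "open":  [100.0, 101.2, 99.8, 150.0, 100.5],
    "close": [101.0, None, 100.2, 149.5, 101.1],
})

# 1. Distribution analysis: flag heavily skewed numeric columns.
skewed = [c for c in df.select_dtypes("number") if abs(df[c].skew()) > 1]

# 2. Data imputation: count missing values, then fill with the column median.
missing = df.isna().sum()
df_filled = df.fillna(df.median(numeric_only=True))
```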

Figure 3. Distribution and statistics of Open feature.

Figure 4. Distribution and statistics of Target feature. The tag suggests a balanced dataset.

Modelling explainability

Once the data has been fully preprocessed, we are ready to fit the machine learning model to the dataset, which in this case is a Logistic Regressor. After the model is tuned and fitted to the preprocessed data, we evaluate the model with various metrics and produce visualisations for model performance (such as confusion matrices, precision-recall curves etc.).
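A minimal stand-in for this step, using synthetic data in place of the preprocessed market features, might look like this:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the preprocessed market features.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
pred = model.predict(X_te)

print("accuracy:", accuracy_score(y_te, pred))
print("F1:", f1_score(y_te, pred))
print(confusion_matrix(y_te, pred))  # basis for a confusion-matrix plot
```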

At this point, users may feel as if the machine learning pipeline is complete i.e., if they obtained a good accuracy or F1 score, then the model is ready to be deployed. However, a key oversight is skipping model explainability. This is because:

  • We may want to know, for a specific machine learning algorithm, in what way the features contribute to the predictions i.e., independently or interactively.
  • We may want to know, for a specific machine learning algorithm, how changes to the dataset affect the prediction quality.

These are only a few of the questions practitioners may ask, and they are arguably the most important ones: they drive business decision-making, which model performance alone cannot support. Thus, we must have a way of providing reliable and trustworthy explanations for these questions. Below, we outline some popular XAI visualisations for this dataset and try to provide clear explanations for what is seen.

Global Explainability:

In this section, we concentrate on dataset-level explainers, which help us understand how the model predictions perform overall, for an entire set of observations.

General Feature Importance:

General feature importance is a model-specific explainability method, which allows users to understand how the chosen model interprets the impact/importance that features have on predicting a target variable. For linear-based models, such as linear regression, logistic regression or support vector machines, the feature importance values returned represent the weights/coefficients of the model (which can be positive or negative). One should test for statistical significance via p-values to determine whether a feature has a true effect on the target. For tree-based models, such as decision trees or random forests, the feature importance values are not based on weights/coefficients. Instead, importance is based on the reduction in the criterion used to select split points, e.g. Gini or entropy, and is therefore always positive. For linear models, both the magnitude and sign of a coefficient indicate a feature’s importance; for tree-based models, only the magnitude does.
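Both flavours of importance can be read off scikit-learn models directly; a small sketch on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=5, random_state=1)

# Linear model: importances are the signed coefficients (weights).
lr = LogisticRegression(max_iter=1000).fit(X, y)
coefs = lr.coef_[0]                  # may be positive or negative

# Tree model: importances come from impurity (Gini) reduction at split points.
tree = DecisionTreeClassifier(random_state=1).fit(X, y)
gains = tree.feature_importances_    # always non-negative, sums to 1
```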

Figure 5. Weights/coefficient values for Logistic Regression model.

For logistic regression, the interpretation of the coefficient value is the following:

“Increasing the predictor by 1 unit (or going from one level to the next) multiplies the odds of having the outcome by e^β, where β is the predictor’s coefficient.”

From the top 10 important features, we see that the date-is-a-weekend feature has a large negative coefficient value (roughly −3, i.e., an odds multiplier of e^−3 ≈ 0.05). As a result, the interpretation becomes:

“Date_is_weekend is associated with a 95% (1 − 0.05 = 0.95) reduction in the odds of the target outcome.”
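The arithmetic behind this interpretation is easy to verify; the coefficient value −3 is illustrative:

```python
import math

beta = -3.0                        # illustrative coefficient for date_is_weekend
odds_multiplier = math.exp(beta)   # e^-3 ≈ 0.05
reduction = 1 - odds_multiplier    # ≈ 0.95, i.e. a ~95% reduction in the odds
print(round(odds_multiplier, 2), round(reduction, 2))  # 0.05 0.95
```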

Partial Dependence Plots:

Partial dependence plots are a model-agnostic explainability method based on the ceteris paribus principle. “Ceteris paribus” is a Latin phrase meaning “other things held constant” or “all else unchanged”. The method examines the influence of a feature by assuming that the values of all other variables stay fixed whilst the feature under analysis varies through its possible values. As a result, users can see how changes in the values of a particular feature affect the predictions (on average). Given that the weekend feature seems to be the top predictor for our target, let’s see what the partial dependence plot looks like for it.

Figure 6. Prediction certainty for the feature “date_is_weekend” over different values.

We can clearly see here that as we transition from a weekday to a weekend, the predicted probability falls i.e., we are more certain to predict 1 if the day is a weekday.

For plotting purposes, the line is interpolated between the values 0 and 1 (a fractional value has no meaning here, since the weekend indicator is a discrete variable).
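The ceteris paribus idea is simple enough to sketch by hand: hold every other column fixed, force the analysed feature to each grid value, and average the predicted probabilities. A minimal version on synthetic data:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=400, n_features=4, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

def partial_dependence(model, X, feature, grid):
    """Average predicted probability of class 1 while the chosen feature is
    forced to each grid value and all other columns are held fixed."""
    averages = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature] = value   # ceteris paribus: only this column varies
        averages.append(model.predict_proba(X_mod)[:, 1].mean())
    return averages

# For a binary feature such as date_is_weekend, the grid is simply {0, 1}.
curve = partial_dependence(model, X, feature=0, grid=[0.0, 1.0])
```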

Local Explainability:

In this section, we concentrate on instance-level explainers, which help us understand how a model yields a prediction for a particular single observation. Some clients may be more concerned with this than with global explainability, since they may want to target specific consumers. For the following methods, we will consider the third row of the training set – this example is predicted 0, i.e., it has a lower adjusted close price than the example before it.


SHAP (SHapley Additive exPlanations):

SHAP is based on the concepts of coalitional game theory and can be used to explain the predictions of any machine learning model by calculating the contribution of each feature to the prediction via Shapley values. A question one can pose here is:

“If a single feature is changed by a single unit, for a particular example, how much would it impact the example’s prediction (based on the average prediction)?”.

Since the general feature importance indicates that the most important feature for target prediction is knowing if it is a weekend or not, let us see if this is reflected in our chosen example.

Figure 7. Feature contributions for training example ‘3’.

We can see that the feature indicating whether it is a weekend or a weekday gives the greatest push from the base value (the average model output over the training dataset we passed) to the model output, which agrees with the general feature importance. We also confirm that the prediction for this example (0) corresponds to a negative combined feature contribution (a negative value indicates class 0 in this classification task).
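evoML computes these SHAP plots internally; to illustrate the underlying idea, note that for a linear model with (assumed) independent features the Shapley values have a closed form on the log-odds scale, so the base value plus the contributions recovers the model output exactly:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=4, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Base value: the average model output (log-odds) over the training data.
base_value = model.decision_function(X).mean()

# Closed-form Shapley values for a linear model with independent features:
# contribution of feature j = beta_j * (x_j - mean_j).
x = X[2]                                          # the third training example
shap_values = model.coef_[0] * (x - X.mean(axis=0))

# Base value plus the combined contributions recovers this example's output.
reconstructed = base_value + shap_values.sum()
```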

Counterfactual Explanations:

A counterfactual explanation of a prediction describes the smallest required change to the feature values that would change the prediction to a different/opposite value. This has deep connections with causal AI (an entire field of its own, which we will discuss in another article) and hence has a vast amount of research conducted around it.

evoML uses a constrained evolutionary algorithm to carefully select feasible counterfactual explanations against some pre-defined objective criteria. This is beneficial as users can apply specific constraints to their features, mimicking real-world situations such as portfolio optimisation. A question that one can pose here is:

“What should the stock market feature values be for the observation in order to have signalled a higher close price than the previous observation?”.

Figure 8. Top 10 counterfactual explanations (displaying the first 5 preprocessed features, including the most important feature from general feature importance).

It is evident from these 5 features that the top 10 counterfactual explanations all suggest that if the date were not a weekend, i.e., if it were a weekday, then our observation would have had a higher adjusted close price than the observation before (with interaction from the other changed feature values). Knowing this is useful, as users can then adapt their trading strategies to increase their returns.

Since the preprocessed data is encoded and scaled, we will need to reverse this process for these explanations to understand the problem in terms of the original data. These explanations may vary for different machine learning algorithms, so it is best to run these through other models to see whether there is agreement or disagreement. Additionally, some of the explanations may seem counter-intuitive to reality and thus users are urged to constantly question the information they are receiving.
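evoML’s evolutionary search is far more sophisticated, but the core idea – find the smallest feasible change that flips the prediction – can be sketched with a crude line search between the observation and an oppositely classified point:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=4, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

x = X[2]                                           # observation to explain
target = 1 - model.predict(x.reshape(1, -1))[0]    # the opposite prediction

# Anchor on any training point the model assigns to the opposite class, then
# bisect along the segment to find the smallest move that flips the label.
anchor = X[model.predict(X) == target][0]
lo, hi = 0.0, 1.0
for _ in range(50):
    mid = (lo + hi) / 2
    if model.predict((x + mid * (anchor - x)).reshape(1, -1))[0] == target:
        hi = mid
    else:
        lo = mid
counterfactual = x + hi * (anchor - x)  # minimal change along this line
```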

Post-modelling explainability

At the end of the ML process, evoML deploys the best machine learning model for the objective at hand, with the source code available to download for ultimate transparency. Once the model is deployed, continual checks are performed on it to sustain its quality:

  1. Data drift detection: This helps reduce model degradation (by monitoring statistical properties of the predictors as more data is provided).
  2. Concept drift detection: This helps reduce model degradation (by monitoring statistical properties of the target as more data is provided).
  3. Intelligent alerts: This helps alert users of overall model performance in terms of accuracy, computation time etc.

These processes are a continuation of explainability and performance checks to make sure that consumers can monitor their own business objectives against the deployed model.
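A common building block for such drift checks is a two-sample test comparing training data against live data; here is a sketch using the Kolmogorov–Smirnov test (evoML’s actual monitoring is more elaborate):

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
train_feature = rng.normal(loc=0.0, scale=1.0, size=1000)  # training data
live_feature = rng.normal(loc=0.8, scale=1.0, size=1000)   # shifted live data

# Two-sample Kolmogorov-Smirnov test: a tiny p-value signals that the live
# distribution of a predictor no longer matches the training distribution.
stat, p_value = ks_2samp(train_feature, live_feature)
drift_detected = p_value < 0.01
```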

Build AI that you can trust with evoML

Through advanced code optimisation across hundreds of models and their parameters, users can quickly adapt and trial new approaches that accommodate their needs. evoML strives for a balance between AI autonomy and human authority, so that we can assist with:

1) Preventing discrimination/bias and promoting fairness.

2) Business decision-making through clarity and diligence.

3) Supporting society’s welfare through reliable and high-quality recommendations.

We hope this article helps you better understand XAI and allows you to use these concepts in your own projects so that you can truly understand the hidden connections between data and machine learning models.

About the Author

Siddartha Nath ​| TurinTech Research

Mathematics Graduate, passionate about Data Science, Finance and Education.

When not involved in academia, I enjoy socialising, playing sports and performing arts.


© 2023 · TurinTech AI. All rights reserved.
