
Several concerns arise when selecting a machine learning model for a specific task. Model performance is of course a prime concern, but model complexity and speed are also vital in scenarios that demand fast deployment or quick responses. In this article, we discuss what makes some models fast and others slow, and how to select models in light of these trade-offs.

Two factors influencing the speed of models

A machine learning model usually plays the role of a function F(a, X), where a is a set of coefficients and X is a dataset, i.e. the input of the function. Once a model’s coefficients are determined, the form of the function F(a, X) is fixed. In machine learning, the learning process calculates the coefficients so that F(a, X) approximates the ground truth of the underlying pattern in the data. Some models also have adjustable hyper-parameters that change the form of the function or the learning process.
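To make this concrete, here is a minimal Python sketch (not taken from any library) of a model as a parameterised function: the coefficients a fix the function, and different coefficients give different predictions for the same input.

```python
def F(a, x):
    """A linear model as a function of coefficients a and input features x.

    a[0] is the intercept; a[1:] are the feature weights.
    """
    return a[0] + sum(w * xi for w, xi in zip(a[1:], x))

# The same input produces different predictions under different coefficients:
x = [3.0, 1.0]
print(F([1.0, 2.0, 0.5], x))  # 1 + 2*3 + 0.5*1 = 7.5
print(F([0.0, 1.0, 1.0], x))  # 0 + 1*3 + 1*1 = 4.0
```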


Let’s look at the Linear Regression model

class sklearn.linear_model.LinearRegression(*, fit_intercept=True, normalize=False, 
    copy_X=True, n_jobs=None, positive=False)

and the Multi-layer Perceptron Classifier model.

class sklearn.neural_network.MLPClassifier(hidden_layer_sizes=100, 
    activation='relu', *, solver='adam', alpha=0.0001, batch_size='auto', 
    learning_rate='constant', learning_rate_init=0.001, power_t=0.5, max_iter=200, 
    shuffle=True, random_state=None, tol=0.0001, verbose=False, warm_start=False, 
    momentum=0.9, nesterovs_momentum=True, early_stopping=False, validation_fraction=0.1, 
    beta_1=0.9, beta_2=0.999, epsilon=1e-08, n_iter_no_change=10, max_fun=15000)

Figure 1: Classification Model vs Regression Model. Image by Joshua Ebner.

Model complexity

For the Linear Regression model, the function form is known: F(a, X) = a_0 + a_1*X_1 + … + a_d*X_d, where d is the number of features. The learning process calculates the coefficients a_0, …, a_d. The Multi-layer Perceptron Classifier, in contrast, has no single closed-form expression: it is an aggregation of many unit functions and the complex interconnections among them, so the weights of the links between neurons are also coefficients to be determined. As the internal structure of a model grows larger and more complex, there are more coefficients to determine, which can significantly slow down the learning process. Similar trends can be observed in other models. In short, the more complex a model’s structure, the longer it takes to train the model and to get a prediction from it.
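As a back-of-the-envelope illustration (the layer sizes below are hypothetical, not taken from the article), we can count how many coefficients each model has to learn. A linear regression with d features has d + 1 coefficients, while an MLP accumulates a weight per connection plus a bias per neuron:

```python
def linear_regression_params(n_features):
    # One weight per feature plus the intercept.
    return n_features + 1

def mlp_params(layer_sizes):
    # For each pair of adjacent layers: a weight matrix plus a bias vector.
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

print(linear_regression_params(20))  # 21
print(mlp_params([20, 100, 2]))      # 20*100 + 100 + 100*2 + 2 = 2302
```

Even this small hidden layer multiplies the coefficient count by two orders of magnitude, which is why training and prediction slow down as the structure grows.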


Coefficient calculation method

Another factor affecting speed is how the coefficients are calculated. Only a limited number of models have the luxury of coefficients that can be solved directly from the given X and Y. For example, we can solve the coefficients of a linear regression model with the ordinary least squares method. When training the Multi-layer Perceptron Classifier, however, the coefficients are searched for with a gradient-based method until the target (the output of the loss function) converges. The hyper-parameter learning_rate controls the search step and the hyper-parameter max_iter caps the length of the search; both can significantly affect learning speed. In summary, models with a direct method for solving their coefficients are generally much faster than those without one, and a model’s hyper-parameters may also affect how quickly its coefficients are found.
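The contrast can be sketched in a few lines of plain Python (a toy one-feature dataset, not a real training setup): ordinary least squares solves the coefficients in one shot, while gradient descent searches for them step by step, with learning_rate and max_iter playing the same roles as the scikit-learn hyper-parameters of the same names.

```python
# Toy data lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
n = len(xs)

# Direct method: ordinary least squares for one feature (closed form).
mean_x = sum(xs) / n
mean_y = sum(ys) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
intercept = mean_y - slope * mean_x   # solved in one shot: slope = 2, intercept = 1

# Iterative method: gradient descent on the mean squared error.
a, b = 0.0, 0.0                       # coefficients to be searched for
learning_rate, max_iter = 0.05, 2000
for _ in range(max_iter):
    grad_a = 2 / n * sum((a * x + b - y) * x for x, y in zip(xs, ys))
    grad_b = 2 / n * sum(a * x + b - y for x, y in zip(xs, ys))
    a -= learning_rate * grad_a
    b -= learning_rate * grad_b       # many small steps toward the same answer
```

The direct solve takes a fixed amount of arithmetic; the gradient search repeats its update up to max_iter times, and a poorly chosen learning_rate can slow it further or stop it from converging at all.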


Considerations when selecting models

Accuracy vs Speed

Simple models are easy to use and respond quickly to requests, while complex models need much more time to do the same. The latest AI papers show that highly complex models dominate research, yet simpler models are still preferred in industry. Most models used by businesses remain regression and tree-based models, mainly because of their simplicity, speed, small size, and easier explainability, although deep learning models (when not overfitted) are becoming more and more popular.

The underlying reason deep learning models keep being developed is that, in many cases, complex models do outperform simple ones, especially for tasks such as natural language processing, image classification, and time-series prediction. However, finding a trade-off between performance and speed is very important, and model selection should be considered case by case depending on the business needs.

Figure 2: Scenarios where deep learning models perform better. Image by Jesse Moore.

Imagine an AI model that identifies fake news or hate speech. Such a model may be applied to the millions of messages generated on a social network every second. A very accurate model is most probably a complex one, which in turn is very slow; in production, a slow model may put such a heavy load on the system that it becomes impractical. Thus, accuracy and speed need to be balanced.

Another example of how important a good trade-off is: self-driving cars (e.g. Tesla). Such cars use a set of machine learning models that make complex decisions instantly. These models need to be both fast and accurate, as any delayed or wrong prediction can easily cause an accident. That is why a huge amount of money and resources is invested in improving those models for both accuracy and speed, and even the slightest improvement can have a huge business impact.


Cost of training & retraining models

Training very complex machine learning models can be very costly. A recent survey showed that training a large model with billions of parameters for better accuracy requires considerable computation time, and hence computing power and cloud costs. Many attempts have been made to reduce training time, but it is still common for a deep learning model to take days or weeks to reach good results. AlphaGo, the model built by Google to beat the champion of Go (a game more complex than chess), cost millions in cloud resources to train.

In some scenarios, how frequently a model must be retrained is critical, while at the same time a quick response from the model is required. For instance, in trade-execution tasks, some models must be updated very frequently (every minute, in extreme cases) with the latest trade data, and the model’s response time should be in milliseconds. Deploying complex models for such tasks is not practical. A recent paper, produced jointly by academia and industry, on how deep learning models are applied in finance showed that supercomputers were needed to make the training time practical (acceptable). Such a deployment, however, is not affordable for every business’s regular tasks.

In conclusion, when selecting the best machine learning model for a business problem, one has to trade off model performance, speed, explainability, and how often the model needs to be retrained.


Better model selection with evoML

Our evoML platform provides flexible model selection criteria to satisfy different business needs. Users can select the best models based on a particular performance metric, including accuracy, precision, F1 score, etc. evoML also provides:

  • the most popular performance metrics for classification and regression tasks
  • training time and prediction time, together with other metrics such as explainability, the energy consumption of a model, and an estimated cost for a generated model.

All these metrics allow users to choose the model that best suits their business needs. In addition, the evoML platform has a unique advantage: it enables multi-objective optimisation (joint selection criteria), which lets users make decisions based on the trade-offs among multiple metrics, for example accuracy and speed. Powered by our award-winning genetic and other algorithms, evoML generates the best models and ranks them by the user’s custom criteria. The user can then choose the one most suitable for their business problem. Figure 3 shows an example in which each green dot denotes an optimal model on the Pareto front. The whole model selection process is customisable, visible, and easy to use.


Figure 3: Pareto front. Image by evoML.
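The idea behind a Pareto front can be sketched in a few lines of Python (the accuracy and latency numbers below are made up for illustration, not evoML output): a model is on the front if no other model is at least as accurate and at least as fast, and strictly better on one of the two.

```python
def pareto_front(models):
    """Keep the models not dominated on (accuracy, latency_ms).

    Model A dominates model B if A is at least as accurate, at least as
    fast, and strictly better on at least one of the two criteria.
    """
    def dominates(a, b):
        return (a[0] >= b[0] and a[1] <= b[1]) and (a[0] > b[0] or a[1] < b[1])

    return [m for m in models
            if not any(dominates(other, m) for other in models if other != m)]

candidates = [(0.95, 120.0), (0.90, 30.0), (0.88, 50.0), (0.85, 10.0), (0.80, 8.0)]
print(pareto_front(candidates))
# (0.88, 50.0) is dominated by (0.90, 30.0); the other four form the front.
```

Each surviving pair is a defensible trade-off: picking among them is exactly the accuracy-versus-speed decision discussed above.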


Automatic code optimisation for high model speed

As mentioned previously, businesses often cannot compromise accuracy for model speed. Thus, a lot of research and investment has gone into making complex machine learning models faster and more practical. The three most common approaches to better model speed are:

  1. Significant improvements in the hardware technology with better CPUs and new GPUs, TPUs, etc.
  2. New model compression techniques and libraries that try to compress the internal neurons of complex deep learning networks.
  3. Code optimisation techniques that focus on identifying inefficiencies in code and adapting it accordingly so it can better utilise the hardware a model is used on.


Figure 4: evoML Code Optimisation.


With TurinTech’s evoML, users can automatically optimise the execution time, memory usage, and energy consumption of a model even further. Our approach detects inefficiencies in the code that may hurt performance and that are difficult and time-consuming to find even for the most experienced engineers. It distinguishes itself by optimising source code systematically for the desired overall performance, and it provides the optimised source code together with an explanation of the changes made. Our optimisation approach builds on 10 years of continuous scientific research in genetic algorithms and search-based software engineering, and it has achieved significant optimisation results on hundreds of codebases.


About the Author

Buhong Liu​ ​| TurinTech Research

PhD researcher in Computer Science, enjoying mathematical finance and data science, interested in market microstructure, Kop, lover of literature.