Machine Learning techniques are proving increasingly useful across businesses and sectors. However, applying them in an organization does not consist only of developing and training models. It also involves a series of earlier and later steps: defining the use case and the target, monitoring the model once it is in production, and addressing associated considerations such as interpretability and possible biases.
Industrialization, Traceability and Verifiability in Machine Learning
The starting premise was that, when implementing Machine Learning models, especially in the banking sector, “we need the models to be traceable, reproducible and verifiable”, as well as industrialized.
This industrialization makes it possible to standardize the processes that recur in all Machine Learning projects, staying agile while guaranteeing the three properties mentioned above and reducing the cost of maintaining the models.
The expert gave an example: “at the bank, we have to be able to answer why a person was denied a loan, tracing the path from the data to the score issued by the model.” To do this, it is necessary to know which version of the model is in production, what data was used and where the predictions were stored. To cover traceability and reproducibility, several versions of the data are usually saved and associated with the models that are in production at any given time.
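As a minimal sketch of what such a trail could look like, the Python snippet below stores each score together with the model version and a hash of the training data, so that a loan decision can later be traced back. The names `fingerprint`, `PredictionRecord` and `AuditLog`, the version string and the sample data are all hypothetical illustrations, not the bank's actual tooling.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


def fingerprint(rows):
    """Return a deterministic SHA-256 hash of a list of dict rows."""
    # sort_keys makes the hash independent of key order inside each row.
    canonical = json.dumps(rows, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class PredictionRecord:
    """Everything needed to explain one score after the fact (hypothetical schema)."""
    application_id: str
    model_version: str
    training_data_hash: str
    features: dict
    score: float


class AuditLog:
    """In-memory stand-in for wherever predictions are persisted."""

    def __init__(self):
        self._records = {}

    def store(self, record):
        self._records[record.application_id] = record

    def trace(self, application_id):
        # Return the full path from input data to issued score as a plain dict.
        return asdict(self._records[application_id])


# Made-up training snapshot, hashed once when the model is trained.
training_rows = [
    {"income": 32000, "debt_ratio": 0.41, "defaulted": 1},
    {"income": 58000, "debt_ratio": 0.18, "defaulted": 0},
]
data_hash = fingerprint(training_rows)

log = AuditLog()
log.store(
    PredictionRecord(
        application_id="app-001",
        model_version="credit-risk-1.3.0",  # assumed versioning scheme
        training_data_hash=data_hash,
        features={"income": 30000, "debt_ratio": 0.45},
        score=0.82,
    )
)

trail = log.trace("app-001")
print(trail["model_version"], trail["training_data_hash"][:12], trail["score"])
```

In a real deployment the log would be a durable store rather than a dictionary, but the idea is the same: the score is never saved without the identifiers needed to reproduce it.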
On the other hand, verifiability is handled by a committee in which different areas of the bank take part (model owner, risks, legal, etc.). The Machine Learning model cannot go into production if the committee does not approve it. In addition, other business decisions are made: the decision thresholds, and when to launch or retrain the model.
Analysis and Design of the Machine Learning Model
As the expert explained, the design and development of a Machine Learning model are governed by a series of requirements: it must be simple, monitorable and interpretable, it must not be biased, its input variables must comply with regulation, and it must fit the use case and its operational restrictions.
All this means taking into account some aspects and addressing some challenges in the different phases of the process:
- Definition of the use case, in which different areas are involved. Several questions fundamental to the development of the model are answered: which variables and samples can be used, whether legal restrictions limit the use of the model, whether the model will work in batch mode or in real time, and what technology it requires.
- Analysis of the target population, which according to the expert is one of the phases that takes the longest. First, it is necessary to decide on which population the model will be trained and to which it will be applied, with the possibility that the latter has not been dealt with historically. Then the availability of variables is studied and the target is defined, which must be aligned with business and risk criteria, among other things.
- Data splitting into train, test and validation sets. It is decided how to make the cuts (temporal, grouped or stratified), always keeping in mind that the cuts must be compatible with one another.
- Possible preselection of variables. Although the selection of variables itself is still made on the training data, a distributed preselection can be run first to reduce the volume of data.
- Model training and predictions. Openbank has its own flexible Auto-ML tool, which adapts to the variety of use cases addressed. Here it is essential to know how to set the parameters so as to ensure traceability and reproducibility and to avoid black boxes.
- Interpretability, for which they also have their own tool. Once the model has been trained, an attempt is made to explain, for example, why a particular score has been assigned to a client. In addition, this same tool can be applied to models that have not been implemented.
- Monitoring, of two types: the classic one, carried out by business with its KPIs to follow up on improvements in a standard way, and a more technical one, aimed at measuring the so-called data shift.
- Possible biases. According to the expert, they can no longer afford to develop biased models, and she believes that company policy must define what type of fairness is to be achieved, using various strategies to maximize profit subject to those restrictions.
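To make the technical side of monitoring more concrete, the sketch below measures data shift with the Population Stability Index (PSI), one common metric for comparing the distribution of a variable at training time with its distribution in production. The article does not say which metric the bank uses, so PSI is an assumption here, as are the function name `psi`, the bin count and the synthetic scores.

```python
import math


def psi(expected, actual, n_bins=5, eps=1e-6):
    """Population Stability Index between a reference sample and a new one.

    Bins are equal-width over the range of the reference sample; values of
    the new sample that fall outside that range go to the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    # Fall back to a width of 1.0 if the reference sample is constant.
    width = (hi - lo) / n_bins or 1.0

    def shares(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), n_bins - 1)  # clamp out-of-range values
            counts[idx] += 1
        # eps avoids log(0) and division by zero for empty bins.
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Synthetic example: scores seen at training time vs. scores shifted upwards.
reference_scores = [i / 100 for i in range(100)]
recent_scores = [s + 0.3 for s in reference_scores]

stable = psi(reference_scores, reference_scores)
drifted = psi(reference_scores, recent_scores)
print(f"PSI unchanged: {stable:.3f}, PSI after shift: {drifted:.3f}")
```

A frequently quoted rule of thumb treats a PSI below 0.1 as stable and above 0.25 as a significant shift, although the thresholds that trigger a review or a retraining are ultimately a business decision.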
As we can see, a Machine Learning project in a company cannot be limited to developing and training a useful model. It is necessary to attend to a series of considerations before and during the process: for example, that the models fit the objective but can also be generalized to be more efficient, and that legal and ethical issues are not lost from sight.