Machine learning can be used to address many types of problems, which can be grouped into categories according to the kind of technique used to solve them.
This article aims to give you an overview of machine learning paradigms and the types of problems they are commonly used for.
Machine Learning Paradigms
As a general rule (there are exceptions), machine learning algorithms build a model representing the knowledge they have been able to extract from the data provided as input. Depending on the additional information supplied to the algorithm, we can distinguish several paradigms for guiding the learning process. The best known are briefly described below:
- Supervised learning. While it learns, the algorithm is told whether the output it has generated for a particular case (the prediction) is correct or not. Typically, the algorithm adjusts its model each time it is told that it has made a mistake, in order to improve its predictions.
- Unsupervised learning. The only information delivered to the algorithm is the data samples, without further details. From these samples, it is possible to analyze the distribution of the values, the similarity or distance between the samples, the degree to which some variables co-occur with others, etc. The applications are many, as we will see later.
- Semi-supervised learning. This is a case halfway between the previous two. The correct output is known for only some of the samples in the available data set. The algorithm uses those samples to build an initial model, which then predicts the output value for the rest of the samples. In this way, the model is expanded and adjusted, taking advantage of all the available information.
- Reinforcement learning. The algorithm is not supplied with the correct outputs to adjust its model, as it is in the supervised case. Instead, it receives a larger or smaller reward depending on how well it carries out a sequence of actions. In this way, its behaviour is reinforced towards the objective pursued.
These paradigms allow specific types of problems to be solved, and they can be implemented using different tools: the models that represent the knowledge. Depending on the chosen model (a tree, a neural network, a set of rules, etc.), a specific algorithm will be used to generate and fit it.
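As an illustration of the supervised paradigm described above, where the model is adjusted each time a prediction turns out to be wrong, here is a minimal sketch of a perceptron. The perceptron is only one possible choice of model and algorithm; the function names, the learning rate and the toy data set (logical AND with labels -1/+1) are assumptions made for this example.

```python
def train_perceptron(samples, labels, epochs=20, lr=1.0):
    """Mistake-driven training: the weights change only when a prediction is wrong."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(samples, labels):  # y is the known correct output, +1 or -1
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            predicted = 1 if activation >= 0 else -1
            if predicted != y:
                # The "teacher" reports a mistake, so the model is corrected.
                weights = [w + lr * y * xi for w, xi in zip(weights, x)]
                bias += lr * y
                mistakes += 1
        if mistakes == 0:  # every training sample is classified correctly
            break
    return weights, bias


def predict(weights, bias, x):
    """Return +1 or -1 for a single sample."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation >= 0 else -1


# Hypothetical toy data: logical AND, encoded with labels -1 / +1.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [-1, -1, -1, 1]

weights, bias = train_perceptron(samples, labels)
predictions = [predict(weights, bias, x) for x in samples]
```

Because this toy data set is linearly separable, the loop stops as soon as a full pass produces no mistakes, and `predictions` then matches `labels`.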
Types of Problems in Machine Learning
Machine learning is used to solve a wide range of real-life problems. These problems, or tasks as they are also known, can be categorized into a few types. Although it is not a strict rule, each type is usually addressed through a specific learning paradigm. For this reason, the most common types of tasks are outlined below according to the paradigm with which they are traditionally approached.
Supervised Learning Tasks
Two fundamental types of problems are solved by supervised learning, described below. In both, the actual outputs, known in advance for the training data, allow the algorithm to improve its model parameters. Once the training of the model is complete, it can process new samples and generate the appropriate output without any help.
- Classification. Each data sample has one or more associated nominal outputs, called class labels, labels, or simply classes. To classify automatically, a predictive model is built that takes the input variables and generates the corresponding class labels as output. A classifier can be used to flag credit or loan applications as risky, to separate incoming email messages into spam and legitimate mail, to find out whether or not a person’s face appears in a photograph, etc.
- Regression. As in the previous case, each sample has an associated output value, but here it is a real number (continuous rather than discrete, that is, with possible values anywhere within a continuum), so the techniques used to generate the model are usually different from those used for classification. However, the procedure for fitting or training the model is similar: known correct outputs are used to adjust its parameters and improve its predictions. With a regression model, it is possible to estimate the height of a person based on their sex, age and nationality, or to predict the distance a vehicle will be able to travel, taking as input variables the weight of the load, the volume of fuel available and the ambient temperature.
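To make the regression case above concrete, here is a minimal sketch that fits a straight line to one input variable using ordinary least squares. It is not the only way to build a regression model; the function name `fit_line` and the load/distance figures are invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single input variable: y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both left unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data: load weight (tonnes) and distance travelled (km).
loads = [1.0, 2.0, 3.0, 4.0]
distances = [900.0, 800.0, 700.0, 600.0]

slope, intercept = fit_line(loads, distances)

# Continuous prediction for a load that was not in the training data.
predicted_distance = slope * 2.5 + intercept
```

Unlike a classifier, the model returns a number on a continuous scale, here a distance in kilometres, rather than one label from a fixed set.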
Unsupervised Learning Tasks
As indicated above, the problems tackled with this learning paradigm are characterized by data samples that contain only the input variables. There is no output to predict that could guide the algorithms. Therefore, the models generated, if any, are not predictive but descriptive. The most common tasks are:
- Grouping. Several disjoint groups are created by analyzing the similarity or dissimilarity of the data samples, for example, by calculating how far they are from each other in the space defined by the values of their variables. This technique, also known as clustering, facilitates visual data exploration and can be used as a basic classification method when the class labels required to train a classifier are not available.
- Association. Associations between specific values of the variables that make up the samples are sought by looking at their co-occurrence, that is, by counting the times they appear together. The result can be a set of association rules, a technique widely used by all kinds of online and physical businesses to arrange their products or recommend them.
- Variable reduction. By analyzing the distribution of the values of the variables in the set of samples, it is possible to determine which variables provide more information, which are correlated with others and are therefore redundant, or whether there is an underlying statistical distribution that generates the data, which would simplify their original representation. There are many possible techniques for this type of task, from the selection and extraction of variables to manifold learning, which consists of finding the underlying distribution mentioned above.
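The first two tasks in the list above can be sketched in a few lines. The following example uses k-means as the grouping technique, which is one common clustering algorithm and not one the list singles out, and counts item pairs that appear together as a first step towards association rules. The function names, the starting centroids and the data are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def kmeans_1d(values, centroids, iterations=10):
    """k-means on 1-D data, starting from the given initial centroids."""
    groups = []
    for _ in range(iterations):
        # Assignment step: each value joins the group of its nearest centroid.
        groups = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            groups[nearest].append(v)
        # Update step: each centroid moves to the mean of its group
        # (an empty group keeps its previous centroid).
        centroids = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
    return centroids, groups


def pair_counts(transactions):
    """Count how many transactions contain each pair of items."""
    counts = Counter()
    for items in transactions:
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    return counts


# Hypothetical unlabeled measurements forming two visible groups.
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids, groups = kmeans_1d(values, centroids=[0.0, 5.0])

# Hypothetical shopping baskets.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "milk"],
]
counts = pair_counts(baskets)
most_frequent_pair = counts.most_common(1)[0]
```

Neither function is given an output to predict: the groups and the pair counts describe the data rather than forecast anything, which is what makes these models descriptive.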
Other Types of Learning Tasks
The vast majority of the problems addressed through machine learning fall into the categories listed in the previous two sections. However, other types of tasks require different approaches. One example is optimization problems in general, of which perhaps the best known is the travelling salesman problem. This task consists of finding the shortest itinerary that visits n cities. When n is large, the problem becomes intractable for exhaustive search, that is, for evaluating all the possible alternatives to determine the best one.
There are many other cases within this category, and the difficulty is usually the same: the optimum is not known in advance, so it is hard to judge how good a candidate solution is, and the number of possible solutions, or of steps needed to reach them, is enormous. Two categories of techniques are commonly applied to deal with these problems:
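The exhaustive search mentioned above can be sketched as follows. The example assumes a closed tour (the salesman returns to the starting city) and straight-line distances between points in the plane; both are assumptions of this sketch, as are the function name and the coordinates.

```python
from itertools import permutations
from math import dist, factorial


def shortest_tour(cities):
    """Try every ordering of the cities and keep the shortest closed tour."""
    start, *rest = cities  # fixing the start avoids counting rotations of the same tour
    best_tour, best_length = None, float("inf")
    for order in permutations(rest):
        tour = (start, *order, start)
        length = sum(dist(a, b) for a, b in zip(tour, tour[1:]))
        if length < best_length:
            best_tour, best_length = tour, length
    return best_tour, best_length


# Hypothetical cities at the corners of a 4 x 3 rectangle.
cities = [(0, 0), (0, 3), (4, 3), (4, 0)]
tour, length = shortest_tour(cities)

# With the start fixed, n cities give (n - 1)! orderings to evaluate.
orderings_for_4 = factorial(4 - 1)
orderings_for_20 = factorial(20 - 1)
```

Four cities need only 6 orderings, but 20 cities already need 19!, which is more than 10^17, so this approach stops being practical very quickly.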
- Bio-inspired algorithms. This group includes genetic algorithms, evolution strategies, particle swarm optimization, etc. All of them start from the same concept: reproducing mechanisms that exist in nature, such as evolutionary selection in living beings, the behaviour of flocks of birds, colonies of ants, etc. Thanks to them, it is possible to find an acceptable solution to the optimization problem in a reasonable amount of time.
- Reinforcement learning. This paradigm, described at the beginning of the article, can also be applied to optimization problems, although in recent times it has gained prominence for its success in learning to play and win certain games.
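As a small illustration of the bio-inspired family above, here is a sketch of a (1+1) evolution strategy, one of the simplest algorithms of this kind: a single parent produces one mutated child per generation, and the better of the two survives. The objective function, the mutation size `sigma`, the number of generations and the starting point are all assumptions chosen for the example.

```python
import random


def sphere(point):
    """Toy objective to minimize: the sum of squares, whose optimum is the origin."""
    return sum(x * x for x in point)


def one_plus_one_es(objective, start, sigma=0.5, generations=500, seed=0):
    """(1+1) evolution strategy: mutate the parent and keep the child if it is no worse."""
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    parent = list(start)
    parent_cost = objective(parent)
    for _ in range(generations):
        # Mutation: add Gaussian noise to every coordinate of the parent.
        child = [x + rng.gauss(0.0, sigma) for x in parent]
        child_cost = objective(child)
        # Selection: the better of parent and child survives.
        if child_cost <= parent_cost:
            parent, parent_cost = child, child_cost
    return parent, parent_cost


start = [5.0, -3.0]
best, best_cost = one_plus_one_es(sphere, start)
```

The algorithm never needs to know where the optimum is. It only compares candidates with each other, which is why methods of this kind can return an acceptable solution, though not necessarily the best one, when exhaustive search is out of reach.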