The world of technology, like any other, is not immune to fads. These fads cause certain words and concepts to be used arbitrarily, as hollow marketing terms that end up losing substance and validity through misuse. Every time a technology is on the rise, it generates buzzwords that everyone uses and that you cannot avoid hearing and reading everywhere.
Without a doubt, the most cutting-edge technological trend of recent years is everything related to artificial intelligence and data analysis. The field has seen great advances relatively recently, which, together with the availability of enormous amounts of data and increasing computing power, are giving rise to all kinds of very interesting practical applications.
The problem comes when the terms related to the field become empty marketing words that in many cases are outright lies. It is very common to hear that this or that product uses artificial intelligence to achieve something when, sometimes, it is just conventional algorithms making predictable decisions.
What is Artificial Intelligence?
Artificial intelligence (AI) emerged as a discipline many years ago, when computers were still very limited, and it refers to making machines simulate the functions of the human brain.
AI is classified into two categories based on its capabilities:
- General (or strong) AI: aims for machines or software with intelligence in the broadest sense of the word, able to understand, think, and reason about general issues, that is, about anything a human being can do.
- Narrow (or weak) AI: focuses on giving a machine or software intelligence within a very specific, closed area or for a very specific task.
Thus, for example, a strong AI would be able to learn by itself, without external intervention, to play any board game we “put before it”, while a weak AI would learn to play one specific game, such as chess or Go. What’s more, a hypothetical strong AI would understand what the game is, what the objective is, and how to play it, while the weak AI, even if it plays Go (a tremendously complicated game) better than anyone else, will not really have a clue what it is doing.
One of the crucial points when distinguishing an artificial intelligence system from mere traditional software, however complex it may be, is that AI “programs” itself. That is, it does not consist of a series of predictable logical sequences; rather, it has the ability to generate logical reasoning, learn, and correct itself on its own.
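The difference between a hand-written rule and a rule derived from data can be sketched in a few lines of Python. This is only a toy illustration, not a real AI system: the temperatures, labels, and function names are all hypothetical.

```python
# Conventional software: the programmer hard-codes the decision rule.
def is_hot_rule(temp_c):
    return temp_c > 30  # threshold chosen by a human


# A (very) small learner: the threshold is derived from labeled examples.
def learn_threshold(samples):
    """samples: list of (temperature, is_hot) pairs.

    Returns the midpoint between the hottest "not hot" example
    and the coolest "hot" example.
    """
    hottest_cold = max(t for t, hot in samples if not hot)
    coolest_hot = min(t for t, hot in samples if hot)
    return (hottest_cold + coolest_hot) / 2


# Hypothetical training data.
training = [(12, False), (18, False), (24, False), (29, True), (33, True)]
threshold = learn_threshold(training)


def is_hot_learned(temp_c):
    return temp_c > threshold


print(threshold)           # 26.5, the midpoint between 24 and 29
print(is_hot_rule(27))     # False: below the hand-written threshold
print(is_hot_learned(27))  # True: above the learned threshold
```

Changing the training examples changes the behavior of `is_hot_learned` without anyone editing its code, which is the sense in which such a system “programs” itself.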
The field has come a long way in recent years, and we now have weak AIs capable of doing incredible things. Strong AI remains a researcher’s dream and the basis of many science fiction novels and films.
What is Machine Learning?
Machine Learning (ML) is considered a subset of artificial intelligence. It is one of the ways we have of making machines learn and “think” like humans. As its name suggests, ML techniques are used when we want machines to learn from the information we provide them. It is analogous to how human babies learn: through observation, trial, and error. The machines are given enough data to learn a specific, limited task (remember: weak AI), and they are then able to apply that knowledge to new data, correcting themselves and learning more over time.
There are many ways to teach a machine to “learn”: supervised, unsupervised, semi-supervised, and reinforcement learning techniques, depending on whether the algorithm is given the correct solution while it learns, is not given it, is given it only some of the time, or is merely scored on how well or poorly it does, respectively. And there are many algorithms that can be used for different types of problems: prediction, classification, regression, etc.
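As a rough sketch of the first two categories, the following Python code applies a supervised and an unsupervised technique to the same six numbers. The data, the labels, and the function names are hypothetical, and the two methods (a nearest-centroid classifier and a one-dimensional 2-means clustering) were picked only because they fit in a few lines.

```python
from statistics import mean

# Supervised: every training value comes with its correct label.
labeled = [(1.0, "small"), (1.2, "small"), (0.8, "small"),
           (5.0, "large"), (5.5, "large"), (4.7, "large")]


def fit_centroids(samples):
    """Average the values seen for each label."""
    groups = {}
    for value, label in samples:
        groups.setdefault(label, []).append(value)
    return {label: mean(values) for label, values in groups.items()}


def predict(centroids, value):
    """Return the label whose centroid is closest to value."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))


centroids = fit_centroids(labeled)
print(predict(centroids, 4.0))  # large


# Unsupervised: the same values without labels; 2-means finds two groups.
def two_means(values, iterations=10):
    low, high = min(values), max(values)  # initial cluster centers
    for _ in range(iterations):
        near_low = [v for v in values if abs(v - low) <= abs(v - high)]
        near_high = [v for v in values if abs(v - low) > abs(v - high)]
        # Assumes both groups stay non-empty, which holds for this data.
        low, high = mean(near_low), mean(near_high)
    return low, high


unlabeled = [value for value, _ in labeled]
print(two_means(unlabeled))  # two centers, near 1.0 and 5.07
```

The supervised version can name its answer (“large”) because it was told the names; the unsupervised version only discovers that there are two groups.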
You may have heard of algorithms such as simple linear or polynomial regression, support vector machines, decision trees, Random Forest, k-nearest neighbors... These are just some of the common algorithms used in ML, and there are many more.
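To give a feel for one of these, here is a minimal sketch of k-nearest neighbors in plain Python: a new point receives the label that is most common among its k closest training points. The points and labels are hypothetical, and in practice you would normally use a library implementation instead.

```python
from collections import Counter
from math import dist


def knn_predict(training, point, k=3):
    """training: list of ((x, y), label) pairs.

    Returns the majority label among the k training points nearest to point.
    """
    nearest = sorted(training, key=lambda item: dist(item[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training data: two well-separated groups.
training = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
            ((6, 6), "B"), ((7, 6), "B"), ((6, 7), "B")]

print(knn_predict(training, (2, 2)))  # A
print(knn_predict(training, (6, 5)))  # B
```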
But knowing these algorithms and what they are for (training the model) is only one of the things you need to know. Before that, it is also very important to learn how to obtain and load the data, carry out an exploratory analysis of it, clean the information... The quality of the learning depends on the quality of the data, or as they say in ML: “Garbage in, garbage out”.
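A small sketch of what that cleaning step can look like, assuming a hypothetical list of temperature readings that arrive as text and a plausible range of -50 to 60 degrees; both the data and the limits are illustrative.

```python
from statistics import mean

# Hypothetical raw readings: blanks, a text marker, and a -999 sentinel.
raw_readings = ["21.5", "22.1", "", "n/a", "21.9", "-999", "22.4"]


def clean(readings, low=-50.0, high=60.0):
    """Keep only values that parse as numbers and fall in a plausible range."""
    cleaned = []
    for text in readings:
        try:
            value = float(text)
        except ValueError:
            continue  # drop blanks and non-numeric markers
        if low <= value <= high:
            cleaned.append(value)  # out-of-range values such as -999 are dropped
    return cleaned


values = clean(raw_readings)
print(values)        # [21.5, 22.1, 21.9, 22.4]
print(mean(values))  # close to 22

# Skipping the range check lets the -999 sentinel through and ruins the average.
dirty = [float(t) for t in raw_readings if t not in ("", "n/a")]
print(mean(dirty))   # a large negative number
```

Any model trained on `dirty` would learn from a temperature of -999 degrees, which is exactly the “garbage in” that the saying warns about.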
Today, the Machine Learning libraries for Python and R have evolved so much that even a developer with no knowledge of mathematics or statistics beyond high-school level can build, train, test, deploy, and use ML models for real-world applications. Even so, it is very important to know the whole process well and to understand how these algorithms work, in order to make good decisions when selecting the most appropriate one for each problem.
What is Deep Learning?
Within Machine Learning there is a branch called Deep Learning (DL) that takes a different approach to making machines learn. Its techniques are based on what are called artificial neural networks. The “deep” refers to the fact that current techniques can create networks many neural layers deep, achieving results that were unthinkable little more than a decade ago, thanks to the great advances made since 2010 together with large improvements in computing power.
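To make “layers” a little more concrete, here is a minimal sketch of data flowing through a small network with two hidden layers, in plain Python. The weights are hypothetical, hand-picked numbers. A real network would learn them during training (which is omitted here) and would have far more neurons and layers.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense_layer(inputs, weights, biases):
    """One layer: each neuron computes sigmoid(weighted sum of inputs + bias)."""
    return [sigmoid(sum(w * x for w, x in zip(neuron_w, inputs)) + b)
            for neuron_w, b in zip(weights, biases)]


def forward(inputs, layers):
    """Pass the inputs through every layer in order."""
    activations = inputs
    for weights, biases in layers:
        activations = dense_layer(activations, weights, biases)
    return activations


# Hypothetical weights: 2 inputs -> 3 neurons -> 2 neurons -> 1 output.
layers = [
    ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1]),
    ([[0.7, -0.5, 0.2], [-0.4, 0.3, 0.9]], [0.05, -0.05]),
    ([[1.0, -1.0]], [0.0]),
]

output = forward([1.0, 2.0], layers)
print(output)  # a list with a single value between 0 and 1
```

Adding depth means adding entries to `layers`; the cost of training grows with the number of weights, which is one reason computing power matters so much for DL.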
In recent years Deep Learning has been applied with overwhelming success to speech recognition, language processing, computer vision, machine translation, content filtering, medical image analysis, bioinformatics, drug design... in some cases obtaining results equal to or better than those of human experts in the field of application. But you don’t have to look at such specialized areas to see it in action: from Netflix recommendations to your interactions with your voice assistant (Alexa, Siri, or Google Assistant) to mobile applications that change your face... They all use Deep Learning to function.
In general, it is often said (take it with a grain of salt) that if you have relatively little information and the number of variables involved is relatively small, general ML techniques are best suited to the problem. But if you have huge amounts of data to train the network and there are thousands of variables involved, then Deep Learning is the way to go. Bear in mind, though, that DL is more difficult to implement, takes longer to train, and needs much more computing power (it usually relies on GPUs, graphics processors that are well suited to this task), but the problems it tackles are usually more complex as well.
What is Big Data?
The concept of Big Data is much easier to understand. Put simply, this discipline groups the techniques needed to capture, store, homogenize, transfer, query, visualize, and analyze data on a large scale and in a systematic way.
Think, for example, of the data from thousands of sensors in a country’s electrical network that send data every second to be analyzed, or the information generated by a social network such as Facebook or Twitter with hundreds (or thousands) of millions of users. We are talking about huge and continuous volumes that are not suitable for use with traditional data processing systems, such as SQL databases or SPSS-style statistics packages.
Big Data is traditionally characterized by the three Vs:
- Volume: the sheer amount of information. For example, Facebook has 2 billion users and Twitter about 400 million, all constantly feeding these social networks very large amounts of information that must be stored and managed.
- Velocity: continuing with the social network example, every day Facebook collects around 1 billion photos and Twitter handles more than 500 million tweets, not counting likes and many other kinds of data. Big Data deals with the speed at which data is received and processed, so that it can flow and be handled properly without bottlenecks.
- Variety: countless different types of data can be received, some structured (such as a sensor reading or a like) and others unstructured (such as an image, the content of a tweet, or a voice recording). Big Data techniques must deal with all of them: manage, classify, and homogenize them.
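One idea behind handling volume and velocity is to process each record as it arrives instead of loading everything into memory first. The sketch below shows only that idea, on a single machine, with a simulated feed of hypothetical voltage readings; real Big Data systems spread this kind of work across many machines.

```python
import random


def sensor_stream(n, seed=0):
    """Stand-in for an unbounded feed: yields one reading at a time."""
    rng = random.Random(seed)
    for _ in range(n):
        yield rng.gauss(230.0, 5.0)  # hypothetical voltage readings


def running_mean(stream):
    """Update the mean incrementally, without storing the readings."""
    count, mean = 0, 0.0
    for value in stream:
        count += 1
        mean += (value - mean) / count
    return count, mean


count, mean = running_mean(sensor_stream(100_000))
print(count, round(mean, 1))
```

Memory use stays constant no matter how many readings flow through, because `running_mean` keeps only a counter and the current average.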
Another great challenge associated with collecting this kind of massive information concerns its privacy and security, as well as the quality of the data, in order to avoid biases of all kinds.
As you can see, the techniques and knowledge necessary to do Big Data have nothing to do with those required for AI, ML, or DL, although the term is often used very lightly.
These data can feed the algorithms used in the previous techniques, that is, they can be the source of information from which specialized models of Machine Learning or Deep Learning are fed. But they can also be used in other ways, which leads us to …
What is Data Science?
When we talk about data science, we refer in many cases to the extraction of relevant information from data sets, also called KDD (Knowledge Discovery in Databases). It draws on techniques from many fields: mathematics, programming, statistical modeling, data visualization, pattern recognition and learning, uncertainty modeling, data storage, and cloud computing.
Data science can also refer, more broadly, to the methods, processes, and systems that involve data processing for this extraction of knowledge. It can range from statistical techniques and data analysis to intelligent models that learn “by themselves” (unsupervised), which would also be part of Machine Learning. In fact, the term can be confused with data mining (more fashionable a few years ago) or with Machine Learning itself.
Data science experts (often called data scientists) focus on solving problems involving complex data, looking for patterns in the information and relevant correlations and, ultimately, gaining insight from the data. They are usually skilled in math, statistics, and programming (although they don’t have to be experts in all three).
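As a small illustration of “looking for relevant correlations”, the following sketch computes the Pearson correlation coefficient between two hypothetical series by hand. The numbers are invented for the example, and a correlation like this would be a starting point for analysis rather than a conclusion.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical data: daily temperature (°C) and ice cream sales (units).
temperature = [18, 21, 24, 27, 30, 33]
sales = [120, 135, 160, 178, 205, 221]

r = pearson(temperature, sales)
print(round(r, 3))  # close to 1: a strong positive correlation
```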
Unlike experts in Artificial Intelligence (or Machine Learning or Deep Learning), who seek to generalize the solution to problems through machine learning, data scientists generate particular and specific knowledge from the data they start with. This is a substantial difference in approach, and in the knowledge and techniques required for each specialization.