Questions about Deep Learning and the nature of knowledge

If there is something that can be assumed as a fact in the AI and Machine Learning domain is that the last years had been dominated by Deep Learning and other Neural Network based techniques. When I say dominated, I mean that it looks like the only way to achieve something in Machine Learning and it is absorbing the great part of AI enthusiasts’ energy and attention.

This is indubitably a good thing. Having a strong AI technique that can solve so many hard challenges is a huge step forward for humanity. However, how everything in life, Deep Learning, despite being highly successful in some application, carries with it several limitations to that, in other applications, makes the use of Deep Learning unfeasible or even dangerous.

Some Limitation of Deep Learning Approaches

Just after I started writing this article, I’ve found this very precise and in-depth article by Gary Marcus about Deep Learning limitations. The article is good, and I will not repeat its contents here. I will never reach the level of accuracy of the original article. If you want a technical analysis of Deep Learning limitations, go read that article! Here, instead, I want to talk and discuss a more philosophical problem of Deep Learning. And I do not to give answers, but to raise questions, because asking ourselves questions it always a good practice in science.

To talk about the topic in this blog post title, first we need to talk about two relevant Deep Learning’s weak points. The first one is that, even if we call Deep Learning “deep”, a Deep Learning algorithm is very “shallow”. There is no deepness in a Deep Learning application from the point of view of “intelligence” in general. Once we train the software, any perturbation to the original problem may completely invalidate all the training. This concept is called “knowledge transfer”, that is the ability to apply previously acquired knowledge to other similar or related problem. Deep Learning is, in practice, very limited on this front.

This indicates that Deep Learning systems can solve problems, but they do not “understand” the problem. Here we can raise the first “deep” question about intelligence: what is the nature of “understanding”?

The second weakness is probably a consequence of the first one. Deep Learning is completely not transparent. This mean that we can easily verify if the answer is right or not, but at the same time we have no chance to know why the system took a certain decision. While this is not a problem in many cases (we don’t care about the complex reasoning for which Google Photo is able to identify that we have a cat in a photo, at least for the final user). However, this may be a big issue for other application. If the application is very complex, transparency may be vital for debugging purposes. If we are designing medical diagnostic software, transparency is very important in order to allow other doctors (or other software) to understand why a certain diagnosis has been made. Or because, as we have seen previously, a non-transparent machine learning system can lead to serious social bias (how we can know if a certain decision has been made for legitimate reasons or because input data contains some subtle bias?)

The problem here is that Deep Learning systems are learning a model of the world that we don’t know, and that they cannot explain to us. In some cases, this is an advantage. Imagine if we needed to consciously process eye signals! It would be a mess! We would spend a lot of brain power just to understand the objects we have around us. The fact that all this computation is done automatically by the neural networks in our brain is perfect. At the same time, this is not the best approach for every possible problem.

This is not only a practical problem, it is a phylosophical one. We can solve problems using Deep Learning and, at the same time, we can still not understand them. This situation it is unprecedented at this scale in the evolution of human knowledge.

Why should I care?

Every time I raise concerns on Deep Learning, Deep Learning people start getting defensive. Sometimes I get interesting opinion on some Deep Learning weakness, but other times I get a simple objection: “Why should we care?”

Well, that’s a completely reasonable objection. After all, if you work with Deep Learning and it works for you, you are right! You don’t have to care if Deep Learning is transparent or not, or if Deep Learning “is really intelligence” or not. Who cares! I don’t care if my PC is an intelligent being or not when I need to use a spreadsheet! I just need a tool to put data into tables.

But if you love Artificial Intelligence and the future implications of it, even if you love Deep Learning, it is important that you periodically ask yourself those question. First, because science advances especially when people start questioning the “dominant” way of solving problems. Second, because it is important for humanity to grow “understanding” about problems and not just solve them. Third, because when we will reach the point in which we can no more solve a problem throwing GPUs at it, it would be interesting to have a plan B.

The nature of understanding

The desire of understanding is human. That’s why we do science. On the computer science world, that’s why we dissect software, we reverse engineer programs, and we hack systems. Therefore, you can imagine why I am so intimately unsatisfied when a Deep Learning algorithm “solves” a problem. I am fascinated by the solution, but the fact that solution adds nothing to my understanding of the problem, it makes me feel sad. I feel robbed of the understanding.

I lurk a lot in mathematic communities. Some time ago I asked if they would like to have a machine that is able to solve any theorem and conjecture. They say yes. Of course. Then I added a condition. The machine will never tell them why. You put a theorem in the machine and the machine just return true, false or undecidable (I know it is not possible, but for the sake of reasoning…). They all said that such a machine would be almost useless. Almost, an insult.

This explain why there is value in the understanding. This is not an objective value, it is a very human value, but it is there. Sometime, solving a problem is not enough. We need to know why.

Computer-Aided Proofs and Boolean Pythagorean Triples

Deep Learning and other not-transparent machine learning techniques are not the only affected by this problem. Even symbolic reasoning, a notorious transparent high-level AI approach, can fall in the same pitfall.

Again, the mathematical community is still at the center of the debate. Computer-aided mathematical proofs are increasingly common. But what if your computer-aided proof is a 200 Terabyte blob of symbolic chains and brute-force testing?

When in 2016 the proof came out, it relit the debate. The problem that has been proved is the Boolean Pythagorean triples problem. You take a Pythagorean triple ($latex a^2 + b^2 = c^2$) and you want to prove that is possible to “color” red or blue each integer such that there is no Pythagorean triple with $latex a$, $latex b$ and $latex c$ of the same color.

The answer is simple: if you allow integers greater than 7824, the answer is no. Otherwise there are many solutions. It is proved. It is the answer to our problem.

Figure 1. One of the solution for number smaller than 7824. Each number is colored red or blue (number are arranged as a rectangle and white numbers can be either blue or red).

But: is that a prof? This is not a trivial question. Many mathematicians are still discussing if a proof that no man can ever read can be really considered a proof. Moreover, the gargantuan proof does not explain why 7824 is such a special number. This is very unsatisfactory for a mathematician.

Moreover, this adds another point. In the Boolean Pythagorean Triples proof, we are completely trusting the machine. Because no man will ever be able to check the proof, we are, in fact, giving our trust to a math oracle. This is a hard bite to shallow for the math community.

This “trust problem” is still more interesting going back to Deep Learning and Machine Learning. Boolean satisfiability solvers like the Conflict-Driven Clause Learning algorithm used in the above proof, are easy to trust. They use simple logical rule that combine in a predictable way. Of course, we must keep an eye on bugs, but logical inference is a quite trustworthy software.

Neural Networks, on the other hand, are mostly black boxes. How can we trust them? Again, in many applications, this does not matter. If a neural network does a poor job in recognizing lizards, or words, we can easily spot the error and move one (and improve the algorithm). However, if we apply Neural Networks to very fuzzy environment, such as deciding jobs and influencing people lives… well, I would be more careful. Remember that Machine Learning is not unbiased; it can absorb and amplify human biases very well.

And so?

I am not here to provide solutions. This is just a (too much) long mumbling on the topic. I hope to have raised some question. About the answers, a blog post is not the place for them.

There are just a couple of things I hope we will se in the future. First, that we continue researching for other kind of Machine Learning and Artificial Intelligence approaches. Even if this is not the cooler AI topic on the field, it is important that we continue pushing in other directions.

Second, I am strongly convinced that a solution to all these problems, a sweet point between Neural Networks and numerical Machine Learning and Symbolic AI would be the “singularity” AI is looking for decades. We may reach this point from the Neural Network direction, producing new algorithm that are able to “talk about themselves” in addition to solve the problem. Or we may reach it by going from the symbolic direction, producing logical system that are truly robust respect to the noisy and uncertain real world. Or we may reach it from a still unknown direction.

The important thing, is keep pushing.

Some Limitation of Deep Learning Approaches

Why should I care?

The nature of understanding

Computer-Aided Proofs and Boolean Pythagorean Triples

And so?

Not every classification error is the same

How hidden variables in statistical models affect social inequality