
Marginalia: Rebooting AI by Gary Marcus and Ernest Davis

With this new year, let’s try a new format. Marginalia will be a series in which I’ll share notes and comments on interesting books I read. The name is directly inspired by the old word indicating the small notes on the margins of books.

It will be a chance to discuss my readings without the need to write a full-fledged article. I hope it will be interesting as a review of the book or as a discussion starter. So, let’s start.

The Book

In the first Marginalia article, I’ll comment on my recent reading of Rebooting AI: Building Artificial Intelligence We Can Trust by Gary Marcus and Ernest Davis.

The book is an in-depth discussion of the limitations of current deep-learning-focused AI research. As you know, this is a topic I care a lot about. I’ve already talked extensively about the shortcomings of deep learning and how, even if it is useful, it is currently choking more interesting AI research.

Here, the authors support the same thesis, but of course, they do it in a more informed way, with a lot of references and interesting points of view. It was definitely worth reading, even if at some points (especially near the end) the book starts repeating itself and loses focus.


Now let’s see my main highlights and notes from this book.

One reason that people often overestimate what AI can actually do is that media reports often overstate AI’s abilities, as if every modest advance represents a paradigm shift. - Page 8

This is a general problem, and it is not specific to Deep Learning. It is not even exclusive to AI: media thrives on exaggeration. My rule of thumb is to treat every general-public article on AI as untrue. So far, I have never been wrong.

Much of this recent success in AI has been driven largely by two factors: first, advances in hardware, which allow for more memory and faster computation, often by exploiting many machines working in parallel; second, big data, huge data sets containing gigabytes or terabytes (or more) of data that didn’t exist until a few years ago, such as ImageNet, a library of 15 million labeled pictures that has played a pivotal role in training computer vision systems; Wikipedia; and even the vast collections of documents that together make up the World Wide Web. - Page 10

This is a common argument used to attack Deep Learning. While it is true, I often feel that it is a bad argument. Throughout human history, a lot of innovations have been driven by advances in technical capabilities and in the amount of information available. This sounds to me like “the discovery of Jupiter’s moons has been driven by the advances in glass crafting and quality of lenses.” Duh.

Luckily, the authors are smart enough to present this as a statement of fact rather than as an attempt to diminish the importance of Deep Learning. What is essential here is to understand that the Deep Learning revolution is not a new paradigm of AI but a revival of an old approach. This is an entirely noble way to advance a field, but not everyone is aware of it.

Some of the world’s best minds in AI, using some of the biggest clusters of computers in the world, had produced a special-purpose gadget for making nothing but restaurant reservations. - Page 14

This is about the Google demo in which the assistant makes a restaurant reservation over the phone. I remember that the newspapers went crazy about it for a while, but after some time everything appeared for what it was: pretty meh. Personally, I was not impressed from day one. Not because it is not impressive per se, but because I know the tricks and mirages of very tailored demos.

We tend to overgeneralize intelligence. If we see something very human-like in a limited field, we unconsciously assume that the algorithm is capable of maintaining the same level of intelligence for everything else.

However, Deep Learning is a useful tool that is very far away from even a basic form of “intelligence.” And, in fact, the Google demo remained a demo because it was tough to adapt to the general case.

The authors used an overly harsh phrase to make a really valid point.

What’s missing from AI today—and likely to stay missing, until and unless the field takes a fresh approach—is broad (or “general”) intelligence. - Page 15

See above.

By and large, in contemporary research in AI, robustness has been underemphasized, in part because most current AI effort goes into problems that have a high tolerance for error, such as ad recommendation and product recommendation. If we recommend five products to you and you like only three of them, no harm done. But in many of the most important potential future applications of AI, including driverless cars, elder care, and medical treatment planning, robustness will be critical. Nobody will buy a home robot that carries their grandfather safely into bed four times out of five. -p22

This is another interesting point: error tolerance. Many Deep Learning applications are widely used because the common errors are acceptable, even fun sometimes. If my mother gets misclassified as a bird in an automated photo library software, in the worst case, we all have a good laugh. But if my mother gets thrown out of the window by a caregiver robot, I assure you I would not laugh about that!

One of the central gaps between these systems and “what people imagine when we talk about AI” is precisely there: when even a single error is unacceptable, an autonomous statistical model cannot be accepted. Not with the unpredictability and the error rate of current Machine Learning approaches.
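To put a rough number on this, here is a minimal sketch of how per-task reliability compounds over repeated use. It assumes that every task succeeds or fails independently with the same probability, which is a simplification, and the choice of 365 tasks (one per day for a year) and the function name `prob_no_failure` are mine, purely for illustration. The 0.8 success rate mirrors the “four times out of five” from the quote above.

```python
def prob_no_failure(success_rate: float, n_tasks: int) -> float:
    """Probability that n_tasks independent tasks ALL succeed.

    Assumes each task succeeds independently with the same
    probability (an illustrative simplification).
    """
    return success_rate ** n_tasks


# Hypothetical scenario: one task per day for a year.
N_TASKS = 365

for rate in (0.8, 0.99, 0.999):
    p = prob_no_failure(rate, N_TASKS)
    print(f"per-task success {rate}: chance of a failure-free year = {p:.3g}")
```

Under these assumptions, even a system that is right 99% of the time is very unlikely to get through a year without a single mistake. That is fine for photo tagging and not fine for a caregiver robot.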

We will also suggest that a current obsession with building “blank slate” machines that learn everything from scratch, driven purely from data rather than knowledge, is a serious error. - p25

I think the same. I do not understand why researchers insist on building Machine Learning systems starting from the blankest of slates and using a single super-noisy sensor. In some cases there are reasons for that, but in the general case it is just flexing. In practical applications, I do not care how many sensors or how much pre-built knowledge there is. What matters to me is that the system works.

Deep learning is largely falling into the same trap, lending fresh mathematics (couched in language like “error terms” and “cost functions”) to a perspective on the world that is still largely about optimizing reward, without thinking about what else needs to go into a system to achieve what we have been calling deep understanding. - p119

It is possible to argue that “deep understanding” may be the emergent behavior of some convoluted ensemble of optimization algorithms. Yes, it is possible, and it may also be right at a deeper level. However, to build a computer, we do not start from the quantum physics of electromagnetism. We abstract the laws, and we use a higher-level model to make something useful, even if a transistor “may be derived from the quantum behavior of individual electrons.”

So, yes. Human intelligence may be an emergent behavior, just like a transistor’s behavior (or even time itself). Still, if we want to produce something really useful, we may want to start exploring other solutions. For now, we just throw more data into a neural network and hope for the best.

The brain is a highly structured device, and a large part of our mental prowess comes from using the right neural tools at the right time. We can expect that true artificial intelligences will likely also be highly structured, with much of their power coming from the capacity to leverage that structure in the right ways at the right time, for a given cognitive challenge. […]
Ironically, that’s almost the opposite of the current trend. In machine learning now, there is a bias toward end-to-end models that use a single homogeneous mechanism with as little internal structure as possible. -p124

To be fair, machine learning, and especially deep learning, assumes that internal structure emerges from the network and the data. In this sense, the structureless mesh of elementary elements structures itself under the pressure of data and training. There are also a lot of research directions on ensembles of neural networks, each specialized in a specific subproblem.

Yes, they all share the same principles, but saying that Deep Learning is not structured is a weak argument.

The real advance in AI, we believe, will start with an understanding of what kinds of knowledge and representations should be built in prior to learning, in order to bootstrap the rest. -p146

I strongly agree. In my opinion, the door to the AI singularity will be opened by whoever manages to build a minimal model of the knowledge representation required for “true learning” and “deep understanding.”

Comprehending the uses (and dangers) of a knife is not about accumulating lots of pictures, but about understanding—and learning—causal relationships.

Symbolic understanding is also useful for transferring what has been learned. If I say that a new tool “works like a knife,” you instantly create a link between the two kinds of know-how, and you already have an idea of how to interact with the new object, even without ever having seen it in action.

Elon Musk claimed for years that his Autopilot system wouldn’t need LIDAR; from an engineering standpoint, this seems both risky and surprising, given the limitations on current machine-vision systems. (Most major competitors do use it.) - p181

Probably because we think that the need for any extra sensor is a “failure” of our AI. See the note above about the blank-slate obsession.

We have argued that AI is, by and large, on the wrong path, with the majority of current efforts devoted to building comparatively unintelligent machines that perform narrow tasks and rely primarily on big data rather than on what we call deep understanding. We think that is a huge mistake, for it leads to a kind of AI adolescence: machines that don’t know their own strength, and don’t have the wherewithal to contemplate the consequences of their own actions. - p199

This is the conclusion. I would argue that we are not on the wrong path; the real problem is that we are ignoring all the other paths. We are following the open road with a lot of fruit on the side, but we do not know if this road will reach the destination. We should at least keep an open mind.

Further Considerations and Conclusion

In the end, I think this is a good book for getting a first look at the “Deep Learning opposition.” Even if not all the arguments are really strong, the general message is interesting and sound. Even if you are in love with Deep Learning and you believe it will be the future of AI, it is an excellent book for pushing Deep Learning itself in a direction that can resolve its main drawbacks. After all, you need to know the problem you want to solve.

Let me know if this format works for you, or if you would prefer a more in-depth and thoughtful discussion of a book. See you next time.
