The biggest lesson that can be drawn from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a wide margin. The ultimate reason for this is Moore’s law, or rather its generalization to the continued, exponential fall in the cost of computation. This is the “bitter lesson,” says Richard Sutton, a Canadian computer scientist. The rest of this article is in his own words.
Why were artificial intelligence researchers stumped for 70 years?
Most AI research has been conducted as if the computation available to the agent were constant (in which case leveraging human knowledge would be one of the only ways to improve performance). But over a slightly longer time than a typical research project takes, massively more computation inevitably becomes available. Seeking improvements that help in the short term, researchers try to make maximum use of their human knowledge of the domain, but the only thing that matters in the long run is the growing use of computation. These two approaches need not contradict each other, but in practice they tend to. Time spent on one is time not spent on the other. There are psychological commitments to investing in one approach or the other. And the approach based on human knowledge tends to complicate methods in ways that make them less suited to taking advantage of general methods that leverage computation.
Conclusion: stop trying to solve AI “in your head” by building in human knowledge. That takes a long time, and the growth of computing capacity will solve the problem much faster and more easily.
There are many examples of AI researchers learning this bitter lesson belatedly. It is instructive to review some of the most prominent ones.
In computer chess, the methods that defeated the world champion, Kasparov, in 1997 were based on massive, deep search. At the time, this was viewed with dismay by most computer-chess researchers, who had pursued methods based on human understanding of the special structure of chess. When a simpler, search-based approach with special hardware and software proved vastly more effective, the researchers who had relied on human understanding of chess did not concede gracefully. They said that “brute force” search may have won this time, but that it was not a general strategy, and that in any case it was not how people play chess. These researchers wanted methods based on human input to win, and were very disappointed when they did not.
Conclusion: simple brute-force computation wins out sooner or later.
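To make the idea of exhaustive search concrete, here is a minimal Python sketch on a toy game. It is not the method of the 1997 chess program: the game (a simple take-away variant of Nim), its rules, and the names `can_win` and `best_move` are all assumptions chosen for illustration. The search is given only the rules, with no hand-written strategy, and finds perfect play by trying every line.

```python
from functools import lru_cache

# Toy rules (an assumption for this sketch): players alternate taking
# 1, 2 or 3 stones from a pile; whoever takes the last stone wins.
MOVES = (1, 2, 3)


@lru_cache(maxsize=None)
def can_win(stones: int) -> bool:
    """Return True if the player to move can force a win.

    Pure exhaustive search: a position is winning if at least one legal
    move leads to a position that is losing for the opponent. With zero
    stones there are no legal moves, so the player to move has lost.
    """
    return any(not can_win(stones - take) for take in MOVES if take <= stones)


def best_move(stones: int):
    """Return a winning number of stones to take, or None if none exists."""
    for take in MOVES:
        if take <= stones and not can_win(stones - take):
            return take
    return None


if __name__ == "__main__":
    for pile in range(1, 13):
        print(pile, can_win(pile), best_move(pile))
```

Nothing in the code states the rule a human player would teach (leave your opponent a multiple of four), yet the search recovers it. The price is computation that grows with the size of the game.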
A similar pattern of research progress was seen in computer Go, only delayed by a further 20 years. Enormous initial efforts went into avoiding search by taking advantage of human knowledge or of the special features of the game, but all those efforts proved useless, or worse, once search was applied effectively and at scale. Also important was the use of learning by self-play to learn a value function (as it was in many other games and even in chess, although learning did not play a big role in the 1997 program that first defeated a world champion). Learning by self-play, and learning in general, is like search in that it allows massive amounts of computation to be brought to bear. Search and learning are the two most important classes of techniques for using massive amounts of computation in AI research. In computer Go, as in computer chess, the initial research effort was directed at using human understanding (so that less search was needed), and only much later was far greater success achieved by embracing search and learning.
Conclusion: search and learning, multiplied by computing power, far outperform attempts to solve the problem with “out-of-the-box thinking.”
In speech recognition, there was an early competition in the 1970s, sponsored by DARPA. Entrants included a range of methods that took advantage of human knowledge: knowledge of words, of phonemes, of the human vocal tract, and so on. On the other side were newer methods that were more statistical in nature and did much more computation, based on hidden Markov models (HMMs). Again, the statistical methods beat the methods based on human knowledge. This led to a major change in all of natural language processing, unfolding gradually over decades, until statistics and computation came to dominate the field. The recent rise of deep learning in speech recognition is the latest step in this consistent direction. Deep learning methods rely even less on human knowledge and use even more computation, together with learning on huge training sets, to produce dramatically better speech recognition systems.
Richard Sutton, Canadian computer scientist
As in the games, researchers always tried to build systems that worked the way they imagined their own minds worked. They tried to put that knowledge into their systems, but it proved counterproductive in the end, and a colossal waste of researchers’ time, once massive computation became available through Moore’s law and a way was found to put it to good use.
Conclusion: the same mistake was repeated for decades.
A similar pattern held in computer vision. Early methods conceived of vision as a search for edges or generalized cylinders, or worked in terms of SIFT (scale-invariant feature transform) features. But today all of this has been discarded. Modern deep-learning neural networks use only the notions of convolution and certain kinds of invariance, and perform much better.
This is a great lesson.
Whatever the field, we keep making the same kind of mistakes. To see this, and to resist it effectively, we need to understand why these mistakes are so attractive. We must learn the bitter lesson that building in how we think we think does not work in the long run. The bitter lesson is based on the historical observations that 1) AI researchers have often tried to build knowledge into their agents; 2) this always helped in the short term and was personally satisfying to the researchers; 3) but in the long run it plateaued and even hindered further progress; and 4) breakthrough progress eventually came from the opposite approach, based on scaling computation through search and learning. The eventual success has a bitter taste and is often not fully digested, because it is the success of computation over a favored, human-centric approach.
One thing that should be learned from the bitter lesson is the great power of general-purpose methods, methods that continue to scale with increased computation even when the available computation becomes very large. The two methods that seem to scale arbitrarily in this way are search and learning.
The second thing to be learned from the bitter lesson is that the actual contents of minds are tremendously, irredeemably complex; we should stop trying to find simple ways to think about the contents of minds, such as simple ways to think about space, objects, multiple agents, or symmetries. All of these are part of the arbitrary, intrinsically complex outside world. They are not what should be built in, because their complexity is endless; instead we should build in only the meta-methods that can find and capture this arbitrary complexity. What is essential to these methods is that they can find good approximations, but the search for them should be done by our methods, not by us. We need AI agents that can discover as we can, not ones that contain what we have discovered. Building in our discoveries only makes it harder to see how the process of discovery can be carried out.
Conclusion: trust computation rather than trying to trace human thinking or to reduce the complex processes of discovery and search to simple schemes; the former works in the long run, the latter does not.