The real “bitter lesson” of artificial intelligence

In a popular blog post titled “The Bitter Lesson,” Richard Sutton argues that advances in AI are the result of cheaper computation, not human design decisions based on problem-specific knowledge. Sutton criticizes researchers who, drawing on their understanding of a problem, build knowledge into solutions to improve performance. This temptation, Sutton explains, yields short-term performance gains, and such vanity is satisfying to the researcher. However, such human ingenuity comes at the expense of AI’s divine destiny by inhibiting the development of a solution that doesn’t want our help in understanding a problem. The goal of AI is to recreate the problem solver from scratch, not to solve problems directly.[1]

In a rebuttal titled “A Better Lesson,” roboticist Rodney Brooks argues that the examples Sutton cites of general solutions that ignore human ingenuity in favor of “brute force” computation are [in fact] the result of human ingenuity. Brooks points to network architectures like convolutional neural networks and computational features like translational invariance, effectively arguing that the field swapped failed human ingenuity for more successful human ingenuity. Brooks’ ecological argument suggests that having an AI that learns everything “sounds pedantic in the extreme,” and he adds that learning everything would “increase computational costs…by orders of magnitude”.

In one such example, Sutton writes, “the methods that defeated the world champion, [Garry] Kasparov, in 1997, were based on massive, deep search.” It may be true that deeper search yields better performance, since more chess moves are evaluated at each stage of a game. However, the returns diminish, and not everyone is interested in trading theoretical future performance gains for higher electricity bills. Yet the claim that Deep Blue had no human knowledge built into the system is false. The “massive, deep search” was explicitly designed for games, primed with opening moves provided by grandmasters, and the evaluation function was designed by humans for chess rather than discovered by the search algorithm.
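To make that division of labor concrete, here is a minimal, hypothetical sketch of the generic search-plus-evaluation recipe used by classical chess engines. It is not Deep Blue’s code; it assumes the third-party python-chess package, and the piece values are illustrative. The point is that the evaluation function encodes human chess knowledge, while the search itself knows nothing about chess beyond the move generator.

```python
# A minimal sketch (assuming the third-party python-chess package) of
# search plus a hand-crafted evaluation function. The piece values and the
# evaluation are human knowledge baked in, not discovered by the search.
import chess

# Hand-picked material values: domain knowledge supplied by people.
PIECE_VALUES = {
    chess.PAWN: 1, chess.KNIGHT: 3, chess.BISHOP: 3,
    chess.ROOK: 5, chess.QUEEN: 9, chess.KING: 0,
}

def evaluate(board: chess.Board) -> float:
    """Human-designed evaluation: material balance from White's point of view."""
    score = 0.0
    for piece in board.piece_map().values():
        value = PIECE_VALUES[piece.piece_type]
        score += value if piece.color == chess.WHITE else -value
    return score

def minimax(board: chess.Board, depth: int) -> float:
    """Plain depth-limited minimax: brute-force search over the game tree."""
    if depth == 0 or board.is_game_over():
        return evaluate(board)
    best = -float("inf") if board.turn == chess.WHITE else float("inf")
    for move in board.legal_moves:
        board.push(move)
        score = minimax(board, depth - 1)
        board.pop()
        best = max(best, score) if board.turn == chess.WHITE else min(best, score)
    return best

# Deeper search (a larger depth) evaluates more positions, but the tree grows
# exponentially, which is where the diminishing returns and the electricity
# bills come from.
print(minimax(chess.Board(), depth=2))
```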

Rather than advocating deeper search, a more interesting question for researchers is how humans can be so competitive at a game in which we don’t use search, let alone deep search evaluating millions of moves per second. Human chess players may analyze ten moves ahead, anticipating advantages and working backward to find move sequences. Unfortunately, the question of how humans can be so competitive is ignored because so few in the field care. Sutton makes this point and urges researchers to stop “building in how we think we think,” which he says “does not work in the long run.” This statement is odd, since Sutton is one of the founders of modern computational reinforcement learning, which developed from the now-defunct theory of mind called behaviorism. Behaviorism reduces mental states to physical states and purges the mind in favor of environmental stimuli and behavioral responses. In other words, reinforcement learning was a response to how we thought we thought. I guess Sutton means it’s okay to build in how we think we think as long as those theories are outdated.[2]
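To illustrate that behaviorist lineage, here is a minimal, hypothetical tabular Q-learning loop. The toy environment, hyperparameters, and names are invented for illustration and are not drawn from Sutton’s work; the sketch only shows the stimulus-response-reward skeleton: the agent observes a state, emits an action, and adjusts its behavior from the reward signal alone, with no model of how anyone thinks.

```python
# A minimal sketch of tabular Q-learning on a toy 5-cell corridor, illustrating
# reinforcement learning's behaviorist skeleton: state in, action out, behavior
# shaped purely by reward. Environment and hyperparameters are illustrative.
import random
from collections import defaultdict

N_STATES, GOAL = 5, 4          # corridor cells 0..4; reaching cell 4 pays off
ACTIONS = [-1, +1]             # step left or right
alpha, gamma, epsilon = 0.1, 0.9, 0.1

Q = defaultdict(float)         # Q[(state, action)] -> estimated return

for episode in range(500):
    state = 0
    while state != GOAL:
        # Stimulus -> response: mostly exploit, occasionally explore.
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        next_state = min(max(state + action, 0), N_STATES - 1)
        reward = 1.0 if next_state == GOAL else 0.0
        # Behavior is adjusted purely from the reward signal (Q-learning update).
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
        state = next_state

# With enough episodes, the greedy policy walks right toward the reward --
# behavior discovered from reinforcement, not built-in knowledge.
print([max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)])
```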

Like Ray Kurzweil, Sutton attributes all future advances in AI to Moore’s Law. However, the argument from Moore’s Law to AI is a category mistake, not to mention anti-intellectual. It is a category mistake because advances in artificial intelligence require more than advances in computation.

This thinking ignores the philosophical, ontological, and technological breakthroughs needed to achieve true artificial intelligence, which will not be achieved through search or brute-force learning. It is anti-intellectual because it dismisses all research in favor of Moore’s Law. Such a claim points the finger at Yann LeCun, Geoffrey Hinton, and countless other researchers and declares that their work was not revolutionary; instead, they were saved by the grace of Moore’s Law. It also suggests that the next generation sit still and do nothing.

Incorporating problem-specific knowledge into a solution comes at the cost of scale, if by scale we mean solving all other problems.[3] However, most companies only need to solve certain problems, mainly the problems their customers face. There is no general-purpose business, so there is no need for a general-purpose solution. This is the bitter business lesson, because problem-specific knowledge helps you deliver your solution faster. Moreover, customers care about stable value propositions. Machine learning is probabilistic, has unstable value propositions, and remains comically fragile despite benefiting from extra computational power. A system that also uses non-probabilistic elements, running in the background or the foreground, therefore provides a more sustainable solution for customers and more stable value propositions, as sketched below. Moreover, machine-learning-only solutions cannot be fixed in a pinpoint way. You can improve training data, collect more data, tune parameters, or swap one solution for another to improve performance. However, if you need a specific fix and don’t want to replace the entire system, you cannot guarantee improvement unless you use non-probabilistic elements. The catch is that non-probabilistic elements are limited to what can be held in the minds of a few programmers and should be used with caution.[4] The point is that your solution doesn’t need to know or want to solve a problem, but you and your team do.
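Here is a minimal sketch of one such hybrid arrangement, with invented names, rules, and thresholds: a probabilistic model makes the default call, while a small set of deterministic rules runs in the foreground and can override it. The rules give you one place to make a guaranteed, pinpoint fix without retraining or replacing the model; this is a sketch of the idea, not a prescribed architecture.

```python
# A minimal sketch (invented names and rules) of a hybrid decision pipeline:
# a probabilistic model provides the default answer, and deterministic rules
# run in the foreground to guarantee specific behaviors customers rely on.
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Transaction:
    amount: float
    country: str
    model_score: float  # probability of fraud from whatever ML model is in use

# Deterministic rules: each returns a decision, or None to defer to the model.
Rule = Callable[[Transaction], Optional[str]]

def blocklisted_country(tx: Transaction) -> Optional[str]:
    return "reject" if tx.country in {"XX"} else None   # hypothetical blocklist

def small_amount_allowlist(tx: Transaction) -> Optional[str]:
    return "approve" if tx.amount < 1.00 else None      # pinpoint, guaranteed fix

RULES: list[Rule] = [blocklisted_country, small_amount_allowlist]

def decide(tx: Transaction) -> str:
    # Non-probabilistic elements run first: their behavior is stable and auditable.
    for rule in RULES:
        decision = rule(tx)
        if decision is not None:
            return decision
    # Otherwise fall back to the probabilistic model's judgment.
    return "reject" if tx.model_score > 0.5 else "approve"

# The allowlist rule overrides a high model score, so this prints "approve".
print(decide(Transaction(amount=0.50, country="US", model_score=0.97)))
```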

Sutton’s article is short (1,116 relatively concise words), which makes me wonder why. Why not make a clever, self-referential statement to the reader and, like the solution he advocates in the article, wait to write an article that would generalize to all possible topics? Instead, Sutton simply shares his unique perspective on a specific subject, based on knowledge that took a career to accumulate. Pedagogical frivolity aside, I hope we can accept the truth that science needs scientists and not hand over all the potential to Moore’s Law.[5] It is a mistake to do nothing until you can do everything.[6]


[1] That’s not to say Sutton isn’t a pioneer, but the argument isn’t new. AI pioneer Marvin Minsky shared a similar notion when he recalled his approach to AI in a 1981 New Yorker profile, saying, “I don’t have to tell the machine exactly what to do.”

[2] AI research is almost exclusively concerned with maintaining computational, albeit false, metaphors of the mind. https://bdtechtalks.com/2022/04/08/ai-brain-in-jar/

[3] In fact, much more: “The Bitter Lesson” advocates less, perhaps no, regard for safety, security, privacy, fairness and transparency, because it has one goal in mind. This type of advocacy has destructive elements.

[4] For more information on hybrid systems: https://www.oodaloop.com/archive/2022/10/25/weak-signals-early-warnings-and-strong-security/

[5] For a particularly striking example of hubris, consider Meta’s new language model “Galactica,” which survived only three days: https://www.technologyreview.com/2022/11/18/1063487/meta-large-language-model-ai-only-survived-three-days-gpt-3-science/

[6] I thought I was borrowing from Edmund Burke, but it seems Sydney Smith deserves credit for my inspiration: “Nobody made a greater mistake than he who did nothing because he could do only a little.” https://quoteinvestigator.com/2020/01/31/little/
