
Revisiting the Classic "The Bitter Lesson": It Predicted Everything from GPT to o1/r1 to Manus, and More

This is required reading for AI practitioners.

This weekend, let's revisit "The Bitter Lesson," a classic published in 2019 whose predictions have all come true. Its author, Rich Sutton, is the father of modern reinforcement learning.

When Rich Sutton wrote "The Bitter Lesson," his core claim fit in a single sentence: the two general methods, search and learning, combined with scaling computation, will ultimately crush every piece of clever hand-engineering.

At the time, the mainstream view was still that "just piling on compute won't work; you have to build in human knowledge." Then GPT-3 arrived, the Scaling Laws were validated, the NLP pipelines linguists had engineered for decades were replaced end-to-end by a single Transformer, and ChatGPT took off. Every prediction came true.

That prediction is now being validated again in the Agent space.

Reasoning models internalize search into the model itself: o1 and DeepSeek-R1 need no externally engineered chain of thought; the model searches for reasoning paths in token space on its own.

Agents like Manus go a step further (when setting their direction, the team reused Sutton's conclusion: hand it to the model): the model decides for itself which tools to use, how to decompose the task, and how to carry it out. No hand-crafted workflow orchestration is needed.
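
Sketched as code, "hand it to the model" reduces to a loop in which the model, not a hand-written workflow, picks every step. Everything below is illustrative, not Manus's actual architecture: `scripted_model` is a scripted stand-in for a real LLM call, and the tool registry is hypothetical.

```python
# Scripted stand-in for a real LLM: given the transcript so far, it returns
# either a tool call or a final answer. In a real agent this is a model call.
def scripted_model(transcript):
    if not any(step[0] == "tool_result" for step in transcript):
        return {"action": "tool", "name": "search", "args": {"query": "GDP of France"}}
    return {"action": "finish", "answer": "about $3 trillion"}

TOOLS = {"search": lambda query: f"top hit for {query!r}"}  # toy tool registry

def run_agent(task, model, max_steps=5):
    """Generic agent loop: the model, not hand-coded control flow, picks each step."""
    transcript = [("task", task)]
    for _ in range(max_steps):
        decision = model(transcript)
        if decision["action"] == "finish":
            return decision["answer"]
        result = TOOLS[decision["name"]](**decision["args"])
        transcript.append(("tool_result", result))
    raise RuntimeError("step budget exhausted")

print(run_agent("What is the GDP of France?", scripted_model))
```

The point of the sketch is that `run_agent` contains no task-specific logic at all: swapping in a stronger model upgrades the whole system, which is exactly the scaling bet the essay describes.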

This matches Sutton's judgment from six years earlier exactly: stop fiddling with clever designs; general methods combined with scaling computation will win in the end.


The Bitter Lesson

The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin.

The ultimate reason for this is Moore's law, or rather its generalization of continued exponentially falling cost per unit of computation. Most AI research has been conducted as if the computation available to the agent were constant (in which case leveraging human knowledge would be one of the only ways to improve performance) but, over a slightly longer time than a typical research project, massively more computation inevitably becomes available.
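
As a back-of-the-envelope illustration of that exponential (a sketch assuming a fixed 18-month doubling period, a commonly quoted Moore's-law figure; the essay itself commits to no specific rate):

```python
def compute_multiplier(months, doubling_period_months=18):
    """Factor by which compute per dollar grows, assuming a fixed doubling period."""
    return 2 ** (months / doubling_period_months)

# Within one project cycle (~1 year) the gain looks modest...
print(round(compute_multiplier(12), 1))   # → 1.6
# ...but over a decade it is two orders of magnitude.
print(round(compute_multiplier(120), 1))  # → 101.6
```

This is why a method tuned to today's compute budget is optimizing for a constraint that is about to disappear.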

Seeking an improvement that makes a difference in the shorter term, researchers seek to leverage their human knowledge of the domain, but the only thing that matters in the long run is the leveraging of computation. These two need not run counter to each other, but in practice they tend to. Time spent on one is time not spent on the other. There are psychological commitments to investment in one approach or the other. And the human-knowledge approach tends to complicate methods in ways that make them less suited to taking advantage of general methods leveraging computation. There were many examples of AI researchers' belated learning of this bitter lesson, and it is instructive to review some of the most prominent.

In computer chess, the methods that defeated the world champion, Kasparov, in 1997, were based on massive, deep search. At the time, this was looked upon with dismay by the majority of computer-chess researchers who had pursued methods that leveraged human understanding of the special structure of chess. When a simpler, search-based approach with special hardware and software proved vastly more effective, these human-knowledge-based chess researchers were not good losers. They said that "brute force" search may have won this time, but it was not a general strategy, and anyway it was not how people played chess. These researchers wanted methods based on human input to win and were disappointed when they did not.
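
The "massive, deep search" at issue here is minimax search, usually made tractable by alpha-beta pruning. A minimal sketch over a hand-built game tree (Deep Blue's real search, with its special hardware and evaluation function, was vastly more elaborate):

```python
from math import inf

def alphabeta(node, maximizing=True, alpha=-inf, beta=inf):
    """Minimax with alpha-beta pruning over an explicit game tree.

    A node is either a numeric leaf evaluation or a list of child nodes;
    players alternate between maximizing and minimizing the evaluation.
    """
    if isinstance(node, (int, float)):
        return node
    best = -inf if maximizing else inf
    for child in node:
        score = alphabeta(child, not maximizing, alpha, beta)
        if maximizing:
            best = max(best, score)
            alpha = max(alpha, best)
        else:
            best = min(best, score)
            beta = min(beta, best)
        if alpha >= beta:   # remaining siblings cannot change the result: prune
            break
    return best

# Textbook example: a maximizing root chooses among three minimizing nodes.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(alphabeta(tree))  # → 3
```

Pruning changes nothing about the result; it only lets the same compute budget search deeper, which is exactly the sense in which search "leverages computation."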

A similar pattern of research progress was seen in computer Go, only delayed by a further 20 years. Enormous initial efforts went into avoiding search by taking advantage of human knowledge, or of the special features of the game, but all those efforts proved irrelevant, or worse, once search was applied effectively at scale. Also important was the use of learning by self-play to learn a value function (that is, letting the AI play against itself to learn to judge how good a position is).

This approach proved important in many other games, and even in chess, although learning did not play a big role in the 1997 program that first beat a world champion. Learning by self-play, and learning in general, is like search in that it enables massive computation to be brought to bear. Search and learning are the two most important classes of techniques for utilizing massive amounts of computation in AI research. In computer Go, as in computer chess, researchers' initial effort was directed towards utilizing human understanding (so that less search was needed), and only much later was much greater success had by embracing search and learning.
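
A toy illustration of self-play value learning, in the spirit of tabular TD(0) rather than any specific Go or chess program (all names and hyperparameters are illustrative): two sides share one value table for the counting game "first to 21 wins," where players alternately add 1, 2, or 3 to a running total, and the table is learned purely from games the system plays against itself.

```python
import random

random.seed(0)
TARGET, ALPHA, EPS = 21, 0.05, 0.1
# V[s]: learned estimate of the win probability for the player about to
# move when the running total is s. Both "players" share this one table.
V = {s: 0.5 for s in range(TARGET)}

def move_value(s2):
    # Value of moving to total s2, from the mover's perspective:
    # reaching TARGET wins outright; otherwise the opponent moves next.
    return 1.0 if s2 == TARGET else 1.0 - V[s2]

for _ in range(30000):              # self-play episodes
    s = 0
    while s < TARGET:
        options = [s + k for k in (1, 2, 3) if s + k <= TARGET]
        s2 = (random.choice(options) if random.random() < EPS   # explore
              else max(options, key=move_value))                # exploit
        # TD(0): nudge V[s] toward the value of the move actually taken.
        V[s] += ALPHA * (move_value(s2) - V[s])
        s = s2

# With perfect play, totals 1, 5, 9, 13, 17 are losses for the player to move;
# the learned table separates e.g. V[18] (a win) from V[17] (a loss).
print(round(V[18], 2), round(V[17], 2))  # high vs. low
```

No rule about "good positions" is ever written down; the value function emerges from compute spent on self-play, which is the essay's point about learning.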

In speech recognition, there was an early competition, sponsored by DARPA, in the 1970s. Entrants included a host of special methods that took advantage of human knowledge — knowledge of words, of phonemes, of the human vocal tract, and so on. On the other side were newer methods that were more statistical in nature and did much more computation, based on hidden Markov models (HMMs). Again, the statistical methods won out over the human-knowledge-based methods. This led to a major change in all of natural language processing, gradually over decades, where statistics and computation came to dominate the field. The recent rise of deep learning in speech recognition is the most recent step in this consistent direction. Deep learning methods rely even less on human knowledge, and use even more computation, together with learning on huge training sets, to produce dramatically better speech recognition systems. As in the games, researchers always tried to make systems that worked the way the researchers thought their own minds worked — they tried to put that knowledge in their systems — but it proved ultimately counterproductive, and a colossal waste of researchers' time, when, through Moore's law, massive computation became available and a means was found to put it to good use.
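
Decoding with an HMM centers on the Viterbi algorithm. Below is a minimal decoder on the standard textbook toy example (hypothetical health states and probabilities; real speech systems ran HMMs over phonemes with far larger models):

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden-state sequence for an observation sequence."""
    # prob[t][s]: probability of the best path ending in state s at step t
    prob = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        prob.append({})
        back.append({})
        for s in states:
            best_prev = max(states, key=lambda p: prob[t - 1][p] * trans_p[p][s])
            prob[t][s] = prob[t - 1][best_prev] * trans_p[best_prev][s] * emit_p[s][obs[t]]
            back[t][s] = best_prev
    # Backtrack from the most probable final state.
    last = max(states, key=lambda s: prob[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path)), prob[-1][last]

states = ("Healthy", "Fever")
start_p = {"Healthy": 0.6, "Fever": 0.4}
trans_p = {"Healthy": {"Healthy": 0.7, "Fever": 0.3},
           "Fever":   {"Healthy": 0.4, "Fever": 0.6}}
emit_p = {"Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
          "Fever":   {"normal": 0.1, "cold": 0.3, "dizzy": 0.6}}

path, p = viterbi(["normal", "cold", "dizzy"], states, start_p, trans_p, emit_p)
print(path)  # → ['Healthy', 'Healthy', 'Fever']
```

The model encodes almost nothing about the domain beyond transition and emission statistics, which is what made the approach general enough to absorb ever more data and compute.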

In computer vision, there has been a similar pattern. Early methods conceived of vision as searching for edges, or generalized cylinders, or in terms of SIFT features. But today all this is discarded. Modern deep-learning neural networks use only the notions of convolution and certain kinds of invariances, and perform much better.
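
The convolution primitive those networks rely on is itself tiny. A pure-Python sketch of a "valid" 2-D cross-correlation (the operation deep-learning libraries call convolution, here with no padding or stride), applied to a hypothetical image containing a vertical dark-to-bright edge:

```python
def conv2d(image, kernel):
    """'Valid' 2-D cross-correlation: slide the kernel over the image,
    summing elementwise products at each position."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(ow)]
            for i in range(oh)]

# A vertical-edge kernel responds where brightness jumps left-to-right.
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
kernel = [[-1, 1],
          [-1, 1]]
print(conv2d(image, kernel))  # → [[0, 2, 0], [0, 2, 0]]
```

In a trained network the kernel weights are learned rather than hand-set, and edge detectors like this one emerge on their own — the "only convolution and invariances" the paragraph describes.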

This is a big lesson. As a field, we still have not thoroughly learned it, as we are continuing to make the same kind of mistakes. To see this, and to effectively resist it, we have to understand the appeal of these mistakes. We have to learn the bitter lesson that building in how we think we think does not work in the long run.

The bitter lesson is based on the historical observations that:
1) AI researchers have often tried to build knowledge into their agents;
2) this always helps in the short term, and is personally satisfying to the researcher;
3) but in the long run it plateaus and even inhibits further progress;
4) breakthrough progress eventually arrives by an opposing approach based on scaling computation by search and learning.

The eventual success is tinged with bitterness, and often incompletely digested, because it is success over a favored, human-centric approach.

One thing that should be learned from the bitter lesson is the great power of general-purpose methods, of methods that continue to scale with increased computation even as the available computation becomes very great. The two methods that seem to scale arbitrarily in this way are search and learning.

The second general point to be learned from the bitter lesson is that the actual contents of minds are tremendously, irredeemably complex; we should stop trying to find simple ways to think about the contents of minds, such as simple ways to think about space, objects, multiple agents, or symmetries. All these are part of the arbitrary, intrinsically complex, outside world. They are not what should be built in, as their complexity is endless; instead we should build in only the meta-methods that can find and capture this arbitrary complexity. Essential to these methods is that they can find good approximations, but the search for them should be by our methods, not by us.

We want AI agents that can discover like we can, not agents which contain what we have discovered.

Building in our discoveries only makes it harder to see how the discovering process can be done.




賽博禪心
2026-03-21 18:40:49