Am I right in suspecting that GPT-4 is not nearly as great an advance on GPT-3 as GPT-3 was on GPT-2? It seems a much better product, but that product seems to have as its selling point not vastly improved text-prediction, but multi-modality.
No one outside of OpenAI really knows how much of an advance GPT-4 is, or isn't.
When GPT-3 came out, OpenAI was still a research company, like DeepMind.
Before there was a GPT-3 product, there was a GPT-3 paper. And it was a long, serious, academic-style paper. It described, in a lot of detail, how they created and evaluated the model.
The paper was an act of scientific communication. A report on a new experiment written for a research audience, intended primarily to transmit information to that audience. It wanted to show you what they had done, so you could understand it, even if you weren't there at the time. And it wanted to convince you of various claims about the model's properties.
I don't know if they submitted it to any conferences or journals (IIRC I think they did, but only later on?). But if they didn't, they could have, and it wouldn't have seemed out of place in those venues.
Now, OpenAI is fully a product company.
As far as I know, they have entirely stopped releasing academic-style papers. The last major one was the DALL-E 2 one, I think. (ChatGPT didn't get one.)
What OpenAI does now is make products. The release yesterday was a product release, not a scientific announcement.
In some cases, as with GPT-4, they may accompany their product releases with things that look superficially like scientific papers.
But the GPT-4 "technical report" is not a serious scientific paper. A cynic might categorize it as "advertising."
More charitably, perhaps it's an honest attempt to communicate as much as possible to the world about their new model, given a new set of internally defined constraints motivated by business and/or AI safety concerns. But if so, those constraints mean they can't really say much at all, at least not in a way that meets the ordinary standards of evidence for scientific work.
Their report says, right at the start, that it will contain no information about what the model actually is, besides the stuff that would already be obvious:
GPT-4 is a Transformer-style model [33] pre-trained to predict the next token in a document, using both publicly available data (such as internet data) and data licensed from third-party providers. [note that this really only says "we trained on some data, not all of which was public" -nost] The model was then fine-tuned using Reinforcement Learning from Human Feedback (RLHF) [34]. Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method, or similar.
As Eleuther's Eric Hallahan put it yesterday:
If we read further into the report, we find a number of impressive-looking evaluations.
But they are mostly novel ones, not done before on earlier LMs. The methodology is presented in a spotty and casual manner, clearly not interested in promoting independent reproductions (and possibly even with the intent of discouraging them).
Even the little information that is available in the report is enough to cast serious doubt on the overall trustworthiness of that information. Some of it violates simple common sense:
...and, to the careful independent eye, immediately suggests some very worrying possibilities:
That said, soon enough, we will be able to interact with this model via an API.
And once that happens, I'm sure independent researchers committed to open source and open information will step in and assess GPT-4 seriously and scientifically, filling the gap left by OpenAI's increasingly "product-y" communication style.
Just as they've done before. The open source / open information community in this area is very capable, very thoughtful, and very fast. (They're where Stable Diffusion came from, to pick just one well-known example.)
---
When the GPT-3 paper came out, I wrote a post titled "gpt-3: a disappointing paper." I stand by the title, in the specific sense that I meant it, but I was well aware that I was taking a contrarian, almost trollish pose. Most people found the GPT-3 paper far from "disappointing," and I understand why.
But "GPT-4: a disappointing paper" isn't a contrarian pose. It was, as far as I can see, the immediate and overwhelming consensus of the ML community.
---
As for the multimodal stuff, uh, time will tell? We can't use it yet, so it's hard to know how good it is.
What they showed off in the live demo felt a lot like what @nostalgebraist-autoresponder has been able to do for years now.
Like, yeah, GPT-4 is better at it, but it's not a fundamentally new advance; it's been possible for a while. And people have done versions of it, e.g. Flamingo and PaLI and Magma [which Frank uses a version of internally] and CoCa [which I'm planning to use in Frank, once I get a chance to re-tune everything for it].
I do think it's a potentially transformative capability, specifically because it will let the model natively "see" a much larger fraction of the available information on web pages, and thus enable "action transformer" applications a la what Adept is doing.
But again, only time will tell whether these applications are really going to work, and for what, and whether GPT-4 is good enough for that purpose, and whether you even need it, when other text/image language models are already out there and are being rapidly developed.




