We report the development of GPT-4, a large-scale, multimodal model which can accept image and text inputs and produce text outputs. While less capable than humans in many real-world scenarios, GPT-4 exhibits human-level performance on various professional and academic benchmarks, including passing a simulated bar exam with a score around the top 10% of test takers. GPT-4 is a Transformer-based model pre-trained to predict the next token in a document. The post-training alignment process results in improved performance on measures of factuality and adherence to desired behavior. A core component of this project was developing infrastructure and optimization methods that behave predictably across a wide range of scales. This allowed us to accurately predict some aspects of GPT-4's performance based on models trained with no more than 1/1,000th the compute of GPT-4.
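The claim about predicting performance from much smaller training runs can be illustrated with a simple extrapolation. The following is a minimal sketch, not the authors' method: it assumes loss follows a pure power law in training compute, `loss ≈ a * compute**b`, and fits it by least squares in log-log space. The function names, the constants, and the data points are all hypothetical; the data is synthetic and generated from a made-up law, not taken from GPT-4.

```python
import numpy as np


def fit_power_law(compute, loss):
    """Fit loss ≈ a * compute**b by linear least squares in log-log space."""
    slope, intercept = np.polyfit(np.log(compute), np.log(loss), deg=1)
    return float(np.exp(intercept)), float(slope)


def predict_loss(a, b, compute):
    """Evaluate the fitted power law at a given compute budget."""
    return a * compute ** b


# Synthetic, noise-free points from an invented law (assumption, not real results).
TRUE_A, TRUE_B = 10.0, -0.05

# Compute is expressed as a fraction of the target run, so the largest
# small run uses 1/1,000th of the target compute.
small_compute = np.array([1e-6, 1e-5, 1e-4, 1e-3])
small_loss = TRUE_A * small_compute ** TRUE_B

a, b = fit_power_law(small_compute, small_loss)

# Extrapolate three orders of magnitude beyond the largest small run.
predicted = predict_loss(a, b, 1.0)
print(f"fitted a={a:.4f}, b={b:.4f}, predicted loss at full compute={predicted:.4f}")
```

With real measurements the points would be noisy, and a functional form with an additional constant term for irreducible loss may fit better; that would require a nonlinear fit rather than the log-log regression used here.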

Original paper: https://arxiv.org/pdf/2303.08774.pdf