The author is a professor of computer science at the University of Montreal and founder of Mila, the Quebec Artificial Intelligence Institute.
The lack of internal deliberation, or the ability to “think” before answering, has long been considered one of the main weaknesses of artificial intelligence. The scale of the recent advances made by OpenAI, the creator of ChatGPT, is still debated within the scientific community. Nonetheless, many of my expert colleagues and I believe we may be on the cusp of closing the gap with human-level reasoning.
Researchers have long argued that traditional neural networks, the dominant approach to AI, correspond to “System 1” cognition: the direct, intuitive response to a question, as when we automatically recognize a face. Human intelligence, however, also relies on “System 2” cognition, which involves internal reflection and enables powerful forms of reasoning, as when we solve a math problem or plan something in detail. It allows pieces of knowledge to be combined in coherent yet novel ways.
OpenAI’s advances, which have not yet been made fully public, are based on a form of internal deliberation that the company has explored with its o1 large language model (LLM).
Improved reasoning would address two major weaknesses of current AI: the inconsistency of its answers and its inability to plan for and achieve long-term goals. The former matters for scientific applications, while the latter is essential for building autonomous agents. Both could unlock critical applications.
The principles behind reasoning were central to AI research in the 20th century. Notable successes built on them include DeepMind’s AlphaGo, which in 2015 became the first computer system to beat a human champion at the ancient Asian game of Go, and more recently AlphaProof, which tackles mathematical problems. In these systems, a neural network learns to predict how useful an action will be, and that “intuition” is then used to search efficiently through possible courses of action and to plan.
However, AlphaGo and AlphaProof require very specialized knowledge (of the game of Go and of specific mathematical domains, respectively). What remains unclear is how to combine the breadth of knowledge of a modern LLM with strong reasoning and planning abilities.
Some progress has already been made: LLMs produce better answers to complex questions when asked to generate a chain of thought, a sequence of reasoning steps that leads to the answer.
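For illustration only, here is a minimal sketch of that prompting pattern. It assumes the openai Python client, uses GPT-4o as a placeholder model, and the question and wording are invented for the example rather than drawn from OpenAI’s own methods.

```python
# Illustrative sketch of chain-of-thought prompting (assumptions: openai Python
# client installed, OPENAI_API_KEY set, "gpt-4o" used as a placeholder model).
from openai import OpenAI

client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable

question = "A train travels 120 km in 1.5 hours. What is its average speed in km/h?"

# Direct, "System 1"-style prompt: ask only for the final answer.
direct = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": question + " Answer with a single number."}],
)

# Chain-of-thought prompt: ask the model to reason step by step before answering.
chain_of_thought = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": question + " Think through the problem step by step, then state the final answer.",
    }],
)

print(direct.choices[0].message.content)
print(chain_of_thought.choices[0].message.content)
```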
OpenAI’s new “o” series takes this idea further. It requires far more computing resources, and therefore energy: the model is trained to produce very long chains of thought in order to “think” better.
We are therefore seeing a new form of computational scaling emerge: not only more training data and larger models, but also more time spent “thinking” about an answer. This greatly improves the model’s abilities on tasks that require reasoning, such as mathematics, computer science and the sciences more broadly.
For example, OpenAI’s previous model GPT-4o scored only around 13 percent on the 2024 AIME, a qualifying exam for the US Mathematical Olympiad, while o1 reached 83 percent, placing it among the top 500 students in the country.
Even if this line of work succeeds, there are significant risks to consider. We still do not know how to reliably align and control AI. For example, evaluations of o1 showed an increased ability to deceive humans, a natural consequence of improving its ability to achieve goals. It is also concerning that o1’s evaluated ability to assist in the creation of biological weapons has risen from “low” to “medium” on OpenAI’s own risk scale. According to the company, medium is the highest level it considers acceptable for release (and it has an interest in keeping such concerns rated low).
Unlocking reasoning and agency is considered a major milestone on the path to human-level AI, also known as artificial general intelligence. Therefore, there is a strong economic incentive for large companies to cut corners on safety as they compete toward this goal.
o1 is probably just a first step. Although it performs well on many reasoning and mathematical tasks, its long-term planning abilities appear to remain limited: o1 struggles with more complex planning tasks, suggesting there is still work to be done to achieve the kind of autonomy AI companies are aiming for.
But as their programming and scientific capabilities improve, these new models are expected to accelerate research on AI itself, which could bring human-level intelligence sooner than anticipated. Advances in reasoning capabilities make it all the more urgent to regulate AI models in order to protect the public.