O1 Models: OpenAI's Game-Changing Reasoning AI

jeredb
Sep 16, 2024

OpenAI has just dropped its new O1 models, and this is not just another incremental update: it changes how the model gets to an answer. This might as well be GPT-5, the model we've been hearing whispers about for months (remember the Strawberry or Q* rumors?).

What Makes O1 Special?

The key feature of O1 is its ability to reason. Until now, Large Language Models (LLMs) like ChatGPT or Claude would start answering immediately after our prompt, based on next-word prediction, like your phone's text prediction but on steroids. Sometimes the answer was right, and sometimes it was pretty wrong.

That's not how humans approach problems. We don't (or shouldn't) blurt out the first thing that comes to mind. We take time to reason, reflect on past experiences, and formulate an answer. That's roughly what O1 does. Before it answers, it works through a chain of reasoning: it takes a beat, reflects on its answer, can doubt itself, change its mind, realign, and then comes up with a much better response.

O1 vs. GPT-4o: A Real-World Comparison

To test the difference between GPT-4o and O1-preview, I asked both a classic consulting question: "How many golf balls can you fit into a punch buggy?" (A punch buggy is a Volkswagen Beetle.)

GPT-4o gave a quick, short response, estimating between 30,000 and 40,000 golf balls.

O1-preview, on the other hand, provided a much longer, more detailed response. It showed its work, including calculations and reasoning. The final answer was approximately 43,000 golf balls (more precisely, 43,776, which it oddly rounded down).

While O1's response wasn't perfect (the rounding was questionable), it demonstrated a more thorough thought process and arrived at a more precise answer.
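
For a sense of the arithmetic behind an estimate like this, here is a minimal Python sketch. The interior volume and the packing fraction are illustrative assumptions, not the figures either model used, so it is not meant to reproduce 43,776. The function name is made up for this example.

```python
import math

GOLF_BALL_DIAMETER_IN = 1.68  # regulation minimum diameter, in inches


def estimate_golf_balls(interior_volume_cuft: float,
                        packing_fraction: float = 0.64) -> int:
    """Fermi estimate: usable volume divided by the volume of one ball.

    packing_fraction defaults to roughly the density of randomly packed
    spheres; the interior volume is whatever you assume for the car.
    """
    ball_volume_cuin = (4 / 3) * math.pi * (GOLF_BALL_DIAMETER_IN / 2) ** 3
    interior_volume_cuin = interior_volume_cuft * 12 ** 3  # cubic feet -> cubic inches
    return int(interior_volume_cuin * packing_fraction / ball_volume_cuin)


if __name__ == "__main__":
    # 100 cubic feet is an assumed interior volume, not a measured one.
    print(estimate_golf_balls(100.0))
```

With these assumptions the result lands in the mid-40,000s. Changing the assumed volume or packing fraction shifts it a lot, which is why neither model's number should be read as exact.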

Current Limitations and Considerations

OpenAI emphasizes that while O1 is substantially better than GPT-4o on many reasoning-heavy tasks, there are still plenty of things that GPT-4o is better at. At launch, for example:

O1 doesn't yet have access to the internet
It can't use tools
It's not multimodal
It can't read documents yet

O1 excels at math, reasoning, logic, and coding. For simpler tasks like text generation and editing, GPT-4o or GPT-4o mini might still be your go-to. It's about choosing the right tool for the right task.

Access and Pricing

Currently, O1 is only available to paying ChatGPT subscribers (Plus and Team) or to developers on API usage tier 5, which requires at least $1,000 in cumulative API spend. Even then, ChatGPT users are limited to about 30 messages a week with O1-preview.

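The API rates for O1 are significantly higher than for GPT-4o: at launch, O1-preview is priced at $15 per million input tokens and $60 per million output tokens, reflecting the increased computing power required for its reasoning capabilities.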

The Context Window Challenge

O1 has the same 128k token context window as GPT-4o, but because of how it works, we'll hit that limit much faster. To put this into perspective, 128k tokens is roughly the length of J.R.R. Tolkien's "The Hobbit". That's a lot of text! However, the way these tokens are used has changed significantly with O1.

Here's why:

Tokens are how we measure our conversations with LLMs. Each token is about 3/4 of a word. Previously, the context window was split between input (your question) and output (the AI's answer).

Up until now:

Context Window = Input Tokens + Output Tokens

With O1, we have three components:

Your question (input tokens)
The AI's reasoning (invisible tokens)
The AI's answer (output tokens)

Now:

Context Window = Input Tokens + Reasoning Tokens + Output Tokens

Those invisible reasoning tokens eat into our context window, and they are billed as output tokens even though we never see them. It's like we're paying the AI to do its homework before giving us the result. This means we'll hit the 128k limit faster than with previous models.
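
To make the bookkeeping concrete, here is a small Python sketch of the formula above. The class and function names are made up for illustration, and the token counts in the usage example are hypothetical, not measurements from a real O1 call.

```python
from dataclasses import dataclass

CONTEXT_WINDOW = 128_000  # tokens, the figure quoted in this post
WORDS_PER_TOKEN = 0.75    # rule of thumb: one token is about 3/4 of a word


@dataclass
class TokenUsage:
    input_tokens: int
    reasoning_tokens: int  # hidden from the user, but still counted
    output_tokens: int

    @property
    def total(self) -> int:
        # Context Window = Input Tokens + Reasoning Tokens + Output Tokens
        return self.input_tokens + self.reasoning_tokens + self.output_tokens


def remaining_tokens(usage: TokenUsage, window: int = CONTEXT_WINDOW) -> int:
    """Tokens left in the window after this exchange (never negative)."""
    return max(window - usage.total, 0)


def tokens_to_words(tokens: int) -> int:
    """Approximate word count for a given number of tokens."""
    return round(tokens * WORDS_PER_TOKEN)


if __name__ == "__main__":
    # Hypothetical exchange: same question and answer, with and without reasoning.
    without_reasoning = TokenUsage(input_tokens=500, reasoning_tokens=0, output_tokens=700)
    with_reasoning = TokenUsage(input_tokens=500, reasoning_tokens=6_000, output_tokens=700)
    print(remaining_tokens(without_reasoning))
    print(remaining_tokens(with_reasoning))
    print(tokens_to_words(CONTEXT_WINDOW))
```

The only difference between the two cases is the reasoning term, so whatever the model spends thinking comes straight out of the space left for everything else.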

The Trade-off

There's an obvious trade-off here. We pay more and reach the context window faster, but if the model is reasoning well, there's a high chance we'll get a good response on the first try. This could reduce the need for back-and-forth to fine-tune what we're looking for.
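
One way to think about that trade-off is to ask how many rounds with a cheaper model would cost as much as one reasoning call. The sketch below assumes reasoning tokens are charged at the output rate; every token count and price in it is a hypothetical placeholder, and the function names are invented for this example.

```python
def call_cost(input_tokens: int, output_tokens: int,
              input_price_per_m: float, output_price_per_m: float,
              reasoning_tokens: int = 0) -> float:
    """Dollar cost of one call; reasoning tokens are charged at the output rate."""
    billed_output = output_tokens + reasoning_tokens
    return (input_tokens * input_price_per_m
            + billed_output * output_price_per_m) / 1_000_000


def break_even_calls(reasoning_call_cost: float, plain_call_cost: float) -> float:
    """How many plain calls cost the same as one reasoning call."""
    return reasoning_call_cost / plain_call_cost


if __name__ == "__main__":
    # All token counts and per-million-token prices below are placeholders.
    reasoning = call_cost(1_000, 1_000, 15.0, 60.0, reasoning_tokens=5_000)
    plain = call_cost(1_000, 1_000, 5.0, 15.0)
    print(f"one reasoning call: ${reasoning:.3f}")
    print(f"one plain call:     ${plain:.3f}")
    print(f"break-even after {break_even_calls(reasoning, plain):.1f} plain calls")
```

This is a simplification: in a real back-and-forth each follow-up resends the conversation so far, so the plain calls get more expensive as the thread grows, and the comparison ignores the time you spend iterating.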

Looking Ahead

O1 is a huge milestone, bringing us closer to AGI (Artificial General Intelligence). While it has limitations and is more expensive, it represents a significant leap in AI capabilities.

As we move forward, we can expect:

More tools and capabilities to be added to O1
Competitors (like Anthropic with Claude) to develop similar reasoning models
Prices to potentially decrease as the technology matures

For now, GPT-4o and GPT-4o mini will do the job for most tasks, but for complex reasoning, math, and coding, O1 is setting a new standard.

The world of AI is evolving rapidly, and O1 is just the beginning of a new era of more thoughtful, reasoning AI. It's an exciting time to be in this field, and I can't wait to see how these developments shape the future of AI and our interactions with it.

What are your thoughts on O1? Have you had a chance to try it?