OpenAI o1: The AI That Can Fact-Check Itself – A Game-Changer or Overpriced Hype?


OpenAI, the company behind the revolutionary ChatGPT, has announced its latest AI model: OpenAI o1. Code-named “Strawberry,” the two versions of the o1 family that have everybody abuzz are o1-preview and o1-mini. These generative AI models promise better reasoning, improved coding skills, and even the ability to fact-check their own outputs. Still, while OpenAI is breaking new ground, the high costs involved and a number of limitations are raising eyebrows.

If you subscribe to ChatGPT Plus or Team, you can already try o1 through your ChatGPT client, and availability will expand to education and enterprise users in the coming days. But the question remains: is this update worth all the hype, and more importantly, is it worth the astronomically high price?

A First Look at OpenAI o1

Unlike its predecessor, GPT-4o, OpenAI’s o1 doesn’t browse the web or analyze files just yet. It does have impressive image-analysis capabilities, but those have been temporarily switched off while OpenAI runs further tests on the model. Where o1 really flexes its muscles is code generation and more complex reasoning tasks.

What sets OpenAI o1 apart is that it can “think” before giving an answer, which makes it more effective on tasks that involve multiple subtasks or deep analysis. In other words, it may take a little extra processing time on a question in order to deliver a more accurate, well-rounded response. This “chain of reasoning” lets o1 fact-check and self-correct as it goes, helping it avoid pitfalls that trip up earlier reasoning models.

But before you get too excited, o1 comes with some limitations. The chatbot is currently rate-limited, with weekly message caps of 30 for o1-preview and 50 for o1-mini. Users have also noted that the model is very expensive: API pricing starts at $15 per 1 million input tokens and climbs to $60 per 1 million output tokens, far pricier than GPT-4o.
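To put those API rates in perspective, here’s a quick back-of-the-envelope calculation in Python. The per-million-token prices come from OpenAI’s published figures above; the token counts in the example are made up purely for illustration.

```python
# Rough cost estimate for a single o1-preview API request.
# The per-million-token rates are OpenAI's published o1-preview prices;
# the token counts below are purely illustrative.

INPUT_PRICE_PER_M = 15.00   # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 60.00  # USD per 1M output tokens

def o1_request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the approximate USD cost of one request."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# Example: a 2,000-token prompt that produces a 5,000-token answer
# (o1 tends to generate long reasoning output, so output costs dominate).
print(f"${o1_request_cost(2_000, 5_000):.2f} per request")  # ≈ $0.33
```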

You can find more details on OpenAI’s pricing plans and features on their official website.

How Does o1 “Think”?

One of o1’s standout features is its ability to fact-check itself by digging deeper into every part of a query. OpenAI trained the model with reinforcement learning, rewarding it when its answers are correct and penalizing it when they are wrong. This willingness to spend more time working through the details of a question lets o1 reach more accurate conclusions, which fits well with applications like flagging sensitive emails in a lawyer’s inbox or developing sophisticated marketing campaigns.
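To make that concrete, here is a minimal sketch of what such an email-screening call could look like through the OpenAI Python SDK. The email text, prompt wording, and output handling are hypothetical; only the o1-preview model name and the chat-completions call reflect OpenAI’s documented API.

```python
# Minimal sketch: asking o1-preview to flag a potentially sensitive email.
# Assumes the official OpenAI Python SDK (`pip install openai`) and an
# OPENAI_API_KEY environment variable; the email text and prompt wording
# are hypothetical.
from openai import OpenAI

client = OpenAI()

email_body = (
    "Hi team, attached is the draft settlement agreement for the Acme case. "
    "Please keep this strictly confidential until the filing on Friday."
)

response = client.chat.completions.create(
    model="o1-preview",
    messages=[
        # Note: at launch, o1 models did not accept system messages,
        # so the instructions go directly into the user message.
        {
            "role": "user",
            "content": (
                "You are screening a lawyer's inbox. Answer SENSITIVE or "
                "NOT SENSITIVE, then give a one-sentence reason.\n\n"
                f"Email:\n{email_body}"
            ),
        }
    ],
)

print(response.choices[0].message.content)
```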

The model has also been optimized on a new dataset tailored for reasoning, according to Noam Brown, a research scientist at OpenAI. “The longer it thinks, the better it does,” Brown said in a recent post on X.

Still, don’t expect instant responses to your most pressing questions. According to Pablo Arredondo, VP at Thomson Reuters, o1 can take more than 10 seconds to answer complex questions, which can be frustrating for users who expect quicker replies.

How o1 Performs in Real-World Tests

While TechCrunch didn’t get the chance to test o1 before its official debut, a few experts did. Pablo Arredondo found that o1 performed noticeably better than older models like GPT-4o at analyzing legal documents and handling complex logical tasks. For example, on a qualifying exam for the International Mathematical Olympiad (IMO), o1 solved 83% of the problems, while GPT-4o managed only 13%. Impressive as that sounds, a recent AI from Google DeepMind did even better in that competition, earning a silver medal.

That said, o1 remains far from perfect. Ethan Mollick, a management professor at Wharton, gave o1 a hard crossword puzzle. While the model got the answers right, it still hallucinated, inventing a clue out of nowhere, proof that even the best AI sometimes screws things up.

Curious about the potential of AI in education? Read more on AI in classrooms.

The Downsides: Cost and Speed

While o1’s reasoning skills are a real leap forward, some users may find the model prohibitively expensive and, at times, slow. Pricing is steep: roughly three times GPT-4o’s rate for input tokens and four times its rate for output tokens. That could be a major drawback for companies that want to scale up their use of AI without breaking the bank.
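For the curious, those multiples are easy to sanity-check. The GPT-4o rates used below (roughly $5 per million input tokens and $15 per million output tokens at the time) are an assumption on our part, not figures from this article:

```python
# Quick check of the "three times / four times" claim.
# o1-preview prices come from the article; the GPT-4o prices
# (~$5 input / ~$15 output per 1M tokens) are the rates in effect
# at the time and are an assumption, not stated in the article.
O1_INPUT, O1_OUTPUT = 15.00, 60.00
GPT4O_INPUT, GPT4O_OUTPUT = 5.00, 15.00

print(f"Input:  {O1_INPUT / GPT4O_INPUT:.0f}x more expensive")    # 3x
print(f"Output: {O1_OUTPUT / GPT4O_OUTPUT:.0f}x more expensive")  # 4x
```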

OpenAI also concedes that o1 is susceptible to occasional hallucinations, errors where the AI overconfidently manufactures facts. And while o1 performs much better on reasoning exercises, its tendency to trip up on simple games like tic-tac-toe shows it still has ample room for improvement.

Watch exactly how GitHub’s AI Copilot uses OpenAI models to help coders write code.

The Fierce Competition in AI

OpenAI is not alone in pushing the frontier of AI reasoning. Google DeepMind has been developing similar models that give AIs more time and resources to work through complex queries. Recently, DeepMind researchers showed that giving their models more compute time significantly improves performance without any additional algorithmic tweaks.

Interestingly, OpenAI has been a little cagey about how o1 actually works. The company chose not to publish the model’s full “chains of thought,” apparently for competitive reasons. Instead, users only see “model-generated summaries” of those chains, which raises questions about how o1 will stack up against upcoming models from competitors like Google.

Read about Google’s latest work on AI reasoning and fact-checking here.

What’s Next for OpenAI o1?

OpenAI says o1-mini will eventually be available to all free ChatGPT users, although no date has been set. Further down the road, even more advanced reasoning capabilities may become possible, perhaps through models that can reason for hours, days, or even weeks.

As competition in the AI space heats up, OpenAI’s real task will be to make o1 more accessible and affordable. Until then, businesses and developers will have to weigh the pros and cons of investing in this path-breaking yet expensive technology.

All in all, the OpenAI o1 model represents a huge leap forward in AI reasoning and fact-checking. Unfortunately, that leap comes at a high cost and with some limitations on its outputs. As always, AI keeps evolving, and o1 has just opened the door.

