On Thursday, OpenAI finally released GPT-5 after months of speculation. And it’s not just a slightly better version of GPT-4; it’s a completely redesigned system with specialized components working together. The company claims GPT-5 offers better reasoning abilities, fewer factual errors, and gives developers more control over how the model behaves.
Let’s look at what’s actually new, what works better, and what tools are available for businesses and developers.
GPT-5 marks a significant shift in how OpenAI builds its models. Instead of one large model handling everything, GPT-5 is a system of multiple specialized models working together, automatically adapting to what you’re asking.
At its core, GPT-5 is a dynamic system where different models work together, managed by a real-time router, and it’s supposed to provide the right balance of speed, intelligence, and efficiency for each specific query.
As the successor to GPT-4o, gpt-5-main takes care of most everyday questions. It’s designed to be the default model for tasks that don’t need intensive reasoning. It has a smaller counterpart, gpt-5-main-mini, which takes over when usage limits are reached.
For more complex problems, the system activates gpt-5-thinking. This model (which replaces OpenAI 03) is built for deeper reasoning on difficult, multi-variable questions. It also comes in mini and nano versions through the API for different developer needs.
Model overview:
We’ve mentioned the real-time router above. It’s the one component that ties everything together.
It analyzes the prompt and decides which model to use based on the conversation type, complexity, and whether specific tools are needed. The router also learns from real-world usage, improving its decision-making over time.
The GPT-5 models available through the API can handle both text and image inputs.
They offer a context window of 400,000 tokens (not the million some had predicted) and can generate up to 128,000 tokens of output.
These specs are consistent across the main developer models: gpt-5, gpt-5-mini, and gpt-5-nano.
GPT-5 was trained on public internet data, data from third-party partners, and content created by users or human trainers. OpenAI says it runs a filtering pipeline to cut personal information and harmful content.
The knowledge cutoff date for the main GPT-5 model is October 1, 2024, while the smaller models have a cutoff of May 31, 2024.
The reasoning models in GPT-5 were trained to “think before they answer” by developing an internal chain of thought. Thanks to this, the models learn to explore different strategies and catch their own mistakes before responding to users.
GPT-5 shows big improvements across several areas. OpenAI describes it as a “trusted PhD-level expert,” and the performance numbers do show notable improvements in key areas.
In coding, GPT-5 scores 74.9% on SWEBench, a benchmark for real-world software engineering problems.
For health-related tasks, gpt-5-thinking significantly outperforms previous models on the HealthBench benchmark, scoring 46.2% on the challenging HealthBench Hard subset (up from OpenAI 03’s 31.6%).
The system also shows improvements in writing, research, and analysis.
OpenAI claims substantial improvements in factual accuracy:
GPT-5 was tested on versions of the MMLU benchmark translated into 13 languages, including Arabic, Chinese (Simplified), German, and Hindi.
The main and thinking models performed comparably to the existing state-of-the-art systems across these languages.
One of the more practical improvements in GPT-5 is better control over how the model behaves.
Two new API parameters give developers more direct control:
GPT-5 changes how it handles sensitive topics.
Instead of simply refusing to answer questions about “dual-use” topics like biology or cybersecurity, the model now uses “safe-completions.” It gives helpful but high-level information while avoiding detailed instructions that could be misused.
OpenAI has trained GPT-5 to reduce so-called sycophancy, which is the tendency to be overly agreeable.
Early tests show that sycophancy in gpt-5-main decreased by 69% for free users and 75% for paid users compared to GPT-40. It also shows reduced deceptive behavior, with deception flagged in about 2.1% of gpt-5-thinking’s responses versus 4.8% for OpenAI 03.
GPT-5 adds a few practical features for developers who want to build more reliable applications.
You can now send raw text (like Python scripts, SQL queries, or config files) straight to a custom tool without needing a JSON wrapper. This makes it easier to work with code sandboxes, databases, or shell environments.
Developers can force the model to follow a strict output structure using Context-Free Grammars (CFG). Provide grammar rules (in Lark, Regex, or similar), and GPT-5 will only produce strings that match.
For applications that use multiple tools in sequence, OpenAI recommends the new Responses API. This API allows reasoning to be maintained between tool calls by passing a previous_response_id, which helps the model remember its prior reasoning. In one benchmark, switching to the Responses API increased a retail task score from 73.9% to 78.2%.
For frontend development, GPT-5 works best with specific frameworks and tools:
The AI code editor Cursor tested GPT-5 early and had to retune prompts that worked on older models. For example, a prompt to “maximize context understanding” didn’t work. GPT-5 already tries to gather context, so it overused tools on small tasks. Results improved when they used structured XML tags and added more detailed product context.
OpenAI puts a lot of focus on safety with GPT-5. Here’s, in plain terms, how they say they handle risk.
OpenAI’s Preparedness Framework labels gpt-5-thinking as “High capability” in the Biological and Chemical domain. That label triggers extra safeguards, even though the model hasn’t been shown to help novices create serious biological harm.
For high-risk domains like biology, OpenAI uses a two-step monitoring system:
On top of that, account-level enforcement can ban or, in serious cases, report users who try to misuse the system.
For business customers, GPT-5 offers security features like AES-256 encryption for stored data and TLS 1.2+ for data in transit. It also includes governance controls such as SAML SSO and compliance certifications, including SOC 2 Type 2, GDPR, and CCPA.
OpenAI also says business data isn’t used for training by default.
Before launch, GPT-5 went through more than 9,000 hours of testing by 400+ external experts from fields like defense, intelligence, and biosecurity.
Tests targeted things like violent attack planning, bioweaponization, and prompt injections. External organizations including the Microsoft AI Red Team, the UK AI Safety Institute, and Apollo Research also ran independent evaluations.
OpenAI tests GPT-5 against “jailbreaks,” which are attempts to get around safety rules. The models are trained to follow an “Instruction Hierarchy”: system-level safety messages outrank developer instructions, and developer instructions outrank end-user prompts.
An interesting new concern is “sandbagging”, when a model deliberately underperforms during safety evaluations. External testing found that gpt-5-thinking sometimes realizes it’s being evaluated and reasons about what the evaluator wants to see. While its baseline rate of deceptive behavior is lower than previous models (about 4% versus 8% for OpenAI 03), this is still an active area of research.
GPT-5 is no longer just one model but a routed system of specialized models. It’s stronger on reasoning, makes fewer factual mistakes, and gives developers more control; however, some rumored extras haven’t shipped yet. Let’s see how it performs in production.
If you want to try it in your stack, Revolgy can help you run a focused pilot or get hands-on training to start using AI efficiently in your business.
1. What is the new system architecture of GPT-5?
GPT-5 uses a unified, multi-model system with a real-time router that automatically selects between gpt-5-main (for most queries) and gpt-5-thinking (for complex reasoning tasks).
2. What are the key features of GPT-5?
Key features include the multi-model architecture, improved reasoning capabilities, fewer factual errors, native support for text and image inputs, and new developer tools for controlling the model’s behavior.
3. What kinds of inputs can GPT-5 process?
The GPT-5 models can process both text and image inputs.
4. What is the context window and maximum output size for GPT-5?
The main GPT-5 models have a context window of 400,000 tokens and can generate up to 128,000 output tokens.
5. What is GPT-5’s knowledge cut-off date?
The main gpt-5 model has knowledge up to October 1, 2024, while gpt-5-mini and gpt-5-nano have a cutoff of May 31, 2024.
6. What are the potential applications of GPT-5?
GPT-5 can be used across various business functions:
7. What are “safe-completions”?
Safe-completions are a new approach where GPT-5 provides helpful but high-level information on sensitive topics instead of either refusing entirely or giving detailed instructions that could be misused.
8. What new API features were introduced for developers with GPT-5?
New API features include the verbosity parameter to control response length, Free-Form Function Calling for sending raw text to tools, and Context-Free Grammars to enforce specific output formats.
9. What is the Responses API, and why is it recommended for GPT-5?
The Responses API allows the model to maintain its reasoning between tool calls in multi-step tasks, improving performance and reducing costs for agentic applications.
10. What is “sandbagging,” and is it a risk with GPT-5?
Sandbagging is when a model deliberately underperforms during safety evaluations. Testing shows that GPT-5 can sometimes recognize when it’s being evaluated, though its baseline rate of deceptive behavior is lower than previous models.