
Apollo-1: The Foundation Model for Controllable Agents

Reliable agents require more than intelligence. They require control. Apollo-1 is the first foundation model to unify generation and control, enabling programmable behavior in critical situations while preserving LLM-level fluency.

Pending Release

Authors: Ohad Elhelo, Ori Cohen, Co-Founders

01. The Control Problem

Two different things are emerging under the name “agent.”

Open-ended agents work for users. Coding assistants, computer-use agents, personal AI. You’re the principal—the one whose goals matter. If the agent interprets your intent slightly differently each time, that’s fine. You’re in the loop. You’ll correct it. Flexibility is the point.

Task-oriented agents work on behalf of entities. An airline’s booking agent. A bank’s support agent. An insurer’s claims agent. These agents serve users, but they represent the entity. The entity is the principal—the one whose policies must be enforced. The agent has to follow the entity’s rules while conversing naturally with customers.

The requirements are fundamentally different.

Open-ended agents need maximum flexibility. They should handle novel situations, figure things out, do whatever the user wants. LLMs are well-suited here—probabilistic, creative, adaptable.

Task-oriented agents need control. Specific behaviors must execute in specific scenarios. “If the refund exceeds $200, always require ID.” “Always offer insurance before payment.” These aren’t suggestions. They’re requirements that determine whether AI can be trusted with customer interactions involving real money, real appointments, and real business logic.

LLMs can’t guarantee these behaviors. Not because they’re not smart enough, but because they’re architecturally incapable of it. They predict tokens. They approximate intent. They do what you want most of the time.

Most of the time isn’t good enough when the stakes are real.

Two kinds of agents. Two different foundations required.

02. Why LLM Agents Can’t Solve the Control Problem

The opportunity is enormous. Every conversation that results in real-world action—booking flights, processing payments, managing claims, executing trades—could be automated. These interactions run the economy. The market for task-oriented agents dwarfs what open-ended assistants will ever capture.

But without control, task-oriented agents can’t be deployed reliably at scale. Enterprises won’t trust AI with customer interactions when “usually works” is the best guarantee available.

The industry has spent three years and billions trying to make LLM agents work. The approaches vary. The tradeoff doesn’t.

Prompting: Tell the model to “always do X” and hope it complies. You get fluency. You don’t get control.

Fine-tuning: Train the model on task-oriented behavior. It learns patterns. Patterns improve probability. They don’t create guarantees.

Orchestration: Wrap the model in workflow frameworks. Build state machines around it. Route between prompts. You get control over the scaffolding, but the LLM inside the wrapper is still probabilistic. The moment a user goes off-script, the system either breaks or falls back to uncontrolled generation.

Every approach hits the same wall: LLMs provide fluency, workflows and symbolic systems provide control, and bolting them together in orchestration provides neither. You can have one or the other. You can’t have both.

03. Controllable Agents

Task-oriented agents need control. But “task-oriented” describes the use case, not the capability. Most task-oriented agents today are just LLMs pointed at tasks. They inherit all the limitations we just described.

We need a term for agents that actually deliver what task-oriented use cases require. We call them Controllable Agents: agents whose behavior can be programmed in critical scenarios while preserving natural conversation everywhere else.

Controllable Agents are a new class. Not a product category or a marketing term but rather a computational category, defined by capabilities that LLMs structurally cannot provide.

What would a foundation model for Controllable Agents require?

Explicit state. Multi-turn interactions require tracking where you are in a process, what you know, what’s happened, what constraints apply. Language models have no native state; they rely on context windows and external memory.

Programmable behavior. Entities need to define how their agent behaves in high-stakes scenarios and know those definitions will hold. “If X, always do Y.” Not probably. Always. Language models are probabilistic by design.

Native tool use. Controllable agents invoke external systems: booking engines, payment processors, CRMs. These invocations need reliability, proper parameters, error handling, execution guarantees. Language models sample tool calls probabilistically.

White-box reasoning. Every decision must be traceable: which rules fired, how state evolved, why the agent acted. Language models are black boxes.

Fluent interaction. Despite all of the above, the agent still needs to converse naturally, handle unexpected inputs, respond like humans expect. Rigidity kills user experience.

This is the core tension: you need the fluency of neural language generation and the reliability of symbolic control. You need both, unified—not bolted together.

Controllable Agents need their own foundation.

04. Apollo-1

Apollo-1 is the first foundation model built for controllable agents.

Not a language model adapted for control. Not an orchestration layer around existing models. A new foundation, built from the ground up on neuro-symbolic architecture that unifies generation and control.

Apollo-1 combines neural modules that understand and generate natural language with symbolic modules that maintain state, enforce rules, and guarantee execution. The neural components interpret meaning, handle ambiguity, produce fluent responses. The symbolic components track state, apply logic, ensure defined behaviors execute exactly as specified.

The agent understands language like an LLM. It enforces behavior like a state machine. One model. Both capabilities. Native to the architecture.

You don’t program every interaction, only the ones that matter. For everything else, the agent defaults to intelligent, fluent, common-sense conversation. When you’ve defined specific behaviors, they execute with certainty. When you haven’t, the agent thinks for itself.

Because the reasoning is neuro-symbolic, it’s white-box. Every decision is traceable and auditable. And because each step is explicit, you can give feedback on specific parts of the reasoning, not just final outputs.

Apollo-1 is domain-agnostic and use-case-agnostic. The same foundation model powers auto repair scheduling, insurance claims, retail support, healthcare navigation, and financial services, without any domain-specific rebuilding. The symbolic structures that enable control are universal. Same model, different System Prompts.

This is what it means to be a foundation model for controllable agents: the core capabilities these agents require—state, control, native tool use—are built into the foundation, not wrapped around it.

05. Eight Years to Build the Solution

In 2017, we began solving and encoding millions of real-user task-oriented conversations into structured data, powered by a workforce of 60,000 human agents. The core insight wasn’t about data scale; it was about what must be represented.

We found that task-oriented conversational AI requires two kinds of knowledge working together:

Procedural knowledge — roles, constraints, flows, policies
Descriptive knowledge — entities, attributes, domain content

Training a transformer on multi-turn transcripts can capture conversational style, but it won’t teach the model how to handle critical interactions correctly. Datasets are one-dimensional and stateless. Without explicit state, how is the model supposed to learn procedural knowledge?

To compute reliably over both kinds of knowledge, we needed a representation that separates structure from context while carrying each. We constructed a symbolic language that encodes procedural roles and descriptive facts, giving the model a typed symbolic state it can reason over.

In parallel, we observed that across use cases and domains—selling shoes, booking flights, processing loans—task-oriented dialogue follows universal procedural patterns. Food delivery, claims processing, and order management share similar procedural structures: parameter extraction, constraint validation, intent identification, policy enforcement, state-dependent branching, etc.

The key insight: if we could create a unified model where neural modules handle context and symbolic modules handle structure, we’d turn the problem on its head. Of course, it’d have to work agnostically across domains and use cases, capable of symbolically representing any scenario requiring controllable behavior.

For the actual computation, we developed the Neuro-Symbolic Reasoner, a cognitive core that computes next actions from the current symbolic state, as opposed to predicting the next token. While neural modules assist in the translation to and from the symbolic language, symbolic modules maintain explicit state, enforce guarantees, and ensure that tool invocations are structured rather than probabilistically sampled.
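To make the contrast with token prediction concrete, here is a minimal sketch of computing the next action from a symbolic state. Every name in it (SymbolicState, ToolCall, AskUser, TOOL_SCHEMAS, next_action, the booking_engine.reserve tool) is hypothetical and chosen for illustration; it is not Apollo-1's symbolic language or API. The point is only that the tool call is assembled from a schema and the current state, so a call with missing parameters cannot be emitted.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical types for illustration; not Apollo-1's actual symbolic language.
@dataclass(frozen=True)
class ToolCall:
    tool: str
    args: tuple  # sorted (name, value) pairs, so calls compare by value

@dataclass(frozen=True)
class AskUser:
    missing: tuple  # names of parameters still needed from the user

@dataclass
class SymbolicState:
    intent: Optional[str] = None
    params: dict = field(default_factory=dict)

# Assumed tool schema: intent -> (tool name, required parameters).
TOOL_SCHEMAS = {
    "book_flight": ("booking_engine.reserve", ("origin", "destination", "date")),
}

def next_action(state: SymbolicState):
    """Compute the next action from the state alone; nothing is sampled."""
    if state.intent not in TOOL_SCHEMAS:
        return AskUser(missing=("intent",))
    tool, required = TOOL_SCHEMAS[state.intent]
    missing = tuple(name for name in required if name not in state.params)
    if missing:
        # The tool is never called with incomplete arguments.
        return AskUser(missing=missing)
    args = tuple(sorted((name, state.params[name]) for name in required))
    return ToolCall(tool=tool, args=args)

partial = SymbolicState(intent="book_flight", params={"origin": "TLV"})
complete = SymbolicState(
    intent="book_flight",
    params={"origin": "TLV", "destination": "JFK", "date": "2025-12-01"},
)
print(next_action(partial))
print(next_action(complete))
```

In this sketch the neural side would only be responsible for filling SymbolicState from the conversation; the decision about what to do next never passes through a sampler.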

Together, the symbolic language and the reasoner form Apollo-1: the foundation model for controllable agents.

06. How It Works (at a glance)

Apollo-1’s breakthrough is stateful neuro-symbolic reasoning: a computation built explicitly for task-oriented conversational AI.

Apollo-1 achieves generalization through a fundamental principle: structure-content separation.

The Neuro-Symbolic Reasoner operates on symbolic structures—intents, constraints, parameters, actions—that remain constant across domains, while neural modules continuously enrich those structures with semantic nuance.

Architecture: encoder–stateful reasoning loop–decoder

Domain-Agnostic Encoder: Translates natural language into symbolic state using both procedural and descriptive knowledge.
Stateful Reasoning Loop (iterates until turn completion):
- Neuro-Symbolic State Machine maintains symbolic state
- Symbolic Reasoning Engine computes next actions from state
- Neuro-Symbolic Planner creates executable plans
Domain-Agnostic Decoder: Generates natural language from final state
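
The encoder, reasoning loop, and decoder above can be pictured with a toy pipeline. This is a sketch under heavy assumptions: keyword matching stands in for the neural encoder, a template stands in for the neural decoder, and the names State, encode, reason, decode, handle_turn, and TURN_ENDING are all invented here. It shows the control flow only, not how Apollo-1 implements any stage.

```python
from dataclasses import dataclass, field

@dataclass
class State:
    intent: str = ""
    params: dict = field(default_factory=dict)
    done: list = field(default_factory=list)  # actions taken so far

# Hypothetical actions that close a turn and hand control back to the user.
TURN_ENDING = {"clarify_intent", "ask_amount", "confirm"}

def encode(utterance: str, state: State) -> State:
    """Stand-in for the neural encoder: keyword matching instead of a model."""
    text = utterance.lower()
    if "refund" in text:
        state.intent = "refund"
    for token in text.replace("$", " ").split():
        if token.isdigit():
            state.params["amount"] = int(token)
    return state

def reason(state: State) -> str:
    """Stand-in for the symbolic engine: the next action depends on state only."""
    if state.intent != "refund":
        return "clarify_intent"
    if "amount" not in state.params:
        return "ask_amount"
    if "issue_refund" not in state.done:
        return "issue_refund"
    return "confirm"

def decode(state: State) -> str:
    """Stand-in for the neural decoder: a template instead of generation."""
    return "Actions so far: " + ", ".join(state.done)

def handle_turn(utterance: str, state: State) -> str:
    state = encode(utterance, state)
    for _ in range(10):  # safety bound on the reasoning loop
        action = reason(state)
        state.done.append(action)
        if action in TURN_ENDING:
            break
    return decode(state)

session = State()
print(handle_turn("I'd like a refund please", session))
print(handle_turn("It was $40", session))
```

Because the state object persists between calls, the second turn picks up where the first left off: the amount fills the gap and the loop proceeds to issue the refund and confirm.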

Apollo-1’s neuro-symbolic design unifies neural modules that understand context with symbolic modules that enforce structure.

The symbolic state represents both procedural progress (what state we’re in) and descriptive facts (what we know). Neural components interpret language and enrich understanding; symbolic components ensure reliable execution. Perception is probabilistic, but given the same state, the Reasoner always makes the same decision, delivering the behavioral guarantees that controllable agents require and making task execution reproducible, auditable, and steerable.
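
A small sketch of what "same state, same decision" and auditability could look like in practice. The rule names, the Decision type, and the decide function are assumptions made for this example, and the rule table is far simpler than anything a production reasoner would use. The decision is a pure function of the state, and it returns the list of rules that matched alongside the chosen action.

```python
from typing import NamedTuple

class Decision(NamedTuple):
    action: str
    fired: tuple  # names of every rule that matched, in priority order

# Hypothetical rule table: (name, predicate over state, action).
# Rules are listed by priority; the first matching rule sets the action.
RULES = (
    ("large_claim_needs_two_approvals",
     lambda s: s["claim_amount"] > 10_000 and s["approvals"] < 2,
     "request_approval"),
    ("default_pay_claim", lambda s: True, "pay_claim"),
)

def decide(state: dict) -> Decision:
    """Pure function of the state: no randomness, no hidden inputs."""
    fired = [(name, action) for name, matches, action in RULES if matches(state)]
    return Decision(action=fired[0][1], fired=tuple(name for name, _ in fired))

state = {"claim_amount": 12_000, "approvals": 1}
first = decide(state)
second = decide(dict(state))  # an equal copy of the state
print(first)
print(first == second)
```

The fired field is the audit trail in miniature: it records which rules applied to this state, so a reviewer can see why the action was chosen without rerunning anything.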

The Symbolic Reasoning Engine is deterministic and rule-based. Its procedural logic was learned from years of solving and encoding millions of multi-turn task-oriented conversations with human agents, using a reputation system that ranked each agent’s turn outputs based on peer feedback.

The complete technical paper, including architectural specifications, formal proofs, procedural ontology samples, evaluation methodologies, and turn-closure semantics, will be released alongside general availability.

Augmented Intelligence (AUI) Inc. – Patents Pending

07. Programming Behavior in Critical Scenarios

Apollo-1 ships with a Playground where any use case runs from the System Prompt alone. The System Prompt isn’t configuration. It’s a behavioral contract.

Via the System Prompt, you define tools and policies in a specification that is immediately compiled into Apollo-1’s typed symbolic language, producing explicit, machine-checkable representations of intents, parameters, constraints, policies, tool schemas, and pre-/post-conditions.
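
As an illustration of what a machine-checkable tool specification might involve, the sketch below turns a raw tool description into a typed object and checks pre- and post-conditions around every call. The System Prompt format and the compiled representation are not public, so ToolSpec, compile_tool, invoke, and the charge_card example are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class ToolSpec:
    name: str
    required: tuple
    precondition: Callable[[dict], bool]
    postcondition: Callable[[dict, dict], bool]

def compile_tool(spec: dict) -> ToolSpec:
    """Validate a raw spec and turn it into a typed object; fail early on gaps."""
    for key in ("name", "required", "pre", "post"):
        if key not in spec:
            raise ValueError(f"tool spec missing '{key}'")
    return ToolSpec(spec["name"], tuple(spec["required"]), spec["pre"], spec["post"])

def invoke(tool: ToolSpec, args: dict, impl: Callable) -> dict:
    """Run impl only if required fields and the precondition hold; check the result."""
    missing = [name for name in tool.required if name not in args]
    if missing:
        raise ValueError(f"{tool.name}: missing fields {missing}")
    if not tool.precondition(args):
        raise ValueError(f"{tool.name}: precondition failed")
    result = impl(**args)
    if not tool.postcondition(args, result):
        raise RuntimeError(f"{tool.name}: postcondition failed")
    return result

charge = compile_tool({
    "name": "charge_card",
    "required": ["amount", "card_token"],
    "pre": lambda a: a["amount"] > 0,
    "post": lambda a, r: r["charged"] == a["amount"],
})

def fake_processor(amount, card_token):
    # Stand-in for a real payment processor.
    return {"charged": amount, "card_token": card_token}

print(invoke(charge, {"amount": 25, "card_token": "tok_test"}, fake_processor))
```

The design choice worth noting is that validation happens twice: once when the spec is compiled, so a malformed definition is rejected before any conversation starts, and once per call, so a bad invocation never reaches the external system.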

You define how your agent must behave in scenarios that matter. Apollo-1 guarantees those behaviors execute. For everything else, the agent remains conversationally intelligent throughout. It handles unexpected inputs, maintains context, and responds naturally.

Via the System Prompt, you specify:

State-dependent rules: “If refund > $200, require ID verification”
Behavioral sequences: “Always offer insurance before processing payment”
Escalation logic: “Third failed payment attempt triggers human handoff”
Tool specifications: Required fields, pre- and post-conditions, failure states
Terminal states: How and when interactions conclude
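
The first three rule types in the list above can be sketched as a guard that decides which step must run before a requested step is allowed. This is an assumption-laden illustration, not the System Prompt syntax: Session, required_step, and the step names are invented, and "third failed payment attempt" is modeled as a counter reaching 3.

```python
from dataclasses import dataclass

@dataclass
class Session:
    refund_amount: float = 0.0
    id_verified: bool = False
    insurance_offered: bool = False
    failed_payments: int = 0

def required_step(session: Session, requested: str) -> str:
    """Return the step that must run next, given the step the agent wants to take."""
    # State-dependent rule: refunds over $200 require ID verification first.
    if (requested == "issue_refund"
            and session.refund_amount > 200
            and not session.id_verified):
        return "verify_id"
    if requested == "process_payment":
        # Escalation logic: the third failed attempt triggers a human handoff.
        if session.failed_payments >= 3:
            return "handoff_to_human"
        # Behavioral sequence: insurance is always offered before payment.
        if not session.insurance_offered:
            return "offer_insurance"
    return requested

print(required_step(Session(refund_amount=250), "issue_refund"))
print(required_step(Session(), "process_payment"))
print(required_step(Session(insurance_offered=True, failed_payments=3), "process_payment"))
```

Steps that no rule mentions pass through unchanged, which mirrors the idea that only the critical scenarios are programmed and everything else is left to the agent.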

When a food ordering app specifies “if allergy mentioned, always inform the restaurant,” that protocol executes. Always.

When a telecom provider specifies “third failed payment triggers escalation,” that policy enforces. Without exception.

When an insurance company specifies “claims over $10,000 require two approvals,” that workflow completes. Every time.

Control where you need it. Intelligence where you don’t. The agent is never stuck like a rigid workflow when users go off-script. But when your defined scenarios occur, your defined behaviors fire.

08. What Apollo-1 Isn’t For

Apollo-1’s architecture makes deliberate trade-offs. By optimizing for task-oriented agents, we’ve built a model that intentionally doesn’t compete in other domains.

Open-Ended Creative Work
Apollo-1 isn’t designed for creative writing, brainstorming sessions, or exploratory dialogue where variation creates value. For drafting marketing copy, generating story ideas, or exploring hypothetical scenarios, transformers remain the superior architecture. Our symbolic structures enforce consistency; creativity often requires the opposite.

Code Generation & Software Development
While Apollo-1 can integrate with code execution tools in task-oriented workflows, it doesn’t offer state-of-the-art code generation. Transformers trained on massive code repositories excel at synthesizing programming patterns, autocompleting functions, and explaining algorithms. Apollo-1’s symbolic language is purpose-built for task execution, not software development.

Low-Stakes, High-Variation Scenarios
When conversational variety enhances user experience—customer engagement campaigns, educational tutoring with adaptive responses, entertainment chatbots—probabilistic variation is often preferable to deterministic certainty. Apollo-1’s guarantees become constraints when flexibility is the goal.

09. Early Deployments & Results

Apollo-1 is deployed in production at Fortune 500 organizations. Partnerships to power consumer-facing AI at some of the world’s largest companies in retail, automotive, and regulated industries will be announced alongside general availability.

Organizations testing Apollo-1 against their existing systems—some built over years with teams of thousands—are seeing the same pattern: order-of-magnitude improvements in task completion rates.

Benchmark Performance

Test / Benchmark	Apollo‑1	Best LLM Agent	Δ (relative)
τ‑Bench‑Airline	90.8–92.5 %	Claude‑4: 60 %	+51 %
Google Flights – 111 live booking chats	83 %	Gemini 2.5‑Flash: 22 %	+277 %
Amazon Retail – 120 live shopping chats	90.8 %	Rufus: 16.7 %	+444 %
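
The Δ column appears to be a relative improvement over the baseline, (Apollo-1 − baseline) / baseline, rather than a difference in percentage points. The short check below reproduces the three figures from the scores in the table; the function name relative_gain is ours, and for τ-Bench-Airline the published +51% matches the lower end of the 90.8–92.5% range.

```python
def relative_gain(apollo: float, baseline: float) -> int:
    """Relative improvement over the baseline, as a rounded percentage."""
    return round((apollo - baseline) / baseline * 100)

# (Apollo-1 score, best LLM agent score), both in percent, taken from the table.
rows = {
    "tau-Bench-Airline (lower bound)": (90.8, 60.0),
    "Google Flights": (83.0, 22.0),
    "Amazon Retail": (90.8, 16.7),
}

for name, (apollo, baseline) in rows.items():
    print(f"{name}: +{relative_gain(apollo, baseline)}%")
```

In percentage points the same gaps are 30.8, 61.0, and 74.1, which is worth keeping in mind when comparing these rows with results reported elsewhere.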

Explore detailed evaluation scenarios, trajectories, and reward logs

10. Two Foundations for Two Futures

Two problems. Two architectures. Two foundations.

Open-ended agents—personal assistants, coding helpers, creative collaborators—work on behalf of users. Flexibility is the goal. LLMs excel here. They will keep getting better as LLMs improve: more capable, better tool use, longer context, stronger reasoning. This is the road toward general AI assistants that do whatever you need.

Controllable agents work on behalf of entities. They require programmable behavior with guarantees. When a defined situation occurs, a defined behavior must execute. Controllability isn’t a feature to add; it’s the core requirement that determines whether an entity can trust an agent with its customers.

LLMs weren’t built for this. Orchestration can’t solve it. The foundation has to be different.

Apollo-1 is that foundation. Being neuro-symbolic, it benefits from improvements to LLMs while providing what LLMs cannot: guaranteed control.

LLMs are foundation models for generation. Apollo-1 is the foundation model for control. Each serves its purpose. Each unlocks what the other can’t.

11. What This Means

Every conversation that drives economic activity becomes reliably automatable.

Booking systems that complete reservations. Claims processing that adjudicates correctly. Customer service that resolves issues. Transaction systems that execute trades.

With guarantees of execution, enterprises can finally trust conversational agents with customer interactions because they have certainty that:

Their exact policies will be enforced as defined
Their specific business logic will execute as configured
Their unique brand experience will manifest as designed
Their interactions with customers will be fully documented and explainable

While open-ended agents enhance productivity, controllable agents are the productivity. Every transaction, every booking, every claim: these are the conversations that run the economy. Now they can run automatically.

12. General Availability

Apollo-1’s architecture integrates seamlessly with existing Generative AI workflows and adapts to any API or external system, with no need to change endpoints or preprocess data. It launches with native connectivity to all major platforms (Salesforce, HubSpot, Zendesk, etc.), full MCP support, and a strategic go-to-market partnership with Google.

General availability is planned for December 2025, complete with:

Open APIs
Full documentation and toolkits
Rigorous evaluation methodologies
Voice and image modalities

13. Conclusion

Two kinds of agents need two kinds of foundations.

Open-ended agents—working for users, maximizing flexibility—run on language models. That path continues.

Controllable agents—working on behalf of entities, enforcing policies in critical scenarios—need a foundation built for control. LLMs can’t provide it. Orchestration can’t force it.

Apollo-1 is the first foundation model for controllable agents. Neuro-symbolic architecture that unifies generation and control. Agents you can actually program. Behaviors that actually execute.

Language models were never going to solve this. Controllable agents need their own foundation.

Now they have one.

View Appendix A: Evaluations
