Anthropic's Project Deal: why the best AI model is a measurable economic asset for indirect procurement
agentic-procurement · llm · rfp · sourcing · anthropic · ai-strategy


Anthropic had Claude agents negotiate 186 deals. The frontier model captures 10–25% more value than Haiku 4.5 — and users do not notice. What it means for your agentic RFP.

Alexandre Lio · 7 May 2026 · 7 min read



By Alex Lio. The Procurementor. Special edition, "What's Changing".

Anthropic just published a study showing their frontier model (Opus 4.5) outnegotiates their lighter model (Haiku 4.5). Healthy first reflex, especially for any buyer trained to look for bias: "Of course they are saying that, it serves their commercial interest." Second reflex, after reading the protocol: they are right, and the result matters. 186 deals, ~$4,000 in value, +$2.68 as a seller, –$2.45 as a buyer, on items priced at a $12 median. The frontier model captures 10 to 25% more value than the lighter model, and users do not detect the gap. Here is what it changes for your agentic contracts, your RFP and your monitoring.


The healthy reflex first

For 18 months, the public debate on AI in procurement has been going in circles around two badly framed questions: "will AI replace buyers?" and "is it cheaper than a human?". Both miss the actual issue.

The actual issue: when AI agents negotiate for you, does the underlying model quality change the economic outcome, and by how much?

Project Deal brings an empirical answer. It is not neutral (Anthropic publishing on Anthropic), but the protocol is solid.


1. What Anthropic actually did

In December 2025, Anthropic recruited 69 of its employees, gave each a $100 budget, and asked what they wanted to sell or buy among their personal belongings. Each participant was interviewed by Claude for under 10 minutes to set the parameters: items to sell, reserve price, items wanted, maximum budget, preferred negotiation style.

Each participant was then assigned a personalised Claude agent tasked with representing them on a Slack-based marketplace. The agents posted listings, made offers, counter-offered and sealed deals, with no human in the loop at any point during the experiment.

Four marketplaces ran in parallel:

  • Runs A and D: all agents on Claude Opus 4.5 (frontier).
  • Runs B and C: a 50/50 split between Opus 4.5 and Haiku 4.5 (lighter model).

Only one marketplace (Run A) was "real", with goods actually exchanged; the others ran for research purposes only.

Headline result: 186 deals closed, ~$4,000 in transaction value, and broadly satisfied participants (a mean fairness rating of 4 on a 7-point scale, i.e. "fair").


2. The result every procurement leader should pay attention to

On Runs B and C (mixed model), Opus agents objectively outperformed Haiku agents on several metrics:

| Metric | Opus advantage vs Haiku | Significance |
|---|---|---|
| Number of deals closed | +2.07 deals per user | p = 0.001 |
| Sale price for the same item (Opus seller vs Haiku) | +$3.64 on average | p = 0.011 |
| Seller effect (across all deals) | +$2.68 | p = 0.030 |
| Buyer effect (across all deals) | –$2.45 | p = 0.015 |
On items priced at a $12 median and a $20 mean, a $2 to $3 gap per transaction represents roughly 10 to 25% of the value: $2.45 on the $20 mean is about 12%, and $2.68 on the $12 median is about 22%. Statistically significant, and material.

[image: The same broken folding bike negotiated by two different models: $38 with Haiku, $65 with Opus, a +70% capture spread on the same transaction] (/wp-content/uploads/2026/05/project-deal-frontier-vs-light-model.png)

Source: Anthropic, Project Deal, April 2026.

The most striking example: the same broken folding bike, sold by the same seller to the same buyer, went for $38 with a Haiku agent and $65 with an Opus agent. +70% sale price on the same object, in the same transaction. The only variable is the model that negotiates. Same story with a lab-grown ruby: $35 with Haiku, $65 with Opus.

The uncomfortable flip side

Here is where it gets interesting: participants did not notice the difference. When asked to rate the perceived quality and fairness of their deals, Haiku users were roughly as satisfied as Opus users. 11 out of 28 participants even rated their Haiku run better than their Opus run, when the numbers say the opposite.

Direct implication. If a quality gap between models opens up in a real agentic economy, the losers will not realise it. Anthropic phrases it cautiously, I will phrase it less so: this is a systemic governance hole.

The other result that matters: prompting does not compensate

Some participants instructed their agent to be aggressive ("lowball at first", "negotiate hard"). Others asked for a friendly style. Both groups got statistically indistinguishable outcomes: aggressive buyers did not pay less, and aggressive sellers did not sell for more (once you control for the fact that they posted higher starting prices).

In other words: model quality matters more than the instructions you give the model. Prompt engineering on the negotiator does not make up for a capability deficit. That is a central point for framing model selection in any agentic deployment.


3. The take-away that should structure your procurement strategy

This is where many current AI-in-procurement discussions go off the rails.

The standard economic reflex ("we will pick the cheapest model, it is good enough for our use case") is the typical framing error. IT buyers sourcing an agentic solution today look first at cost per token or cost per call. That is the equivalent of choosing a lawyer on hourly rate without looking at their courtroom track record.

Project Deal puts a number on it: on a simple negotiation task (not even the most demanding LLM use case), moving from a frontier model to a lighter one costs you 10 to 25% of transaction value. And that cost stays invisible to users who do not run active benchmarks.

Apply that to a 50 M€ indirect-procurement budget. Even at 5% lost value through default model selection, you leave 2.5 M€/year of margin on the table. To save what? A few thousand euros of LLM licence per year. The cost/benefit ratio is unambiguous.
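
A back-of-the-envelope way to sanity-check that trade-off. Every figure below (agent-addressable spend, value-loss rates, licence saving) is an assumption to replace with your own numbers, not data from the study:

```python
# Illustrative only: order-of-magnitude comparison between value lost to a
# weaker negotiating model and the licence saving that motivated the choice.
# All figures are assumptions to be replaced with your own.

addressable_spend = 50_000_000          # € of indirect spend handled by agents per year
value_loss_rates = [0.02, 0.05, 0.10]   # share of value ceded by the lighter model
licence_saving = 5_000                  # € saved per year by picking the cheaper model

for rate in value_loss_rates:
    lost_value = addressable_spend * rate
    ratio = lost_value / licence_saving
    print(f"Value loss {rate:.0%}: €{lost_value:,.0f} lost vs €{licence_saving:,.0f} saved "
          f"(≈{ratio:.0f}x the licence saving)")
```

At the 5% loss rate used in the article, the lost margin comes out at several hundred times the licence saving; the exact multiple matters less than its order of magnitude.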


4. Concrete implications for indirect procurement

4.1. LLM licence cost is the wrong KPI

In any agentic RFP, the current evaluation grid heavily weights inference cost (token in / token out, cost per request). It is intuitive, measurable, visible on the cloud bill. But it is secondary to the value-capture gap between two models on the same use case.

The right approach: require a comparative benchmark on the actual use case, with two models from different generations, and measure the economic-outcome gap. If the vendor cannot provide that benchmark, red flag.
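
One possible shape for such a benchmark, as a sketch rather than a prescription: run the same set of historical or simulated negotiation cases through both candidate models and compare the economic outcome, not the token bill. The case names and figures below are made up for illustration (buyer side, so a lower agreed price is better):

```python
# Hypothetical paired benchmark: the same negotiation cases run once per model,
# scored on economic outcome rather than inference cost.
from statistics import mean

# Each entry: (case_id, agreed_price_model_a, agreed_price_model_b), in €.
paired_runs = [
    ("laptops-q3",   41_200, 43_900),
    ("mro-contract", 12_750, 13_400),
    ("fm-renewal",   88_000, 91_500),
]

gaps = [b - a for _, a, b in paired_runs]                    # € overpaid with model B vs A
relative = [gap / b for gap, (_, _, b) in zip(gaps, paired_runs)]

print(f"Mean € gap per case: {mean(gaps):,.0f}")
print(f"Mean relative gap:   {mean(relative):.1%}")
# Decision rule (yours to set): if the relative gap exceeds the licence-cost
# difference expressed as a share of addressable spend, the cheaper model loses.
```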

4.2. Routing strategy must be negotiable in the contract

Many agentic platforms (Arkestro, Pactum AI, Globality, Keelvar) route internally between several models to optimise their own inference margin. Without a contractual clause, you pay for Opus but get served Haiku on exactly the transactions where the downgrade is least visible.

To embed in any agentic contract:

  • transparency on which model is actually used per transaction (auditable logs; a minimal record is sketched after this list),
  • the right to specify the minimum acceptable model based on transaction-value thresholds,
  • a twice-yearly audit clause on model mix vs economic outcomes.
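
As an illustration of what "auditable logs" could mean in practice, here is a minimal per-transaction record. Every field name is a hypothetical suggestion, not any vendor's actual schema or API:

```python
# Hypothetical minimal audit record for routing transparency in agentic negotiation.
# Field names are illustrative, not any platform's real schema.
from dataclasses import dataclass
from datetime import datetime

@dataclass
class NegotiationAuditRecord:
    transaction_id: str
    timestamp: datetime
    model_used: str            # e.g. "claude-opus-4-5" vs "claude-haiku-4-5"
    contracted_minimum: str    # the minimum model tier the contract promises
    transaction_value: float   # € value of the deal, to check value thresholds
    routed_by: str             # "vendor-default" | "value-threshold-rule" | "override"

def routing_breach(record: NegotiationAuditRecord, allowed_models: set[str]) -> bool:
    """Flag transactions served by a model outside the contracted set."""
    return record.model_used not in allowed_models
```

A twice-yearly audit then reduces to aggregating these records and comparing the model mix against economic outcomes, per the clauses above.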

4.3. Buyer-side change management changes nature

The unexpected Project Deal finding is that users do not detect their agent's underperformance. Operational buyers' self-evaluation ("I'm happy with my tool") stops being a reliable performance signal. You need permanent objective metrics: price vs benchmark, completion rate vs target, captured value vs reserve price.

The procurement function therefore needs a continuous agent-monitoring capability, independent of the solution vendor. Probably the most under-invested item in current procurement transformations.
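
A sketch of what vendor-independent monitoring could compute from your own deal data: the three signals named above, with all field names and figures assumed for illustration (buyer side, reserve price read as maximum budget):

```python
# Illustrative vendor-independent monitoring: the three objective signals named
# above, computed from your own deal records. All data below is made up.

deals = [
    # (benchmark_price, agreed_price, target_met, reserve_price), in €
    (10_000,  9_400, True,  11_000),
    (25_000, 26_100, False, 27_500),
    ( 8_000,  7_600, True,   9_000),
]

n = len(deals)
price_vs_benchmark = sum(agreed / bench for bench, agreed, _, _ in deals) / n
completion_rate    = sum(1 for _, _, met, _ in deals if met) / n
value_captured     = sum(reserve - agreed for _, agreed, _, reserve in deals) / n

print(f"Avg price vs benchmark: {price_vs_benchmark:.1%}")   # below 100% = paying under benchmark
print(f"Completion vs target:   {completion_rate:.0%}")
print(f"Avg value captured vs reserve: €{value_captured:,.0f}")
```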

4.4. Supplier-side asymmetry: a new front opens

Project Deal had Claude agents negotiate with each other. Tomorrow, your Claude / GPT / Gemini agents will negotiate with your suppliers' AI agents. If your suppliers are better equipped than you, the asymmetry plays against you on every transaction, invisibly.

It revives a question we had abandoned for 10 years: what negotiation model does our supplier run, and with what tool? It used to be an HR question (training, experience). It will be a tech question tomorrow (model, routing, instructions, monitoring).


5. The limit to flag

Project Deal remains an internal experiment of 69 people on Slack, on second-hand items with a $12 median price, and a biased sample (Anthropic employees, already pro-AI). The numbers do not generalise as-is.

And crucially, the experiment proves that between AI agents, the better model wins. It does not prove that an AI negotiates better than a trained human. That is a different question, even more important for procurement in 2026, and precisely the question agentic vendors are carefully avoiding. Watch closely.


6. What to take away

The directional signal is clear, and probably the most important of the year for any agentic procurement function:

  1. Model quality is a measurable economic asset, not a secondary technical criterion.
  2. The performance gap between models is invisible to users, hence not self-correcting.
  3. Prompting does not compensate for capability deficit. You will not save a bad model with a good prompt.
  4. The gaps will widen as AI agents take more transactions, and as your suppliers gear up.

For procurement leadership structuring a 2026-2027 agentic roadmap, the framing rule is simple: pay for model quality, negotiate routing transparency, instrument performance monitoring independently of the vendor. Everything else is secondary.

The way the LinkedIn community currently debates AI in procurement ("compute too expensive", "AI not profitable", "humans still cheaper") misses the real point. The real point: the model that negotiates for you carries material economic value, and you are under-investing in it.


Sources

Anthropic, Project Deal, April 2026.

You're structuring an agentic roadmap on indirect procurement and want to stress-test it before signing the RFP? Let's book 30 minutes.

I'm Alex Lio. 10+ years in indirect procurement, digital transformation and now AI, in service of my clients.
