AI & Digital · AI & S2P

AI is a tool for execution, not decision: How to build your own performing agents.

Building an AI agent isn't a six-month project: the four layers of an AI architecture, how to build an agent, a procurement example, and the ROI, costed.

By Alex Lio · The Procurementor.

You've sat through three webinars on AI agents this quarter. You've read fifteen LinkedIn posts promising full autonomy. And in your procurement department, the count of agents actually in production is still at zero.

The blocker has never been technological. It's semantic. It all comes down to one misunderstood word.

"Agent" sounds like a six-month project, a team of data scientists, a six-figure budget. In 70% of the organizations I work with, it's exactly that image that blocks the first deployment. The reality is more mundane: a useful agent often gets built in a few days, with the tools you already pay for (assuming you have an AI subscription).

And let's be clear: you already use agents. When you ask the AI to reason step by step, to generate a PDF, to produce a Word document, or to go look something up on the internet, that's exactly it. An agent.

Building an agent is, in the end, nothing more than defining a set of procedures and steps you ask an AI to follow. The goal: reduce the black-box effect (if I change one word in the prompt, my deliverable can change completely) and therefore guarantee a repeatable result.

Let's look at the topic in detail to understand the different layers of an AI architecture. (You can skip straight to the "Building an agent" section if the theory interests you less.)

01 · The four layers, without the jargon

Before building, you need to know where you stand. Four words circulate, and people confuse them all the time. They don't denote four rival technologies. They denote four stacked layers, each placed on the previous one.

LLM. The raw language model. Its only skill: generate plausible text, reason, draw on what it saw during training. When you ask a base model a question and the answer comes back instantly, nothing else is happening: the question goes to the LLM, which predicts the answer directly. Its limit: it only knows its training data. It knows nothing about your supplier contracts. But it will still predict (and, if need be, invent) an answer to any question, more or less well.

RAG (Retrieval-Augmented Generation). You add your external knowledge. Before answering, the system fetches the right documents (your contracts, your catalogs, your price history) and injects them into the prompt. That's what reduces hallucinations and grounds the answer in your data. The more precise and clean the information provided, the less the LLM has to guess, and the more on-target the answer becomes. But it remains a layer that answers. It doesn't act. And don't forget that LLMs remain language models: they are weaker at maths, sometimes badly so.
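
To make the "fetch, then inject" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the three documents, their names, and the word-overlap ranking, which stands in for the vector index a real RAG setup would typically use. No LLM is called; the sketch stops at the prompt that would be sent.

```python
import re

# Hypothetical mini knowledge base; in practice, your contracts or price history.
DOCUMENTS = {
    "contract_acme": "Acme contract: payment terms 45 days, unit price 12.40 EUR for steel brackets.",
    "catalog_bolt": "Bolt catalog: M8 bolts sold in boxes of 100, list price 9.90 EUR per box.",
    "history_office": "Office supplies history: last order of paper placed in March with PaperCo.",
}

def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question: str, k: int = 1) -> list[str]:
    """Rank documents by word overlap with the question and return the top k ids."""
    q = tokenize(question)
    ranked = sorted(DOCUMENTS, key=lambda d: len(q & tokenize(DOCUMENTS[d])), reverse=True)
    return ranked[:k]

def build_prompt(question: str) -> str:
    """Inject the retrieved documents into the prompt that would go to the LLM."""
    context = "\n".join(DOCUMENTS[d] for d in retrieve(question))
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"

print(build_prompt("What is the unit price for steel brackets in the Acme contract?"))
```

The model never has to guess the price: it is sitting in the prompt, taken from the contract.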

AI Agent. You add action. The system calls tools, runs code, queries an API, breaks a task into steps and loops until it reaches the result. Instead of answering, it pursues a goal across multiple steps. It's the layer that turns "answer me" into "do it." The agent first turns your request into a plan, then executes the steps of that plan. From the moment the model produces a "reasoning," an agent is stepping in: it's simply pre-programmed into the AI platform. Example: creating a PPT is a task performed by an agent that has been given the best practices and the tools to build one. The PPT is created according to the context of the prompt and the elements added through RAG.
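
The "plan, call a tool, look at the result, loop" mechanic can be sketched in a few lines of Python. This is a toy under stated assumptions: `scripted_planner` is a hard-coded stand-in for the LLM that would normally choose the next step, and both tools (`lookup_last_price`, `multiply`) and the price are invented for the example.

```python
from typing import Callable

# Hypothetical tools; real ones would wrap an ERP query, a web search, and so on.
def lookup_last_price(item: str) -> float:
    prices = {"steel bracket": 12.40}
    return prices[item]

def multiply(a: float, b: float) -> float:
    return a * b

TOOLS: dict[str, Callable] = {"lookup_last_price": lookup_last_price, "multiply": multiply}

def scripted_planner(goal: dict, observations: list) -> dict:
    """Stand-in for the LLM: picks the next step from what has been observed so far."""
    if len(observations) == 0:
        return {"tool": "lookup_last_price", "args": [goal["item"]]}
    if len(observations) == 1:
        return {"tool": "multiply", "args": [observations[0], goal["quantity"]]}
    return {"final": observations[-1]}

def run_agent(goal: dict, max_steps: int = 5) -> float:
    """Loop: ask for the next step, run the tool, record the result, stop on 'final'."""
    observations: list = []
    for _ in range(max_steps):
        step = scripted_planner(goal, observations)
        if "final" in step:
            return step["final"]
        observations.append(TOOLS[step["tool"]](*step["args"]))
    raise RuntimeError("step budget exhausted before reaching a result")

total = run_agent({"item": "steel bracket", "quantity": 100})
print(f"Estimated spend: {total:.2f} EUR")
```

Note the `max_steps` cap: even in a toy, the loop needs a hard stop.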

Agentic AI. Several agents working together. Orchestration, assigned roles, agent-to-agent exchange protocols, shared memory. One agent researches, another writes, a third validates. We'll talk about it more later; these approaches remain rare in procurement but unlock even more interesting results. (See loop engineering for those who want it.)

Each layer adds capability. Each layer also adds complexity, failure modes, and cost. Using an LLM without reasoning is fast, very cheap, and enough in many cases (summarizing meeting notes, drafting an email, and so on). No need to invoke adversarial agent loops for that.

Four layers, not four technologies: LLM, RAG, Agent, Agentic.

Remember this: the right layer is the smallest one that solves your problem.

02 · When an agent is the right answer (and when it isn't)

An agent isn't the solution to everything. It's the solution to a specific kind of situation. Three signals indicate that a single agent is the right tool:

  • The task is repetitive and based on clear rules. You do it every week, and you could explain it to an intern on one page (a fairly good intern, one who can code just about anything).
  • It requires fetching information from several sources and cross-referencing it: an ERP, an inbox, an Excel file, a supplier website.
  • The bottleneck is human time, not the final judgment. The human decides; the agent prepares the decision file.

Conversely, don't use an agent to make the decision itself, and don't build one for a decision with high legal stakes, a one-off strategic trade-off, or a task you only do twice a year. The build cost rarely pays off, and I even find that AI can bring more confusion than clarity on this kind of subject. AI is a tool for execution, not decision.

A simple test, before any project. Ask yourself: "Could I write the complete procedure on one page?" If yes, it's an agent candidate. If the answer comes in three vague words, it's not mature yet.

03 · Building an agent, concretely

A. The architecture of an agent

Four bricks. Not a team of fifteen people.

  1. An objective written clearly. A system instruction that describes the mission, the scope, and what the agent does NOT have the right to do. It's the spec sheet. It's also the guardrail.
  2. Access to your data (the RAG layer). Your documents indexed so the agent retrieves the right contract or the right price line instead of guessing. The more precise the context, the less the agent guesses (and so the less it gets wrong).
  3. Tools the agent can call. A calculator, access to your ERP, a web search, an email send. The agent doesn't compute a should-cost in its head. It calls the tool and reads the exact result.
  4. A human validation loop. A point (or several) where a human signs before the irreversible action. No validation, no production.
An agent = four bricks: objective, data, tools, validation.
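
One way to picture the four bricks is as a single specification object plus a gate in front of every tool call. The sketch below is illustrative only: `AgentSpec`, `call_tool`, the tool names, and the file paths are all hypothetical, and the "tools" are stubs that return strings instead of touching an ERP or a mailbox.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class AgentSpec:
    objective: str                                            # brick 1: mission and scope
    forbidden: set[str] = field(default_factory=set)          # brick 1: what the agent may NOT do
    data_sources: list[str] = field(default_factory=list)     # brick 2: documents to index
    tools: dict[str, Callable[..., object]] = field(default_factory=dict)  # brick 3
    needs_approval: set[str] = field(default_factory=set)     # brick 4: irreversible actions

def call_tool(spec: AgentSpec, name: str, *args, approved_by: Optional[str] = None):
    """Run a tool only if the spec allows it and, where required, a human has signed."""
    if name in spec.forbidden:
        raise PermissionError(f"'{name}' is outside the agent's mandate")
    if name in spec.needs_approval and not approved_by:
        raise PermissionError(f"'{name}' requires human validation first")
    return spec.tools[name](*args)

spec = AgentSpec(
    objective="Prepare purchase-requisition files; never place orders.",
    forbidden={"place_order"},
    data_sources=["contracts/", "price_history.csv"],
    tools={
        "draft_summary": lambda pr_id: f"Summary for {pr_id}",
        "send_email": lambda to, body: f"sent to {to}",
    },
    needs_approval={"send_email"},
)

print(call_tool(spec, "draft_summary", "PR-001"))
print(call_tool(spec, "send_email", "buyer@example.com", "File ready", approved_by="J. Doe"))
```

The design choice worth noticing: the guardrail lives in code around the agent, not only in the prompt, so a misread instruction cannot trigger an unapproved send.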

The key point is creating the brief: the prompt and the context that allow the request to be understood. If the agent misunderstands the need, it calls the right tool with the wrong data. Garbage in, garbage out. Half the work of building it is framing the instruction and testing on real cases.

B. How you actually build it

When I first started looking into agents and how they're built, I didn't quite get it: could I simply tell Claude to create an agent that would do, say, supplier evaluation? Yes, and it would do it. But it would do it based on what it predicts to be best practice, drawing on two things: the research it would run on its own with a built-in agent, and the training data it received at its creation. That changes essentially nothing compared to a classic prompt request: we're not adding any particular information to the model.

To make building this kind of agent worthwhile, you have to give it good tools (Pappers in France, for example, to access company databases, EcoVadis for CSR, and so on) and spell out in detail the protocols, the company's rules, and how they're applied. Once these elements are orchestrated into the agent, you can use it to produce results consistent with your needs, in a repeatable, standardized way.

Many tools already exist. Google (with Vertex), OpenAI, and Claude each have their own agent builder. The result: the process becomes relatively simple, and more of a business conversation than an IT project. Independent, multi-AI solutions also exist: Dust, Gumloop, and others let you create agents that run on any AI (LLM) on the market, an important point when it comes to optimizing the cost of the agent in production.

A simple voice interview with one of these tools is often enough to get a prototype you then refine. The field is also starting to mature, with new subjects developing on top: agent orchestration, Hermès agents, and the rest. We'll talk about it another time.

04 · Concrete example: the purchase-requisition pre-analysis agent

Let's take a case I see in almost every mid-cap: the purchase requisition (PR) that arrives incomplete. The procurement lead spends 30 to 40 minutes per PR completing the need, finding the historical supplier, checking the reference price, and preparing the file before arbitration.

The agent does the preparation work:

  • It reads the PR and identifies what's missing (quantity, spec, cost center).
  • It queries your history (RAG layer) to retrieve the last price paid and the suppliers already referenced in this category.
  • It computes a first-pass should-cost by calling a calculation tool, not in its head.
  • It drafts a one-page summary with three options and a sourced reference price.

The procurement lead receives a file ready to arbitrate instead of a line to decipher. They decide. The agent prepares. The final validation stays human.
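
The preparation steps above can be sketched as plain Python, to show how little of the work is "AI magic". All names and data are hypothetical: the required fields, the price history, and the suppliers are invented, and `estimate_total` is a deliberately crude stand-in (last price times quantity) for a real should-cost model, which would build up from materials, labor, and margin. The drafting of the one-page summary, the part an LLM would actually write, is left out.

```python
REQUIRED_FIELDS = ("item", "quantity", "spec", "cost_center")

# Hypothetical price history, oldest first: (category, supplier, unit price in EUR).
PRICE_HISTORY = [
    ("brackets", "Acme", 12.90),
    ("brackets", "Boltex", 12.40),
    ("paper", "PaperCo", 4.10),
]

def missing_fields(pr: dict) -> list[str]:
    """Return the required fields that are absent or empty in the PR."""
    return [name for name in REQUIRED_FIELDS if not pr.get(name)]

def last_price(category: str):
    """Return (supplier, unit_price) of the most recent purchase, or None."""
    matches = [(s, p) for c, s, p in PRICE_HISTORY if c == category]
    return matches[-1] if matches else None

def estimate_total(unit_price: float, quantity: int) -> float:
    """The 'calculation tool': arithmetic done in code, not by the model."""
    return round(unit_price * quantity, 2)

def prepare_file(pr: dict) -> dict:
    """Assemble the facts a buyer needs before arbitrating; no decision is taken here."""
    category = pr.get("category", "")
    reference = last_price(category)
    suppliers = sorted({s for c, s, _ in PRICE_HISTORY if c == category})
    total = None
    if reference and pr.get("quantity"):
        total = estimate_total(reference[1], pr["quantity"])
    return {
        "missing": missing_fields(pr),
        "suppliers": suppliers,
        "reference": reference,
        "estimated_total": total,
    }

pr = {"item": "steel bracket", "category": "brackets", "quantity": 200, "spec": ""}
print(prepare_file(pr))
```

The output is a file, not a verdict: what is missing, who has supplied this before, and a sourced reference price.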

Realistic build: 3 to 5 days of work for a first pilot on a single category, with off-the-shelf tools (a commercial LLM, a connector to your data, a well-framed instruction).

05 · The ROI, costed and cautious

Here's how I model the return, without inflating the figures. The amounts below are indicative orders of magnitude, to calibrate on your own organization: they don't rest on a sourced study.

| Item | Before | With the agent |
|---|---|---|
| Prep time per purchase requisition | 30 to 40 min | 5 to 10 min (review) |
| PRs handled / week (1 buyer) | 20 to 30 | 20 to 30 |
| Time recovered / week | baseline | 8 to 12 hours |
| Usage cost (tokens) | €0 | €50 to €150 / month by volume |

ROI: 0.2 to 0.3 FTE redeployed, break-even under two months.

On a fully-loaded buyer cost of €55,000 to €75,000/year (indicative order of magnitude, to calibrate on your own structure), 8 to 12 hours recovered per week is the equivalent of 0.2 to 0.3 FTE redeployed toward negotiation and value sourcing. The usage cost stays under €150/month at this scope. Break-even generally falls under two months.
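
For those who want to redo the arithmetic with their own numbers, here is a small calculator. Several inputs are my assumptions, not figures from the table: a 40-hour week, 220 working days a year, prep times of 32 and 8 minutes (picked inside the ranges above), and a build cost valued at the buyer's own loaded daily rate for 3 to 5 days.

```python
import math

def roi(prs_per_week: float, minutes_before: float, minutes_after: float,
        loaded_cost_per_year: float, token_cost_per_month: float, build_days: float,
        hours_per_week: float = 40.0, working_days_per_year: float = 220.0) -> dict:
    """Indicative ROI of the pre-analysis agent; every default is an assumption to calibrate."""
    hours_saved = prs_per_week * (minutes_before - minutes_after) / 60
    fte = hours_saved / hours_per_week
    monthly_value = fte * loaded_cost_per_year / 12
    net_monthly = monthly_value - token_cost_per_month
    build_cost = build_days * loaded_cost_per_year / working_days_per_year
    breakeven = build_cost / net_monthly if net_monthly > 0 else math.inf
    return {
        "hours_saved_per_week": hours_saved,
        "fte_redeployed": fte,
        "net_value_per_month_eur": net_monthly,
        "breakeven_months": breakeven,
    }

# Cautious case: fewer PRs, lower salary, highest token bill, longest build.
low = roi(20, 32, 8, 55_000, 150, build_days=5)
# Favorable case: more PRs, higher salary, lowest token bill, shortest build.
high = roi(30, 32, 8, 75_000, 50, build_days=3)

for label, case in (("cautious", low), ("favorable", high)):
    print(f"{label}: {case['hours_saved_per_week']:.1f} h/week, "
          f"{case['fte_redeployed']:.2f} FTE, "
          f"break-even in {case['breakeven_months']:.1f} months")
```

Under these assumptions both cases land between 8 and 12 hours a week and break even in under two months. Change the inputs and the conclusion can move, which is the point of running it on your own structure.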

Two caveats, because nobody writes them down.

One: this ROI assumes a narrow scope at the start, a single category, not a big-bang rollout across the whole function.

Two: set a token spend cap per agent and a named budget owner, exactly as you have a cap per corporate card. An agent in a loop can generate a four-figure cloud bill overnight. Spend governance doesn't disappear. It shifts.
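
A spend cap can be as simple as a counter that refuses the next call. The sketch below is a minimal illustration, not a feature of any particular platform: `TokenBudget`, the owner name, the cap, and the price per thousand tokens are all hypothetical, and many providers also offer their own account-level limits.

```python
from dataclasses import dataclass

class BudgetExceeded(RuntimeError):
    """Raised when a call would push the agent past its monthly cap."""

@dataclass
class TokenBudget:
    owner: str               # the named budget owner
    monthly_cap_eur: float   # the cap, like a corporate card limit
    spent_eur: float = 0.0

    def charge(self, tokens: int, eur_per_1k_tokens: float) -> float:
        """Record a call's cost and return the remaining budget, or refuse the call."""
        cost = tokens / 1000 * eur_per_1k_tokens
        if self.spent_eur + cost > self.monthly_cap_eur:
            raise BudgetExceeded(
                f"cap of {self.monthly_cap_eur:.2f} EUR reached; contact {self.owner}"
            )
        self.spent_eur += cost
        return self.monthly_cap_eur - self.spent_eur

budget = TokenBudget(owner="J. Doe", monthly_cap_eur=150.0)

# Simulate an agent stuck in a loop: each pass burns 1M tokens at a made-up price.
calls = 0
try:
    while True:
        budget.charge(tokens=1_000_000, eur_per_1k_tokens=0.01)
        calls += 1
except BudgetExceeded as stop:
    print(f"Stopped after {calls} calls: {stop}")
```

The runaway loop stops at the cap instead of at the invoice.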

A useful AI agent comes down to three things: a one-page procedure, four bricks, and a human who signs at the end. No transformation project in there.

Alexandre Lio

12 yrs · ex-Amazon EMEA · Cellnex · €50M+ negotiated · 5,000+ trained

Independent procurement consultant. I help CPOs, CFOs and operations leaders fix category management, deploy AI-ready sourcing stacks and build teams that actually deliver savings.
