Agent Script: When Your AI Agent Needs a Spine

There is a specific kind of demo that has been making the rounds since Agentforce launched. Someone opens Agent Builder, types a few sentences of plain English into a topic — “If the customer asks about their order, look up the order and tell them the status” — clicks Save, and the agent works. The room claps. A vice president nods. Everyone goes back to their desks convinced that building AI agents is now a no-code activity.

Then they ship it to production, and the agent — politely, confidently, and with the unshakeable self-assurance that only a large language model can muster — decides on Tuesday that it would rather skip the identity-verification step today because the customer seemed trustworthy.

That is the gap Agent Script exists to close.

Agent Script is Salesforce’s language for building Agentforce agents when “type some English and hope” is not good enough. It keeps the conversational reasoning that makes agents feel intelligent, but it lets you draw hard lines around the parts of the workflow that are not allowed to be improvised. Some steps reason. Some steps execute. You decide which is which, in code, in a file, in source control.

If you have read my Headless 360 implementation guide, you have already seen agents invoked from React and built from the CLI. This post goes one level deeper — into the language those agents are actually written in, and why it is a bigger deal than the announcement made it sound.

The TL;DR

If you want the shape of it before the deep dive:

Agent Script is a real language, not a config format. It is compiled, whitespace-sensitive (think Python or YAML), and it lives in a file you can diff, review, and version.
It blends two execution modes. The pipe operator (|) sends natural language to the LLM for reasoning. The arrow operator (->) runs deterministic logic that the LLM cannot reinterpret. You choose, line by line.
It fixes the three real problems with builder-only agents: non-determinism (the agent improvises critical steps), no state (it relies on the LLM remembering things), and no version control (you cannot diff a pile of clicks).
The killer use case is regulated, multi-step workflows — claims intake, KYC, prescription refills, loan pre-qualification — where some steps are legally mandatory and must happen in order, but you still want a human-feeling conversation.
It is not a replacement for Agent Builder. It is the layer underneath it. Simple FAQ agents stay in the UI. Workflows with real stakes graduate to script.

The rest of this post explains the language, builds a complete agent, and is honest about when you should not reach for it.

What Was Actually Wrong With Builder-Only Agents

To understand why Agent Script matters, you have to be honest about what came before it — and most of the marketing is not.

When Agentforce first shipped, you built agents by assembling topics (now called subagents) in a point-and-click builder. Each topic had a description, a set of instructions written in natural language, and a list of actions it was allowed to call. The reasoning engine — the LLM — read your instructions, looked at the user’s message, and decided what to do.

This works beautifully for a certain class of problem. “Answer questions about our return policy” is a topic that genuinely does not need determinism. Let the model reason. If it phrases the answer three different ways to three different customers, nobody is harmed.

But the moment your workflow has steps that must happen, the model is a liability:

Problem	What it looks like in production
Non-determinism	The agent skips identity verification because the conversation “felt” verified. It collects fields out of order. It calls the refund action before checking eligibility because the customer was insistent.
No reliable state	You told the agent to “remember the policy number.” Forty turns later, in a long conversation, the policy number has quietly fallen out of the context window and the agent is now confidently working with the wrong account.
No version control	Your “agent” is a collection of clicks in a UI. You cannot diff it. You cannot code-review it. You cannot see what changed between the version that worked and the version that started hallucinating. Two admins edit it and nobody knows who broke what.
No real testing	How do you write a regression test for a pile of natural-language instructions? You mostly cannot. You poke it manually and pray.

These are not edge cases. These are the normal failure modes of putting an LLM in charge of a business process. The industry spent 2024 and 2025 discovering them the hard way.

The builder is great at the conversation. It is terrible at the contract. Agent Script exists because real workflows have both.

Agent Script’s answer is not “remove the LLM.” It is “put the LLM on a leash where it counts, and let it run free where it doesn’t.”

The Core Idea: Two Operators, One Agent

Almost everything interesting about Agent Script comes down to two symbols.

The pipe (|) begins a prompt instruction. Everything after it is natural language that accumulates into the prompt sent to the LLM. This is where reasoning happens.

| Greet the customer warmly and ask how you can help with their claim today.

The arrow (->) begins a logic instruction. This is deterministic. It runs in code, top to bottom, and the LLM does not get to reinterpret it.

-> run @actions.verify_policy with policy_number = @variables.policy_number
   set @variables.policy_status = @outputs.status

That distinction is the whole philosophy. A single subagent interleaves both:

subagent: collect_loss_details
  reasoning:
    instructions:
      | Ask the customer for the date the incident occurred.
      | Then ask where it happened and for a short description.

      -> if @variables.injuries_reported == True
           run @actions.flag_for_adjuster with claim_id = @variables.claim_id

      | Reassure them that an adjuster will follow up within 24 hours.

The prompt instructions read like a script for a helpful human. The logic instruction in the middle is a hard guarantee: if injuries were reported, an adjuster is flagged — every single time, no exceptions, no “the model felt it wasn’t necessary.”

This is the thing that does not come across in a keynote slide. Agent Script is not a fancier way to write instructions. It is a way to mix guaranteed execution with probabilistic reasoning in the same breath.
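If it helps to see the split outside of Agent Script syntax, here is a minimal Python sketch of the idea. It is a mental model, not how the Agentforce runtime is implemented: `Prompt`, `Logic`, `resolve`, and `flag_if_injured` are all names I made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Prompt:
    """Models a `|` line: text appended to the prompt the LLM will see."""
    text: str


@dataclass
class Logic:
    """Models a `->` line: code that runs deterministically on script state."""
    run: Callable[[dict], None]


Instruction = Union[Prompt, Logic]


def resolve(instructions: list[Instruction], state: dict) -> str:
    """Walk instructions top to bottom: run logic, collect prompt text."""
    prompt_lines = []
    for step in instructions:
        if isinstance(step, Logic):
            step.run(state)  # always executes; the model is never consulted
        else:
            prompt_lines.append(step.text)
    return "\n".join(prompt_lines)


def flag_if_injured(state: dict) -> None:
    # Hypothetical stand-in for `run @actions.flag_for_adjuster`.
    if state.get("injuries_reported") is True:
        state["adjuster_flagged"] = True


instructions = [
    Prompt("Ask the customer for the date the incident occurred."),
    Logic(flag_if_injured),
    Prompt("Reassure them that an adjuster will follow up within 24 hours."),
]

state = {"injuries_reported": True}
prompt = resolve(instructions, state)
```

The model only ever receives `prompt`. The flag lands in `state` whether or not the model would have thought it necessary.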

The Language, Concretely

Before we build something real, here is the working vocabulary. I am going to keep this tight — the official reference is exhaustive; this is the 20% you use constantly.

Syntax primitives

Token	Meaning
`|`	Begins a prompt instruction (natural language to the LLM)
`->`	Begins a logic instruction (deterministic execution)
`@`	References a resource: `@actions.x`, `@subagent.x`, `@variables.x`, `@outputs.x`, `@utils.x`
`{!@variables.x}`	Resolves a variable’s value inside prompt text
`#`	Comment
`...`	Slot-fill token — tells the LLM to populate a value

It is whitespace-sensitive. Indentation defines structure (2 spaces or a tab), and mixing tabs and spaces will get you parsing errors, exactly like YAML or Python. If you have ever fought a Flow’s clunky canvas, the idea that your agent’s logic is just text with indentation is going to feel like a gift.

The blocks

An agent is a set of top-level blocks:

config — identity and settings: developer_name, agent_label, description, agent_type (ServiceAgent or EmployeeAgent), logging flags.
system — global instructions plus the required welcome and error messages.
variables — the global state the agent and script can read and write.
language — supported languages.
start_agent — the router. The conversation starts here, and it decides which subagent to hand off to.
subagent — a specialized unit of behavior, with its own reasoning instructions and actions.

Variables that behave

Two variable flavors matter early:

variables:
  policy_number: mutable string      # the script can change this
  customer_name: linked string       # value comes from an external source

A mutable variable is your working memory — and critically, it lives in script state, not in the LLM’s context window. It does not fall out of memory forty turns into a conversation. A linked variable is bound to an external source (a Salesforce field, session context) and populated for you.
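A toy Python sketch of why that matters. The five-turn window is an assumption chosen to make the effect visible (real context limits are measured in tokens, not turns), and `script_state` is just a dict standing in for the script's variables.

```python
from collections import deque

CONTEXT_TURNS = 5  # hypothetical window size, tiny so the effect is visible

context_window = deque(maxlen=CONTEXT_TURNS)  # old turns fall off the left
script_state = {}                             # persists for the whole session

# Turn 1: the customer gives the policy number.
context_window.append("customer: my policy number is PN-1042")
script_state["policy_number"] = "PN-1042"

# Forty more turns of conversation push the first turn out of the window.
for turn in range(40):
    context_window.append(f"customer: message {turn}")

still_in_context = any("PN-1042" in turn for turn in context_window)
still_in_state = script_state.get("policy_number") == "PN-1042"
```

After forty turns `still_in_context` is false and `still_in_state` is true. That is the entire argument for keeping critical values in variables rather than in the conversation.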

You assign with set:

-> set @variables.claim_id = @outputs.new_claim_id

And you interpolate into prompts with {!...}:

| Thanks {!@variables.customer_name}. Your claim number is {!@variables.claim_id}.
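
Conceptually, that interpolation is a template substitution that happens before the text reaches the model. A rough Python equivalent, assuming only the `{!@variables.name}` form shown above (the `render` helper and its fail-on-missing behavior are my own choices, not documented platform behavior):

```python
import re

# Matches the {!@variables.name} form used in the prompt text above.
PLACEHOLDER = re.compile(r"\{!@variables\.([A-Za-z_][A-Za-z0-9_]*)\}")


def render(template: str, variables: dict) -> str:
    """Replace each placeholder with the current variable value."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in variables:
            # Assumption: fail loudly rather than send a half-filled prompt.
            raise KeyError(f"unset variable: {name}")
        return str(variables[name])

    return PLACEHOLDER.sub(substitute, template)


message = render(
    "Thanks {!@variables.customer_name}. Your claim number is {!@variables.claim_id}.",
    {"customer_name": "Dana", "claim_id": "CLM-0001"},
)
```

Here `message` comes out as "Thanks Dana. Your claim number is CLM-0001." The model never has to recall the claim number; it is handed the finished sentence.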

Actions

Actions are how the agent does things — call a Flow, run Apex, hit an API. The shape is run / with / set:

-> run @actions.create_claim
     with policy_id = @variables.policy_id
     with loss_date = @variables.loss_date
     set @variables.claim_id = @outputs.claim_number
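
If you think in function calls, run / with / set reads as "bind inputs, call, copy outputs back into state." A small Python sketch of that reading, where `create_claim` is a fake action and `run_action` is a helper I invented for the illustration:

```python
from typing import Any, Callable


def create_claim(policy_id: str, loss_date: str) -> dict:
    # Hypothetical stand-in for a Flow or Apex action; returns named outputs.
    return {"claim_number": f"CLM-{policy_id}-{loss_date}"}


def run_action(
    action: Callable[..., dict],
    with_inputs: dict[str, str],   # action input name -> variable name
    set_outputs: dict[str, str],   # variable name -> action output name
    variables: dict[str, Any],
) -> None:
    """Bind inputs from variables, call the action, copy outputs back."""
    kwargs = {param: variables[var] for param, var in with_inputs.items()}
    outputs = action(**kwargs)
    for var, output_name in set_outputs.items():
        variables[var] = outputs[output_name]


variables = {"policy_id": "P77", "loss_date": "2026-01-15", "claim_id": None}
run_action(
    create_claim,
    with_inputs={"policy_id": "policy_id", "loss_date": "loss_date"},
    set_outputs={"claim_id": "claim_number"},
    variables=variables,
)
```

The point of the shape is that the result goes straight into a variable. Nothing about the claim number depends on the model repeating it back correctly.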

Actions can target a Flow directly:

-> run @actions.lookup_policy
     target: "flow://Policy_Lookup_By_Number"

Built-in utilities

Three you will reach for constantly:

@utils.transition to @subagent.x — hand control to another subagent. One-way. Control does not come back.
@utils.escalate — escalate to a human rep.
@utils.setVariables — instruct the LLM to extract and set variable values from the conversation.

And the available when modifier — conditionally hide a subagent or action from the reasoning engine so the model cannot even see a tool it should not use yet:

subagent: process_total_loss
  available when @variables.is_total_loss == True

That available when line is quietly one of the most important features in the language. It is the difference between telling the model not to do something and making it impossible for the model to do it. The second one actually works.
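
Here is the same idea as a Python sketch: filter the tool list against state before the model is ever asked to choose. The `Tool` class and `visible_tools` function are illustrative names, and the conditions mirror the examples in this post rather than any real runtime API.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Tool:
    name: str
    # Models `available when`; None means the tool is always visible.
    available_when: Optional[Callable[[dict], bool]] = None


TOOLS = [
    Tool("verify_identity"),
    Tool("collect_loss_details", lambda s: s.get("caller_verified") is True),
    Tool("process_total_loss", lambda s: s.get("is_total_loss") is True),
]


def visible_tools(state: dict) -> list[str]:
    """Return only the tool names the model is allowed to see this turn."""
    return [
        tool.name
        for tool in TOOLS
        if tool.available_when is None or tool.available_when(state)
    ]


before = visible_tools({"caller_verified": None})
after = visible_tools({"caller_verified": True})
```

Before verification the model sees one tool. After it, two. There is no instruction to ignore, because the hidden tool was never in the list.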

The Business Case: Meridian Insurance, First Notice of Loss

Let me build something real, because the language only makes sense when the stakes are concrete.

Meridian Insurance is a mid-market auto insurer. When a policyholder has an accident, the first thing that happens is First Notice of Loss (FNOL) — the initial report that opens a claim. Today, FNOL happens over the phone with a call-center agent reading from a script, and it is expensive, slow, and inconsistent at 2 AM when accidents actually happen.

Meridian wants an Agentforce agent to handle FNOL intake. Here is what makes this a perfect Agent Script case and a terrible builder-only case:

1. Identity verification is mandatory and non-negotiable. Regulatory and fraud requirements mean the agent must verify the policyholder before discussing any claim details. The LLM does not get a vote on this.
2. Coverage must be confirmed before a claim is created. Creating a claim against a lapsed policy is a compliance problem. This check must run, and it must run before intake proceeds.
3. Data must be collected in a specific order (date of loss, location, description, injuries, other parties) because downstream systems and adjusters depend on it.
4. The path forks on severity. Minor fender-bender with no injuries? The agent can complete self-service intake. Injuries or a likely total loss? It must escalate to a human adjuster immediately, by law in several states.
5. Throughout all of this, it must feel like talking to a calm, competent human, not a form with a pulse.

Requirements 1 through 4 are contracts. Requirement 5 is conversation. That is the exact split Agent Script was built for. Let us build it.

Building the FNOL Agent

Step 1: Config and system

config:
  developer_name: Meridian_FNOL_Agent
  agent_label: "Meridian Claims Intake"
  description: "Handles First Notice of Loss intake for auto policyholders."
  agent_type: ServiceAgent
  enable_enhanced_event_logs: true

system:
  instructions:
    | You are Meridian Insurance's claims intake assistant.
    | You are calm, empathetic, and efficient. The person you are
    | talking to may have just been in an accident. Lead with care.
    | Never speculate about fault, coverage amounts, or payouts.
  welcome:
    | Hi, I'm here to help you report a claim. I'm sorry you're dealing
    | with this — let's get it started. First, I'll need to verify a
    | few details.
  error:
    | I'm having trouble on my end. Let me connect you with a
    | claims specialist who can help right away.

Notice the welcome message already sets up the verification step. We are not asking the model to decide to verify — we are architecting the conversation so verification is the first thing that happens.

Step 2: Variables — the agent’s spine

variables:
  policy_number: mutable string
  policy_id: mutable string
  policy_status: mutable string
  caller_verified: mutable boolean
  loss_date: mutable string
  loss_location: mutable string
  loss_description: mutable string
  injuries_reported: mutable boolean
  is_total_loss: mutable boolean
  claim_id: mutable string

This block is the whole reason the agent is reliable. Every one of these values lives in script state. When the agent is forty messages deep into a stressful conversation, caller_verified is still exactly what we set it to. It cannot drift out of a context window, because it was never in one.

Step 3: The router

start_agent:
  reasoning:
    instructions:
      | Determine what the customer needs.
      | If they want to report a new accident or claim, hand off to
      | identity verification first.

      -> if @variables.caller_verified is None
           @utils.transition to @subagent.verify_identity
    actions:
      - @subagent.verify_identity
      - @subagent.collect_loss_details

The router does one important thing: if the caller is not yet verified, it transitions to verification deterministically. The LLM is not asked “should we verify?” The script decides.

Step 4: Mandatory identity verification

subagent: verify_identity
  description: "Verifies the policyholder before any claim details are discussed."
  reasoning:
    instructions:
      | Ask the customer for their policy number and the ZIP code on file.

      -> run @actions.verify_policyholder
           with policy_number = @variables.policy_number
           with zip = ...
           set @variables.policy_id = @outputs.policy_id
           set @variables.policy_status = @outputs.status
           set @variables.caller_verified = @outputs.verified

      -> if @variables.caller_verified == False
           | I wasn't able to verify those details. For your security,
           | I'll connect you with a specialist.
           @utils.escalate

      -> if @variables.policy_status != "Active"
           | It looks like there may be an issue with the policy status.
           | Let me connect you with someone who can review it.
           @utils.escalate

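      # Guard the hand-off explicitly so it cannot run after a failed gate.
      -> if @variables.caller_verified == True and @variables.policy_status == "Active"
           | Thanks, you're verified. Now let's get the details of what happened.
           @utils.transition to @subagent.collect_loss_details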
    actions:
      - @actions.verify_policyholder

Read what this guarantees. The verification action always runs. If verification fails, the agent always escalates — there is no path where an unverified caller proceeds. If the policy is not active, it always escalates. Only after both deterministic gates pass does the conversation move forward. The LLM phrases the messages warmly; the script enforces the law.

This is the Fetch Data pattern and the Required Subagent Workflow pattern working together — fetch the verification result before the model reasons about anything, and make the workflow mandatory.
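
Stripped of syntax, the two gates reduce to a function from state to next step. A Python sketch of that decision, with "Lapsed" as a made-up status value and `after_verification` as a made-up name:

```python
from typing import Optional


def after_verification(caller_verified: Optional[bool], policy_status: str) -> str:
    """Return the next step once the verification action has written its outputs."""
    if caller_verified is not True:  # None and False both count as unverified
        return "escalate"
    if policy_status != "Active":
        return "escalate"
    return "collect_loss_details"


results = {
    "verified_active": after_verification(True, "Active"),
    "verified_lapsed": after_verification(True, "Lapsed"),
    "failed_check": after_verification(False, "Active"),
    "never_checked": after_verification(None, "Active"),
}
```

Exactly one of the four cases reaches `collect_loss_details`. If you cannot write your compliance step as a function this boring, it probably is not deterministic yet.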

Step 5: Ordered data collection with a severity fork

subagent: collect_loss_details
  available when @variables.caller_verified == True
  description: "Collects the structured details of the loss, in order."
  reasoning:
    instructions:
      | Ask, one at a time and in this order:
      | 1. The date the incident occurred.
      | 2. The location (city and state).
      | 3. A brief description of what happened.
      -> @utils.setVariables

      | Now ask directly: was anyone injured?
      -> @utils.setVariables

      # Deterministic severity routing — the model does not decide this
      -> if @variables.injuries_reported == True
           set @variables.is_total_loss = True

      -> if @variables.injuries_reported == True
           | Because there were injuries, I'm going to connect you with
           | a claims adjuster right now who specializes in this.
           @utils.transition to @subagent.escalate_to_adjuster

      | Thanks. Based on what you've described, I can file this for you now.
      -> @utils.transition to @subagent.file_claim
    actions: []

The available when @variables.caller_verified == True line means this subagent does not exist as far as the reasoning engine is concerned until verification is done. The model cannot accidentally route here early. It is not told not to — it simply cannot see the door.

The injury check is the severity fork, and it is a hard branch. The LLM extracts whether injuries occurred (@utils.setVariables), but the consequence of that fact is deterministic code.

Step 6: The two terminal paths

subagent: escalate_to_adjuster
  available when @variables.injuries_reported == True
  description: "Hands off injury or total-loss claims to a human adjuster."
  reasoning:
    instructions:
      -> run @actions.create_priority_claim
           with policy_id = @variables.policy_id
           with loss_date = @variables.loss_date
           with description = @variables.loss_description
           set @variables.claim_id = @outputs.claim_number

      | I've opened priority claim {!@variables.claim_id} and flagged it
      | for an adjuster. Someone will call you within the hour. Please
      | hold while I connect you now.
      -> @utils.escalate
    actions:
      - @actions.create_priority_claim

subagent: file_claim
  description: "Files a standard self-service claim."
  reasoning:
    instructions:
      -> run @actions.create_claim
           with policy_id = @variables.policy_id
           with loss_date = @variables.loss_date
           with location = @variables.loss_location
           with description = @variables.loss_description
           set @variables.claim_id = @outputs.claim_number

      | All set. Your claim number is {!@variables.claim_id}. You'll get
      | a confirmation text shortly, and you can upload photos through
      | the link we send. Is there anything else I can help with?
    actions:
      - @actions.create_claim

Two clean terminal paths. Both create a claim through a real action (a Flow or Apex method). One escalates to a human, one completes self-service. The claim number comes back from the action and gets interpolated into a warm, human message.

That is a complete, regulated, production-shaped FNOL agent. Read it top to bottom and notice: you can see the guarantees. The compliance steps are not buried in a paragraph of English instructions hoping the model complies. They are arrows. They run.

Why Agent Script Over the Alternatives

Now the question the title promised. You have several ways to build an Agentforce agent. When do you reach for Agent Script specifically?

Approach	What it is	Best when	Falls apart when
Agent Builder (topics + NL instructions)	Point-and-click, natural-language instructions	FAQ bots, simple lookups, low-stakes Q&A	Any workflow with mandatory ordered steps or compliance gates
Flows as actions	Deterministic Flows the agent can call	You need a reliable action, embedded in an otherwise simple agent	The orchestration between actions needs to be deterministic, not just the actions themselves
Apex invocable actions	Custom code as agent tools	Complex computation inside a single step	The problem is the sequencing of steps, not the steps themselves
Agent Script	A real language for the orchestration layer	Multi-step, stateful, regulated, or high-stakes workflows that must be versioned and tested	Genuinely simple agents — it is overkill for an FAQ bot

The pattern: Flows and Apex make individual actions reliable. Agent Script makes the sequencing between them reliable. Before Agent Script, you could have a perfectly deterministic Apex action sitting inside a completely non-deterministic agent that called it whenever the mood struck. The weak link was the orchestration, and the orchestration was exactly the part you could not control.

There are three reasons to choose Agent Script, and they map directly to the failure modes in the table above (the third covers both version control and testing):

1. You need determinism in specific places. Not everywhere — that would defeat the point of using an LLM at all. But the verification step, the eligibility gate, the ordered data collection — those need to be guarantees, and -> gives you guarantees.

2. You need real state. mutable variables in script state do not evaporate from a context window. For any conversation long enough to matter, this is the difference between an agent that remembers the policy number and one that quietly starts working with the wrong account.

3. You need engineering discipline. Agent Script is a file. It goes in Git. You diff it, you review it in a pull request, you see exactly what changed between the version that worked and the version that broke. You can build it locally in VS Code with Agentforce DX. For anything a business actually depends on, this is not a nice-to-have — it is the entire reason software engineering has version control.

If none of those three apply, do not use Agent Script. Use the builder. I mean that. Reaching for a compiled, whitespace-sensitive language to build a return-policy FAQ bot is the AI-agent equivalent of writing a Kubernetes operator to run a cron job.

The Three Authoring Modes (Pick Your Altitude)

One genuinely thoughtful thing Salesforce did: Agent Script is not all-or-nothing. The same agent can be edited at three altitudes, and different people on your team can work at different ones.

Conversational — you literally chat with Agentforce to build the agent. “If the order is over $100, offer free shipping.” It writes the script. Good for first drafts and for admins.
Canvas View — visual blocks with shortcuts (/ for expressions, @ for resources). The middle ground. You see structure without staring at raw syntax.
Script View — you write the code directly, with syntax highlighting and autocomplete. This is where engineers live.

And for the people who want the full engineering workflow — local files, Git, CI — there is Agentforce DX in VS Code, where the script is just a file in your project that deploys through the CLI, exactly like the rest of your Headless 360 toolchain.

The important part is that these are views of the same artifact. An admin can rough out a flow conversationally; an engineer can open the same agent in Script View and add the deterministic guardrails. Nobody is locked out, and nobody is locked in.

Managing, Testing, and Shipping

Building the agent is the fun 80%. Shipping it responsibly is the 20% that matters.

Source control. Because the agent is a script, it lives in your repo alongside your Apex, LWC, and metadata. Pull requests. Code review. Blame history. The thing you could never do with a builder-only agent.

Testing. Pair Agent Script with the agent evaluation suites I covered in the Headless 360 guide — sf agent test run against scripted scenarios. The determinism is what makes testing meaningful: when the verification step is an arrow, not a suggestion, you can actually assert that an unverified caller always escalates. You cannot write that assertion against a natural-language instruction, because the natural-language instruction does not promise anything.
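
To make "you can actually assert it" concrete, here is the shape of that assertion as a Python sketch. It runs against a hypothetical pure-Python model of the FNOL routing (`route`), not against a deployed agent, and the status values are invented. Real evaluation would go through the agent test tooling; this only shows why deterministic routing is enumerable.

```python
from itertools import product


def route(caller_verified, policy_status, injuries_reported):
    """Hypothetical pure-Python model of the FNOL routing, for testing only."""
    if caller_verified is not True or policy_status != "Active":
        return "escalate"
    if injuries_reported is True:
        return "escalate_to_adjuster"
    return "file_claim"


def check_invariants():
    """Enumerate every combination and collect any that break a guarantee."""
    violations = []
    for verified, status, injured in product(
        [None, False, True], ["Active", "Lapsed", "Cancelled"], [False, True]
    ):
        outcome = route(verified, status, injured)
        # Guarantee 1: unverified callers never reach a claim-creating path.
        if verified is not True and outcome != "escalate":
            violations.append((verified, status, injured, outcome))
        # Guarantee 2: no self-service claim is filed when injuries are reported.
        if injured and outcome == "file_claim":
            violations.append((verified, status, injured, outcome))
    return violations


violations = check_invariants()
```

Eighteen combinations, zero violations. Try writing the equivalent loop against a paragraph of English instructions and you will see the problem immediately.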

Deployment. Two paths: configure and deploy from the org, or use Agentforce DX from the terminal for a CI/CD pipeline. Same sf project deploy muscle you already have. Re-run the eval suite on every deploy; agents drift, and the suite is your only signal that today’s agent behaves like yesterday’s.

Logging. That enable_enhanced_event_logs: true in the config block is not decoration. Turn it on. When an agent does something surprising in production, the conversation logs plus the Session Tracing pipeline are how you find out why.

Where Agent Script Will Bite You

I would not be doing my job if I sold this as frictionless. Real caveats, learned by people who shipped:

Whitespace sensitivity is exactly as annoying as it is in YAML. One stray tab in a file of spaces and the whole thing fails to parse, sometimes with an error message that points at the wrong line. Configure your editor to show whitespace and standardize on spaces. This will still bite you at least once.
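
A cheap defense is a pre-commit check that flags inconsistent indentation before the compiler does. A minimal Python sketch follows; it knows nothing about Agent Script grammar and only looks at leading whitespace, and `indentation_problems` is a name I chose for the example.

```python
def indentation_problems(source: str) -> list[tuple[int, str]]:
    """Return (line_number, reason) for lines whose indentation looks risky."""
    problems = []
    first_style = None
    for number, line in enumerate(source.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines carry no structure
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if not indent:
            continue  # top-level line, nothing to check
        if " " in indent and "\t" in indent:
            problems.append((number, "tabs and spaces mixed in one indent"))
            continue
        style = "tabs" if "\t" in indent else "spaces"
        if first_style is None:
            first_style = style  # the first indented line sets the convention
        elif style != first_style:
            problems.append((number, f"uses {style}, file started with {first_style}"))
    return problems


sample = "system:\n  instructions:\n\t| Lead with care.\n  welcome:\n"
problems = indentation_problems(sample)
```

On the sample it reports line 3, which is the line you would otherwise be hunting for by eye.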

Determinism is a tool, not a default — and over-using it defeats the point. I have already seen people try to script every branch of a conversation with if/else, recreating a 1990s phone tree with extra steps. If you find yourself enumerating every possible user utterance in logic instructions, you have stopped building an AI agent and started building a state machine badly. Let the model reason where reasoning is cheap. Reserve arrows for where mistakes are expensive.

Transitions are one-way, and people forget this. @utils.transition to does not return. If you architect a flow expecting control to come back to the previous subagent, it will not, and you will spend an afternoon confused. Design your subagent graph as a directed flow, not a call stack.
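
A Python sketch of the mental model that works. The `NEXT` table follows the no-injury path of the FNOL example, and `walk` is an illustrative helper. Notice there is no stack anywhere: each hop replaces the current subagent instead of nesting inside it.

```python
# Each subagent names at most one successor; None marks a terminal subagent.
# The edges below follow the FNOL example for the no-injury path.
NEXT = {
    "start_agent": "verify_identity",
    "verify_identity": "collect_loss_details",
    "collect_loss_details": "file_claim",
    "file_claim": None,
}


def walk(start: str) -> list[str]:
    """Follow transitions until a terminal subagent; nothing is ever resumed."""
    path = []
    current = start
    while current is not None:
        path.append(current)
        current = NEXT[current]  # control moves on and does not come back
    return path


path = walk("start_agent")
```

If your design needs something to happen "after the subagent finishes," that something has to live in the subagent you transition to, not in the one you left.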

It is new, and the edges show. This is 2026-current tooling. Error messages are improving but not always precise. The local-dev story through Agentforce DX is solid but evolving. Treat it like the early-but-real technology it is, not a decade-matured platform.

It does not absolve you of good action design. Agent Script orchestrates actions; it does not make a badly written Apex action safe. User-mode data access (WITH USER_MODE, or the older WITH SECURITY_ENFORCED), proper error handling, and idempotency in the underlying actions still matter. The script is the conductor, not the orchestra.

Open Questions

Things I am still working through, and would genuinely like other architects’ takes on:

Where is the line between Agent Script and Flow? Both are now “deterministic orchestration on the platform.” For a given workflow, when do you script the agent versus push the logic into a Flow the agent calls? I have a working heuristic — conversation-shaped logic goes in the script, transaction-shaped logic goes in a Flow — but it is fuzzy at the edges.
How do you version an agent’s behavior, not just its source? The script is in Git, which versions the text. But the agent’s behavior also depends on the model version and the grounding data. A clean git diff does not capture “the model got updated and now phrases refusals differently.” We do not have great tooling for behavioral versioning yet.
Who owns the script? It sits exactly on the admin/developer boundary. Admins can build conversationally; engineers add the guardrails in Script View. In practice, who is responsible for the file? That is an org-design question more than a technical one, and most teams have not answered it.
Does the determinism actually reduce hallucination risk enough for the hardest-regulated industries? Healthcare and financial services have a higher bar than “the important steps are arrows.” I think Agent Script gets you genuinely closer than builder-only agents. I am not yet sure it gets all the way there for, say, a clinical context.

The Verdict

Agent Script is the moment Agentforce stopped being a demo and started being a development platform.

The builder was always going to be enough for the easy 30% of agents — the FAQ bots, the simple lookups, the low-stakes Q&A. It is the other 70% that mattered, the agents that touch money and compliance and people having a bad day, and those agents were quietly impossible to build responsibly with natural-language instructions alone. You could build them. You just could not trust them, version them, or test them.

Agent Script fixes the trust problem the only way it can actually be fixed: by letting you draw a hard line between the parts of the conversation that are allowed to think and the parts that are required to obey. Pipes for the thinking. Arrows for the obeying. A file you can put in Git for everything.

If you are building anything with real stakes on Agentforce, this is no longer optional. The agents that survive contact with production in 2026 are the ones with a spine — and the spine is written in Agent Script.