Your prompts define behavior.
We make it measurable.
Wrap your LLM calls with Agent Plasticity. We analyze your prompts, extract behavioral expectations, and score every output — enabling you to align your agent to your exact intent.
const { text } = await generateText({
  model: openai("gpt-5.2"),
  system: "Be concise. Use markdown. Cite sources.",
  prompt,
})
What is Agent Plasticity?
You have AI agents — chatbots, code assistants, whatever. They make LLM calls. You want to know: are they doing a good job?
Agent Plasticity answers that by:
Collecting every AI call your agent makes
One async API call per LLM interaction. Zero latency impact on your agent.
Figuring out what "good" means for that specific agent
We read your prompts and extract the behavioral expectations you already wrote — conciseness, tone, format, citations.
Scoring each call automatically using an LLM-as-judge
Every output gets evaluated against your specific behavioral metrics. Each score comes with reasoning tied to your prompt.
Showing you trends over time in a dashboard
See which behaviors hold up, which ones drift, and catch regressions before your users do.
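The collection step above can be sketched as a fire-and-forget wrapper around your existing LLM call. The endpoint URL, header names, and payload shape here are illustrative, not the actual API:

```typescript
// Hypothetical ingestion wrapper. Because the request is not awaited,
// the agent's response path adds no latency.
type IngestPayload = { prompt: string; output: string; model: string };

function ingest(payload: IngestPayload): void {
  // Fire-and-forget: kick off the request and move on.
  fetch("https://app/api/ingest", {
    method: "POST",
    headers: {
      Authorization: "Bearer <token>",
      "Content-Type": "application/json",
    },
    body: JSON.stringify(payload),
  }).catch(() => {
    // Never let telemetry failures surface to your users.
  });
}

// After your existing call:
// const output = await llm.generate(prompt)
// ingest({ prompt, output, model: "gpt-5.2" })
```

Not awaiting the fetch is what makes the "zero latency impact" claim hold: the agent returns to the user while the ingestion request completes in the background.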
How it works
Your prompts already say what "good" looks like. We extract those expectations and score every output.
Send your AI calls
Forward prompts and outputs as they happen. One API call, zero latency impact.
await fetch(url, { method: "POST", body: JSON.stringify({ prompt, output, model }) })
We read your prompts
AI analyzes the instructions in your prompt and extracts measurable behaviors.
Score and track
Every output is scored. See which behaviors hold up and which ones drift.
See why a behavior scored the way it did
Every metric comes with reasoning tied back to your prompt. You told the model to be concise — we tell you whether it was, and exactly where it fell short.
"Be extremely concise"
Second paragraph restates points from the introduction. Could be 40% shorter.
"Only answer in markdown"
Proper heading hierarchy, code blocks, and bullet lists throughout.
"Cite your data sources"
Two of four claims cite sources. BMI threshold and medication dosage are uncited.
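A minimal sketch of what a per-behavior judgment might look like as data, using the examples above. This shape is assumed for illustration; it is not the actual API response:

```typescript
// Assumed shape of one LLM-as-judge result per extracted behavior.
type BehaviorScore = {
  behavior: string;  // the instruction extracted from your prompt
  score: number;     // 0–1, assigned by the judge model
  reasoning: string; // tied back to the instruction it grades
};

const scores: BehaviorScore[] = [
  {
    behavior: "Be extremely concise",
    score: 0.6,
    reasoning: "Second paragraph restates points from the introduction.",
  },
  {
    behavior: "Cite your data sources",
    score: 0.5,
    reasoning: "Two of four claims cite sources.",
  },
];
```

The point of the structure is that every score carries its own explanation, so a regression in one behavior points directly at the prompt instruction it violates.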
One line to integrate
No SDK, no config files. Forward your calls and we handle the rest.
// Your existing agent code
const output = await llm.generate(prompt)
// Add Agent Plasticity
await fetch("https://app/api/ingest", {
method: "POST",
headers: { Authorization: "Bearer <token>" },
body: JSON.stringify({ prompt, output, model })
})
Why we built this
When you build an AI agent, you write a prompt full of behavioral instructions: be concise, use markdown, cite your sources, keep a conversational tone. These instructions define the contract between you and the model.
But once that agent is in production, there's no way to know if those instructions are being followed. Did it stay concise? Did it cite sources? The only feedback loop is user complaints — and by then it's too late.
The fundamental issue: your prompts are full of behavioral expectations, but none of them are measured.
Most evaluation tools ask you to define generic metrics — accuracy, helpfulness, safety. We take a different approach. We read the prompt you already wrote, extract the specific behaviors you asked for, and measure whether the model actually follows them.
Your prompt is the spec. We just make it measurable.