---
title: Instrumentation strategy
description: The theory behind what to instrument in an AI agent, why span
  granularity matters, and how schema design shapes what governance can see.
editUrl: true
head: []
template: doc
sidebar:
  hidden: false
  attrs: {}
pagefind: true
draft: false
---

Prefactor records what the SDK sends. The steps you choose to instrument, the level of detail you capture, and the schema you declare together determine what the platform can surface, validate, and score.

## What makes a span worth instrumenting

A span is worth recording when it represents a discrete, meaningful unit of work with identifiable inputs and outputs. "The agent processed a request" is too coarse to be useful. "The agent called the web search tool with this query and received these results" is specific enough to inspect, validate, and score.

A useful test: would this span give a reviewer a clear picture of what happened at this step? Steps left uninstrumented are gaps in the record that cannot be filled in later.

Over-instrumentation is also a problem. Wrapping every internal function call produces noise that obscures the meaningful record. The right granularity is usually aligned with the conceptual steps in the agent's work — a conversation turn, a tool call, a sub-agent invocation — not streaming chunks or retries inside a library.

## Nesting spans for business context

Spans can be nested: a parent span can contain child spans, with the hierarchy preserved in the activity timeline.

Rather than recording only the raw API calls an agent makes, you can wrap a group of related calls in a parent span that names the business-level action they collectively represent. An agent researching a competitor might make several web searches and an LLM summarisation call — all of which can sit inside a `research_competitor` span that makes the intent explicit. A reviewer reading the timeline sees both the high-level action and the steps that composed it.

A parent span can carry its own type and payload, allowing the business action to be classified for risk independently of its constituent calls.
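
The nesting described above can be pictured with a small in-memory sketch. The `Recorder` class, its `span` context manager, and the payload field names are hypothetical stand-ins written for this illustration, not the Prefactor SDK API; only the `research_competitor` example comes from the text.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class Span:
    span_type: str
    payload: dict
    children: list["Span"] = field(default_factory=list)


class Recorder:
    """Hypothetical in-memory recorder; stands in for a real SDK client."""

    def __init__(self):
        self.roots: list[Span] = []
        self._stack: list[Span] = []

    @contextmanager
    def span(self, span_type: str, **payload):
        node = Span(span_type, payload)
        # Attach to the currently open span, or start a new root.
        parent = self._stack[-1] if self._stack else None
        (parent.children if parent else self.roots).append(node)
        self._stack.append(node)
        try:
            yield node
        finally:
            self._stack.pop()


def outline(span: Span, depth: int = 0) -> list[str]:
    """Render the hierarchy as indented lines, parent first."""
    lines = ["  " * depth + span.span_type]
    for child in span.children:
        lines.extend(outline(child, depth + 1))
    return lines


recorder = Recorder()
# The parent span names the business action and carries its own payload.
with recorder.span("research_competitor", competitor="Acme"):
    with recorder.span("web_search", query="Acme pricing"):
        pass
    with recorder.span("web_search", query="Acme funding"):
        pass
    with recorder.span("llm_summarise", model="example-model"):
        pass

timeline = outline(recorder.roots[0])
print("\n".join(timeline))
```

The printed outline shows one `research_competitor` line with the two searches and the summarisation call indented beneath it, which is the shape a reviewer would read in a timeline.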

## Per-tool span types

A common shortcut is to use a single generic span type for all tool calls, such as `tool_call` with a payload containing `tool_name` and arguments. The problem is that the [activity schema](/platform/concepts/activity-schema) can then only declare one shape to cover all tools, which means it cannot validate any of them precisely. A `web_search` call has a different shape from `read_file`, which has a different shape from `send_email`; a schema that tries to accommodate all three ends up either too permissive to validate any of them, or so narrow that it fits one tool and rejects the rest.

Giving each tool its own span type allows the schema to declare the exact expected shape for each one. Payload validation can flag deviations specific to that tool, and risk scoring can apply the appropriate action type to each. The consequence of `send_email` is different from that of `read_file`, and capturing that distinction requires separate types.
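
As a rough illustration of why separate types help, the sketch below keeps one payload definition per tool and checks a payload against the definition for its own type. The `TOOL_SCHEMAS` table, the field names, and `check_payload` are assumptions made for this example; they do not reflect how Prefactor declares or validates schemas.

```python
# Hypothetical per-tool payload definitions: field name -> expected Python type.
TOOL_SCHEMAS = {
    "web_search": {"query": str, "max_results": int},
    "read_file": {"path": str},
    "send_email": {"to": str, "subject": str, "body": str},
}


def check_payload(span_type: str, payload: dict) -> list[str]:
    """Return a list of deviations; an empty list means the payload conforms."""
    schema = TOOL_SCHEMAS.get(span_type)
    if schema is None:
        return [f"unknown span type: {span_type}"]
    problems = []
    for name, expected in schema.items():
        if name not in payload:
            problems.append(f"missing field: {name}")
        elif not isinstance(payload[name], expected):
            problems.append(f"wrong type for {name}: expected {expected.__name__}")
    # Fields the definition does not declare are deviations for this tool.
    for name in sorted(payload.keys() - schema.keys()):
        problems.append(f"unexpected field: {name}")
    return problems


ok = check_payload("read_file", {"path": "notes.txt"})
bad = check_payload("send_email", {"to": "a@example.com", "path": "notes.txt"})
print(ok)
print(bad)
```

Because `send_email` has its own definition, a stray `path` field is reported as a deviation for that tool, something a single shared `tool_call` shape could not express without also rejecting valid `read_file` payloads.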

## Permissive vs strict schema definitions

Every span type has a payload definition and a result definition. These can be strict — listing exactly which fields are expected — or permissive, allowing extra or unknown fields through without flagging them.

A strict schema catches drift early but requires your instrumentation to be stable first. A permissive schema lets you start recording without constraining yourself, but provides no conformance signal.

The practical approach is to start permissive and tighten over time. When you first instrument a span type, you may not know the full shape of its payloads. Recording with a permissive schema gives you real payload data to inspect. Once the shape stabilises, you update the schema to reflect what you actually expect — from that point, any deviation is flagged.
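
The permissive-then-strict progression can be sketched as follows. `Definition`, `deviations`, and `tighten` are hypothetical helpers, and deriving the strict shape from fields common to every recorded sample is just one possible policy chosen for the illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Definition:
    """Hypothetical payload definition: declared fields plus an extras policy."""
    fields: dict = field(default_factory=dict)  # field name -> expected type
    allow_extra: bool = True                    # True means permissive


def deviations(definition: Definition, payload: dict) -> list[str]:
    found = []
    for name, expected in definition.fields.items():
        if name not in payload:
            found.append(f"missing field: {name}")
        elif not isinstance(payload[name], expected):
            found.append(f"wrong type for {name}")
    if not definition.allow_extra:
        for name in sorted(payload.keys() - definition.fields.keys()):
            found.append(f"unexpected field: {name}")
    return found


def tighten(samples: list[dict]) -> Definition:
    """Build a strict definition from the fields present in every sample."""
    common = set(samples[0]).intersection(*samples[1:])
    return Definition(
        fields={name: type(samples[0][name]) for name in sorted(common)},
        allow_extra=False,
    )


# Phase 1: record with a permissive definition and inspect real payloads.
permissive = Definition()
samples = [
    {"query": "acme pricing", "max_results": 5},
    {"query": "acme funding", "max_results": 10},
]

# Phase 2: once the shape has stabilised, declare it strictly.
strict = tighten(samples)
drifted = {"query": "acme news", "limit": 5}
flagged = deviations(strict, drifted)
print(flagged)
```

The permissive definition accepts both the samples and the drifted payload without comment, while the strict one reports the renamed field from the first run that carries it.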

:::tip[What's a schema version?]
Prefactor creates a new [activity schema](/platform/concepts/activity-schema) version automatically when a run arrives with definitions that differ from the current version. You do not push schema versions explicitly — they appear as a side-effect of runs arriving with updated definitions.
:::
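
One way to picture versions appearing as a side-effect of runs is to compare a fingerprint of the incoming definitions with that of the latest version. This is an illustrative model only: how Prefactor actually compares definitions is not described here, and `fingerprint` and `SchemaHistory` are hypothetical names.

```python
import hashlib
import json


def fingerprint(definitions: dict) -> str:
    """Stable hash of a set of definitions, independent of key order."""
    canonical = json.dumps(definitions, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class SchemaHistory:
    """Hypothetical version log; a new entry is appended only on change."""

    def __init__(self):
        self.versions: list[dict] = []

    def on_run(self, definitions: dict) -> int:
        """Return the 1-based version this run maps to."""
        latest = self.versions[-1] if self.versions else None
        if latest is None or fingerprint(latest) != fingerprint(definitions):
            self.versions.append(definitions)
        return len(self.versions)


history = SchemaHistory()
v1 = history.on_run({"web_search": {"query": "string"}})
v1_again = history.on_run({"web_search": {"query": "string"}})
v2 = history.on_run({"web_search": {"query": "string", "max_results": "integer"}})
print(v1, v1_again, v2)
```

In this model a run with unchanged definitions maps to the existing version, and only a run whose definitions differ produces a new one.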

## Schema granularity and governance

The more precisely a schema declares field names and types, the more Prefactor can validate and the more precisely it can score for risk. [Risk profiles](/platform/concepts/risk-profile) classify spans by data category and action type, but those classifications depend on the span having enough declared structure for the rules to apply. A span type that declares which fields carry personal identifiers or financial data allows precise risk assessment. A permissive schema limits that assessment to what the actual payload happens to contain at runtime.
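
To make the dependency on declared structure concrete, the sketch below attaches a data category to each declared field and reads the categories off the definition alone. The `FieldDecl` type and the category and action labels (`personal`, `general`, `external_communication`) are invented for the example and are not Prefactor's risk profile vocabulary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldDecl:
    type_name: str
    data_category: str = "general"  # hypothetical label, e.g. "personal"


# Hypothetical strict definition for a send_email span type.
SEND_EMAIL = {
    "action_type": "external_communication",  # hypothetical action label
    "fields": {
        "to": FieldDecl("string", "personal"),
        "subject": FieldDecl("string"),
        "body": FieldDecl("string"),
    },
}

# A permissive definition declares no fields at all.
PERMISSIVE_TOOL_CALL = {"action_type": "unknown", "fields": {}}


def declared_categories(definition: dict) -> set[str]:
    """Data categories knowable from the definition, before any payload arrives."""
    return {decl.data_category for decl in definition["fields"].values()}


strict_view = declared_categories(SEND_EMAIL)
permissive_view = declared_categories(PERMISSIVE_TOOL_CALL)
print(sorted(strict_view), sorted(permissive_view))
```

With the strict definition, a rule keyed on personal data has something to match before a single run arrives; with the permissive one, the declared view is empty and any assessment has to fall back on inspecting payloads at runtime.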

## Further reading

- [Activity schema](/platform/concepts/activity-schema) — how schema versions are created and what they validate.
- [Span](/platform/concepts/span) — the unit of instrumentation and what each span record contains.
- [Risk profile](/platform/concepts/risk-profile) — how span types and payloads feed into risk scoring.