Acceptance criteria in product management

Acceptance criteria are the testable conditions that define when a user story is done from the product owner’s perspective. They are not a test plan, not a requirements list, and not the same as Definition of Done. Their job is to make “done” unambiguous before a sprint starts, so engineers estimate against understood scope and QA knows exactly what to verify. Late or vague AC is the most reliable cause of sprint scope creep, and one of the clearest signals interviewers use to sort PMs who have shipped from those who have only read about it.

Three formats, used by context

Scenario-based (Given/When/Then, also called Gherkin or BDD). Use for user-facing flows where sequence matters. Each scenario covers exactly one path, so a story typically needs separate scenarios for the happy path, the error path, and each edge case.

Given a logged-in user with at least one saved address, When they tap “Reorder,” Then the cart populates with original items, the default address is pre-selected, and the total appears within 1.5 seconds.

This maps directly to automated test cases, which is why engineering and QA favor it.
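
As a rough illustration of that mapping, the Reorder scenario above could be sketched as a plain Python test. The `User`, `Cart`, and `reorder` names are hypothetical stand-ins invented for this example, not part of any real codebase, and the timing check only measures the in-process call; a real 1.5-second criterion would be verified end to end.

```python
import time
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class User:
    logged_in: bool
    saved_addresses: List[str]
    default_address: Optional[str] = None


@dataclass
class Cart:
    items: List[str] = field(default_factory=list)
    selected_address: Optional[str] = None


def reorder(user: User, original_items: List[str]) -> Cart:
    """Hypothetical stand-in for the real Reorder handler."""
    if not user.logged_in or not user.saved_addresses:
        raise PermissionError("reorder needs a logged-in user with a saved address")
    return Cart(items=list(original_items), selected_address=user.default_address)


def test_reorder_happy_path() -> None:
    # Given a logged-in user with at least one saved address
    user = User(logged_in=True, saved_addresses=["12 Elm St"], default_address="12 Elm St")
    original_items = ["coffee beans", "filters"]

    # When they tap "Reorder"
    started = time.perf_counter()
    cart = reorder(user, original_items)
    elapsed = time.perf_counter() - started

    # Then the cart holds the original items, the default address is
    # pre-selected, and the result arrives within 1.5 seconds
    assert cart.items == original_items
    assert cart.selected_address == "12 Elm St"
    assert elapsed < 1.5
```

Each Given/When/Then clause becomes one block of the test, which is why a scenario that cannot be written this way is usually a sign the criterion is still ambiguous.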

Rule-oriented. Use for UI states, content specs, or business logic without a natural user journey. Write as a bullet list of system behavior rules.

The badge disappears on tap, not after the modal closes.

If zero unread items exist, the badge is hidden, not shown as “0.”
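
Rules like these translate into small state checks rather than a user journey. The sketch below is illustrative only: `NotificationBadge` is a hypothetical class, and it assumes that tapping the badge marks all items as read and opens a modal.

```python
from typing import Optional


class NotificationBadge:
    """Hypothetical model of the badge rules above."""

    def __init__(self, unread_count: int) -> None:
        self.unread_count = unread_count
        self.modal_open = False

    def label(self) -> Optional[str]:
        # Rule: with zero unread items the badge is hidden (None),
        # never rendered as "0".
        return str(self.unread_count) if self.unread_count > 0 else None

    def tap(self) -> None:
        # Rule: the badge disappears on tap, so the unread state clears
        # here rather than in close_modal().
        self.unread_count = 0
        self.modal_open = True

    def close_modal(self) -> None:
        self.modal_open = False
```

A check for the first rule would call `tap()` and assert that `label()` is already `None` while `modal_open` is still `True`.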

Checklist. Use for compliance or non-functional requirements where the answer is binary.

Meets WCAG 2.1 AA color contrast ratios

All strings localization-ready, no hardcoded English in markup
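
The contrast item is binary because WCAG 2.1 defines it numerically: AA requires a contrast ratio of at least 4.5:1 for normal text and 3:1 for large text. A minimal sketch of that check follows; the function names are made up for this example, and it assumes six-digit hex colors.

```python
def _linear(channel_8bit: int) -> float:
    # Convert an 8-bit sRGB channel to linear light, per the WCAG 2.1
    # relative luminance definition.
    c = channel_8bit / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(foreground: str, background: str) -> float:
    # The lighter color always goes in the numerator.
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


def meets_aa(foreground: str, background: str, large_text: bool = False) -> bool:
    threshold = 3.0 if large_text else 4.5
    return contrast_ratio(foreground, background) >= threshold
```

For example, `meets_aa("#767676", "#ffffff")` passes for normal text while `meets_aa("#777777", "#ffffff")` does not, which is the kind of unambiguous answer a checklist criterion needs.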

AC vs Definition of Done: same feature, side-by-side

A story can pass all its AC and still fail the DoD. Most candidates conflate the two.

Acceptance criteria are story-specific: what this feature must do to satisfy its user story. They vary per story. Definition of Done is team-wide: code reviewed, tests written and passing, deployed to staging, docs updated. The DoD does not care what the feature does; it cares that the team’s quality bar was met.

On a “bulk-export CSV” story: the AC is “Given 50+ rows selected, when Export is clicked, a CSV downloads within 3 seconds with correct column headers.” The DoD is “unit tests cover the export function, PR reviewed by at least one engineer, feature verified in staging.” A story that exports correctly but has no tests passes AC and fails DoD. Conflating the two signals you have not worked in a functioning agile team.
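
The bulk-export example can be modeled as two independent checklists. This is only a sketch: `StoryReview` and its fields are hypothetical names, and the pass/fail values are hard-coded to mirror the scenario described above.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class StoryReview:
    acceptance_criteria: Dict[str, bool]  # story-specific
    definition_of_done: Dict[str, bool]   # team-wide, same for every story

    def passes_ac(self) -> bool:
        return all(self.acceptance_criteria.values())

    def passes_dod(self) -> bool:
        return all(self.definition_of_done.values())

    def is_done(self) -> bool:
        # A story is done only when both checklists pass.
        return self.passes_ac() and self.passes_dod()


export_story = StoryReview(
    acceptance_criteria={
        "CSV of 50+ rows downloads within 3 seconds": True,
        "column headers are correct": True,
    },
    definition_of_done={
        "unit tests cover the export function": False,  # no tests written
        "PR reviewed by at least one engineer": True,
        "feature verified in staging": True,
    },
)
```

Here `export_story.passes_ac()` is true and `export_story.passes_dod()` is false, so `is_done()` is false: the feature works but the story is not finished.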

Common mistakes

Writing implementation steps instead of user outcomes. “The button calls /api/submit” is not AC. “The user sees a confirmation within 2 seconds” is.

Writing untestable criteria. “The UI should feel intuitive” cannot pass or fail. Every criterion needs a binary test.

Writing AC after sprint planning. By then, estimates have already been made against undefined scope.

AC for AI features

Binary pass/fail breaks down when output is probabilistic. AC for an AI summarization feature must cover four things the standard format misses: (1) an output quality range (“captures the main topic and two supporting details in 80%+ of test cases,” not “the summary is good”); (2) fallback when model confidence is below threshold (“Given confidence below 0.7, show a disclaimer and log the event”); (3) a latency SLA (“first token within 800ms at p95”); (4) graceful degradation when the model is unavailable (“display a static error state with a retry button, not a blank pane”).
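
Three of these criteria can be expressed as checks over a set of graded test cases. The sketch below is an assumption-laden illustration, not a real eval harness: `EvalCase`, `evaluate`, and `present_summary` are hypothetical names, the quality grades are assumed to come from a human or rubric, and p95 uses the nearest-rank method.

```python
import logging
import math
from dataclasses import dataclass
from typing import Dict, List

logger = logging.getLogger(__name__)


@dataclass
class EvalCase:
    captures_main_topic: bool  # graded by a human or rubric (assumed)
    supporting_details: int    # supporting details found in the summary
    first_token_ms: float      # measured time to first token


def quality_pass_rate(cases: List[EvalCase]) -> float:
    if not cases:
        raise ValueError("need at least one eval case")
    passed = sum(
        1 for c in cases if c.captures_main_topic and c.supporting_details >= 2
    )
    return passed / len(cases)


def p95(values: List[float]) -> float:
    # Nearest-rank percentile: the smallest value with at least 95% of
    # observations at or below it.
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]


def evaluate(
    cases: List[EvalCase],
    min_pass_rate: float = 0.80,
    max_p95_ms: float = 800.0,
) -> Dict[str, bool]:
    return {
        "quality": quality_pass_rate(cases) >= min_pass_rate,
        "latency": p95([c.first_token_ms for c in cases]) <= max_p95_ms,
    }


def present_summary(text: str, confidence: float, threshold: float = 0.7) -> Dict[str, object]:
    # Fallback criterion: below the threshold, show a disclaimer and log it.
    low_confidence = confidence < threshold
    if low_confidence:
        logger.warning("low-confidence summary shown (confidence=%.2f)", confidence)
    return {"text": text, "show_disclaimer": low_confidence}
```

The individual outputs are still probabilistic, but each criterion becomes a pass/fail statement about the aggregate, which is what makes it checkable.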

For agentic features, AC also covers guardrails: “Given the agent is asked to delete files, When no explicit confirmation has been provided, Then the agent pauses and requests confirmation before acting.” This maps directly to agent safety evaluation criteria.
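
A guardrail criterion like this can be checked with a simple confirmation gate. The following is a minimal sketch with hypothetical names (`run_action`, `AgentResult`, `DESTRUCTIVE_ACTIONS`); a real agent would route this through its own tool-calling layer.

```python
from dataclasses import dataclass

# Assumed set of actions that require explicit confirmation.
DESTRUCTIVE_ACTIONS = {"delete_files"}


@dataclass
class AgentResult:
    status: str  # "executed" or "awaiting_confirmation"
    message: str


def run_action(action: str, confirmed: bool = False) -> AgentResult:
    # Given a destructive action, When no explicit confirmation exists,
    # Then pause and request confirmation before acting.
    if action in DESTRUCTIVE_ACTIONS and not confirmed:
        return AgentResult(
            status="awaiting_confirmation",
            message=f"Please confirm before I run '{action}'.",
        )
    return AgentResult(status="executed", message=f"Ran '{action}'.")
```

An eval for the guardrail then asserts that `run_action("delete_files")` returns `awaiting_confirmation` and only executes once `confirmed=True` is passed.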

In 2026, the AC document for an AI feature is functionally the same artifact as an eval spec. Both define what “good” looks like for model output. PMs who understand this bridge can translate behavioral intent into something the model team can measure in CI. Those who write vague AC for AI features and wonder why model behavior drifts are not operating at the right level.

What interviewers are testing

Strong answer

"AC are story-specific testable conditions, distinct from the team-wide Definition of Done. I default to Given/When/Then for user flows because it forces three paths: happy, error, edge case. For non-sequential specs I use rule-oriented bullets. Every criterion must be testable. I write them before sprint planning. For AI features I add fallback criteria, latency SLAs, and degradation behavior, because binary pass/fail does not work for probabilistic output. I treat them as a draft eval spec, since they define what the model needs to produce to be acceptable."

Weak answer

"Acceptance criteria are basically the requirements for a feature." This conflates AC with user stories, offers no format reasoning, ignores the DoD distinction, and has nothing to say about AI features. Interviewers flag this because the failure modes are specific and experiential. If you have not seen scope creep from late or vague AC, you do not understand why the practice exists.

Related

For the relationship between stories and criteria, see user story and agile. For AI-specific guardrail AC patterns, see agent guardrails cheat sheet.
