Feature flag (definition)

Feature flags decouple deployment from release. Code ships to production with the flag off: no user sees the feature, no traffic hits the new path. When the team is ready, the flag opens, either to everyone at once or to a controlled slice of users. That gap between “code is live” and “users see it” is the mechanism behind canary releases, dark launches, kill switches, and AI parameter experiments. A PM who describes feature flags only as a “rollout tool” is behind; the complete answer includes four distinct flag types, the flag debt problem the PM is accountable for, and the 2026 extension of flagging AI model configuration rather than just code paths.

The four flag types and their lifespans

Not all flags are the same. Treating them identically is how flag debt accumulates.

Release toggles ship unfinished work safely. The rollout they guard lasts days to weeks. The flag itself should be removed soon after the feature reaches 100% rollout, with 90 to 120 days as the typical outer limit. A release toggle that outlives its rollout without being archived is debt.

Experiment flags run A/B tests. They live for hours to weeks, just long enough to reach statistical significance. Once the winner is determined, the flag collapses: the losing path is deleted, the winning path becomes the default. Letting experiment flags persist after the experiment ends is the most common source of stale toggles.

Ops flags (kill switches) are circuit breakers. They exist to disable a feature instantly without a redeploy. For streaming clients using SSE or WebSocket, a kill switch propagates in milliseconds. A traditional redeploy and restart takes minutes. That difference is the mechanical reason kill switches are standard practice in high-traffic systems. Some ops flags are legitimately permanent: a global kill switch for a billing integration, for example, may never be removed.
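A minimal sketch of why a pushed kill switch needs no redeploy. The FlagService and Client classes and the billing-integration flag name are hypothetical; the in-process callback stands in for an SSE or WebSocket stream and does not model network latency.

```python
from typing import Callable


class FlagService:
    """Stand-in for a flag backend that pushes changes to connected clients."""

    def __init__(self) -> None:
        self._subscribers: list[Callable[[str, bool], None]] = []

    def subscribe(self, callback: Callable[[str, bool], None]) -> None:
        self._subscribers.append(callback)

    def set_flag(self, name: str, enabled: bool) -> None:
        # Push the change to every connected client immediately.
        for notify in self._subscribers:
            notify(name, enabled)


class Client:
    """A running service instance holding a local copy of flag state."""

    def __init__(self, service: FlagService) -> None:
        self.flags: dict[str, bool] = {}
        service.subscribe(self._on_update)

    def _on_update(self, name: str, enabled: bool) -> None:
        self.flags[name] = enabled

    def charge(self, amount_cents: int) -> str:
        # Ops flag semantics: the integration runs unless explicitly killed.
        if not self.flags.get("billing-integration", True):
            return "billing paused"
        return f"charged {amount_cents}"


service = FlagService()
client = Client(service)
first = client.charge(500)                       # integration is live
service.set_flag("billing-integration", False)   # kill switch flipped
second = client.charge(500)                      # same process, no restart
```

The running process changes behavior between the two calls without being restarted, which is the property the kill switch depends on.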

Permission flags control feature access by user tier, geography, or account status. They live for years and function more like configuration than release tooling. Think of a paid-tier feature gate: it is not going anywhere.
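The four types can be sketched as data with a per-type review budget. The 120-day figure for release toggles comes from the range above; the 30-day cap for experiments is an illustrative assumption, and all names here are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum
from typing import Optional


class FlagType(Enum):
    RELEASE = "release"
    EXPERIMENT = "experiment"
    OPS = "ops"
    PERMISSION = "permission"


# Review budgets in days; None means the flag may legitimately be permanent.
MAX_AGE_DAYS: dict[FlagType, Optional[int]] = {
    FlagType.RELEASE: 120,     # upper end of the 90 to 120 day window
    FlagType.EXPERIMENT: 30,   # assumption: "hours to weeks" capped at 30 days
    FlagType.OPS: None,
    FlagType.PERMISSION: None,
}


@dataclass
class Flag:
    name: str
    type: FlagType
    created: date


def is_overdue(flag: Flag, today: date) -> bool:
    """True when a flag has outlived the budget for its type."""
    limit = MAX_AGE_DAYS[flag.type]
    if limit is None:
        return False
    return today - flag.created > timedelta(days=limit)
```

Treating the type as part of the flag record is what makes different cleanup rules enforceable; a single rule for all flags would either nag about permanent kill switches or miss stale experiments.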

Deploy does not equal release

This distinction matters in interviews and in practice. When a team deploys without flags, every deployment is also a release: users immediately see whatever shipped. Feature flags break that coupling. Engineering can merge and deploy throughout a sprint; the feature becomes visible to users only when the PM decides the flag opens.
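A minimal sketch of the decoupling, assuming a hypothetical in-memory FlagStore and a made-up flag name, new-checkout. Real flag services fetch state over the network, but the shape of the check is the same.

```python
from dataclasses import dataclass, field


@dataclass
class FlagStore:
    """Hypothetical in-memory flag store."""
    flags: dict[str, bool] = field(default_factory=dict)

    def is_enabled(self, name: str) -> bool:
        # Unknown flags default to off, so newly deployed code stays dark.
        return self.flags.get(name, False)


def checkout_page(store: FlagStore) -> str:
    # Both code paths are deployed; the flag decides which one users see.
    if store.is_enabled("new-checkout"):
        return "new checkout"
    return "legacy checkout"


store = FlagStore()
before_release = checkout_page(store)   # deployed, not released
store.flags["new-checkout"] = True      # release is a flag change, not a deploy
after_release = checkout_page(store)
```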

The operational benefit is rollback speed. If a deployment is broken and the flag is the on/off switch, reverting is a flag toggle, not a redeploy. The canary pattern builds on this: open the flag to 5-10% of traffic, watch error rate and business metrics, advance or revert at the routing layer. See canary release for what the PM owns at each traffic gate.
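One common way to implement the percentage gate is deterministic hashing, sketched below. This is an assumption about the mechanism, not something the text specifies, and the function names are illustrative.

```python
import hashlib


def bucket(flag_name: str, user_id: str) -> int:
    """Map a user to a stable bucket in [0, 100) for this flag."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100


def in_rollout(flag_name: str, user_id: str, percent: int) -> bool:
    """True when the user's bucket falls inside the current rollout percentage."""
    return bucket(flag_name, user_id) < percent
```

Because the bucket depends only on the flag name and user ID, a user admitted at 5% stays admitted at 10%, and reverting to 0% removes everyone without touching a deployment. Including the flag name in the hash keeps the same users from landing in every canary.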

Dark launch is a specific variant: deploy with the user-facing flag off and exercise the new code path in the background to verify behavior under production load, without any user-visible output. It is not the same as percentage rollout; it is the phase before rollout begins.
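A dark launch can be sketched as shadow execution: the legacy path always answers the user, and the candidate path runs alongside it only for comparison. The function names and the separate shadow_enabled toggle are assumptions for illustration. This sketch calls the candidate synchronously; a production version would typically run it off the request path.

```python
from typing import Callable


def serve_with_shadow(
    query: str,
    legacy: Callable[[str], list[str]],
    candidate: Callable[[str], list[str]],
    shadow_enabled: bool,
    mismatches: list[tuple[str, object, object]],
) -> list[str]:
    """Return the legacy result; optionally exercise the candidate in the shadow."""
    result = legacy(query)
    if shadow_enabled:
        try:
            shadow = candidate(query)
            if shadow != result:
                mismatches.append((query, result, shadow))
        except Exception as exc:
            # A shadow failure is recorded but must never reach the user.
            mismatches.append((query, result, repr(exc)))
    return result


def legacy_search(q: str) -> list[str]:
    return sorted(q.split())


def new_search(q: str) -> list[str]:
    # Hypothetical new behavior: deduplicates terms.
    return sorted(set(q.split()))


log: list[tuple[str, object, object]] = []
same = serve_with_shadow("b a", legacy_search, new_search, True, log)
differs = serve_with_shadow("b a b", legacy_search, new_search, True, log)
```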

Ring deployments formalize the sequence: internal employees, then opt-in beta, then broad rollout. Microsoft named this model explicitly and uses it across Office and Azure products. The rings are not about geography; they are about risk tolerance and signal quality at each layer.
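The ring sequence can be sketched as an ordered list, with each user assigned to the earliest ring they qualify for. The User fields and ring names below are illustrative assumptions, not Microsoft's implementation.

```python
from dataclasses import dataclass

# Ring order follows the text: internal employees, opt-in beta, broad rollout.
RINGS = ["internal", "beta", "broad"]


@dataclass
class User:
    user_id: str
    is_employee: bool = False
    beta_opt_in: bool = False


def ring_of(user: User) -> str:
    """Assign the earliest ring the user qualifies for."""
    if user.is_employee:
        return "internal"
    if user.beta_opt_in:
        return "beta"
    return "broad"


def is_enabled(user: User, open_through: str) -> bool:
    """The feature is on for every ring up to and including open_through."""
    return RINGS.index(ring_of(user)) <= RINGS.index(open_through)
```

Advancing the release means changing open_through from "internal" to "beta" to "broad"; membership is decided by user attributes, not by geography or a random percentage.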

The 2026 extension: flagging AI configuration

In 2026, the most important use of feature flags at AI-native companies is not gating code paths. It is gating model configuration: prompt templates, model versions, inference settings, system instructions. LaunchDarkly’s AgentControl platform formalizes this; Statsig’s 2026 guidance goes further, recommending that each AI model parameter get its own individual flag rather than one combined flag.

The reason is cascade risk. Changing multiple AI parameters simultaneously makes it impossible to isolate which change caused a quality regression. An individual flag per parameter lets the team identify the culprit, not just roll back everything. A PM flagging an AI system should set rollback criteria against AI-specific signals: hallucination rate from an automated eval, user thumbs-down rate, token cost per query, and output latency. Infrastructure error rate catches almost nothing in a generative AI system because the model returned output successfully; the output was just wrong.
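A minimal sketch of per-parameter flagging and AI-specific rollback criteria. Parameter names, baseline values, and thresholds are all hypothetical; this is not the API of LaunchDarkly, Statsig, or any other vendor.

```python
from dataclasses import dataclass, fields

# Known-good configuration; each key is gated by its own independent flag.
BASELINE = {"prompt_template": "v1", "model_version": "model-a", "temperature": 0.2}


def resolve_config(overrides: dict) -> dict:
    """Apply per-parameter flag overrides on top of the baseline."""
    unknown = set(overrides) - set(BASELINE)
    if unknown:
        raise KeyError(f"unknown AI parameters: {sorted(unknown)}")
    return {**BASELINE, **overrides}


def roll_back(overrides: dict, parameter: str) -> dict:
    """Revert one parameter to baseline and leave the others in place."""
    return {k: v for k, v in overrides.items() if k != parameter}


@dataclass(frozen=True)
class QualitySignals:
    hallucination_rate: float   # from an automated eval
    thumbs_down_rate: float
    cost_per_query_usd: float
    latency_ms: float


# Hypothetical rollback thresholds; a real team would set these per product.
LIMITS = QualitySignals(0.02, 0.05, 0.01, 2000.0)


def breached(observed: QualitySignals, limits: QualitySignals = LIMITS) -> list[str]:
    """Names of the signals that exceed their rollback threshold."""
    return [
        f.name
        for f in fields(observed)
        if getattr(observed, f.name) > getattr(limits, f.name)
    ]
```

With one combined flag, the only available response to a breach is reverting every parameter at once. With one flag per parameter, roll_back can revert the prompt template alone and the team can check whether the breached signal recovers.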

This is where viable and lovable intersect operationally. Flags let you run a viability test (“do users actually use this feature?”) before full commitment. They also let you tune lovability iteratively (“which prompt version produces outputs users prefer?”) without a code deploy cycle. The tooling that enables both is the same; the PM intent is different.

Flag debt: a PM accountability problem

Flag debt is not an engineering cleanliness issue the PM can ignore. Roughly 80% of flag removals touch multiple files. Stale toggles introduce silent risk: code that runs only when an old flag is in a certain state becomes untested by the time someone finally reads it. Teams that let active flags grow beyond around 50 per team lose the deployment velocity the flags were supposed to create.

DORA 2024 data shows that only around 19% of teams reached elite delivery performance, and the high-performance tier just below it shrank from 31% to 22% between 2023 and 2024. Flag hygiene is one of the practices that separates high-frequency shippers from teams that accumulate risk. As a PM, you should know which flags you own, when each is expected to be retired, and whether a flag that has been sitting untouched for six months is still serving a purpose.
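The three questions above (which flags you own, when each retires, which have sat untouched) can be sketched as a small audit over flag records. The record fields are hypothetical; the six-month staleness window and the budget of 50 active flags per team are taken from the text.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

STALE_AFTER = timedelta(days=180)   # roughly six months untouched
MAX_ACTIVE_PER_TEAM = 50            # per-team budget from the text


@dataclass
class FlagRecord:
    name: str
    team: str
    owner: str
    last_changed: date
    retire_by: Optional[date] = None   # None for permanent flags


def owned_by(flags: list[FlagRecord], owner: str) -> list[str]:
    return [f.name for f in flags if f.owner == owner]


def stale(flags: list[FlagRecord], today: date) -> list[str]:
    return [f.name for f in flags if today - f.last_changed > STALE_AFTER]


def past_retirement(flags: list[FlagRecord], today: date) -> list[str]:
    return [f.name for f in flags if f.retire_by is not None and f.retire_by < today]


def teams_over_budget(flags: list[FlagRecord]) -> list[str]:
    counts = Counter(f.team for f in flags)
    return sorted(team for team, n in counts.items() if n > MAX_ACTIVE_PER_TEAM)
```

A stale flag is not automatically debt; a permanent kill switch may go untouched for years. The audit produces a review list, and the owner decides what each entry means.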

Tooling landscape

Common implementations: LaunchDarkly (most feature-complete, enterprise standard), Statsig (experiment-first, strong for AI product teams), Unleash (open-source, self-hostable), PostHog (product analytics plus flags in one tool). OpenFeature is the CNCF-incubating vendor-agnostic evaluation standard (incubated November 2023, spec v0.8.0 March 2024). Interviewers at platform or infrastructure companies may reference it to see whether you know the standard exists independent of any vendor.

Strong vs. weak answers

Strong

"Feature flags decouple deployment from release: code ships to production but users see nothing until the flag opens. There are four types with different lifespans: release toggles (days to weeks, remove after full rollout), experiment flags (collapse to the winner after statistical significance), ops/kill switches (milliseconds to propagate, some permanently live), and permission flags (effectively config, live for years). A kill switch propagating in milliseconds vs. a redeploy taking minutes is the mechanical reason kill switches are standard at scale. In 2026 I also use flags to gate AI model configuration: individual flags per prompt template or model version, so I can isolate which parameter caused a quality regression rather than rolling back everything. Flag debt is mine to manage, not just engineering's: stale toggles compound risk, and I keep active flag count under 50 per team."

Weak

"A feature flag lets you turn a feature on or off for certain users. It's useful for gradual rollouts." This describes the mechanic without demonstrating PM judgment. It skips flag types, flag debt, the deploy-vs-release distinction, and any 2026 angle. The interviewer follows immediately with: what flag types do you use, how do you decide when to flag vs. when not to, who is responsible for removing old flags. A candidate who stops at the definition has nothing left to say.

For how flags compose with traffic routing in a staged release, see canary release. For the randomized experiment that runs after a successful canary, see A/B test. For using internal employees as the first cohort before any flag opens externally, see dogfooding.