Cognition (Devin) PM interview: what product ownership means when your engineer is an AI

The conventional PM job description barely maps onto this role. Cognition has 305 employees, a CPO (Walden Yan, formerly of Cursor and an IOI gold medalist) who likely owns PM hiring decisions, and a founding team that collectively holds 10 IOI gold medals. Devin writes 89% of code commits at Cognition itself. If your mental model of PM work is quarterly roadmaps and capacity allocation, you are preparing for the wrong job.

What a PM at Cognition actually owns

When autonomous execution handles the majority of coding throughput, PM scope shifts to: identifying which engineering workflows have a high enough toil-to-value ratio that customers will pay ACU costs without budget scrutiny; defining what “working correctly” means for a non-deterministic system so that trust can be operationalized; and managing the positioning boundary between two products with opposite philosophies.

Devin is fully autonomous: it receives a task and executes without human steering. Windsurf is human-in-the-loop: an IDE with Codemaps for developers who want assistance without handing off control. A PM must reason across both. Recommending a Devin feature that belongs in Windsurf, or vice versa, signals you haven’t thought through the portfolio’s internal logic.

The interview loop

Cognition does not publish a structured PM process. Expect four conversations.

Hiring manager screen. Walden Yan or a senior PM probes whether you understand what autonomous agents actually fail at. Name a specific failure mode and explain what product surface addresses it without removing autonomy. Vague answers about “AI-powered workflows” end here.

Product case. Expect an “improve Devin” or “design a feature for enterprise engineering teams” prompt. IOI-caliber interviewers immediately recognize an answer that could have been given about any SaaS dev tool.

Technical depth. Understand the autonomous execution loop and its failure surface, how ACU-based pricing differs from seat-based pricing in consumption incentives, and why Cognition routes across multiple model providers rather than committing to one.
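The difference in consumption incentives is easier to argue with a little arithmetic in hand. The sketch below is illustrative only: the prices, the ACUs-per-task figure, and the function names are hypothetical assumptions, not Cognition's actual rates or any real API.

```python
# Illustrative only: every price and quantity here is a made-up assumption,
# not Cognition's actual pricing.

def seat_based_cost(seats: int, price_per_seat: float) -> float:
    """Monthly bill under seat pricing: depends on headcount, not on usage."""
    return seats * price_per_seat


def acu_based_cost(tasks: int, acus_per_task: float, price_per_acu: float) -> float:
    """Monthly bill under consumption pricing: scales with the work delegated."""
    return tasks * acus_per_task * price_per_acu


def marginal_cost_of_one_more_task(acus_per_task: float, price_per_acu: float) -> float:
    """Under ACU pricing each extra task has a visible price; under seats it is zero."""
    return acus_per_task * price_per_acu


if __name__ == "__main__":
    # Hold the team at 10 seats and vary how many tasks are handed to the agent.
    for tasks in (50, 200, 800):
        seat = seat_based_cost(seats=10, price_per_seat=40.0)
        acu = acu_based_cost(tasks=tasks, acus_per_task=3.0, price_per_acu=2.0)
        print(f"{tasks:>4} tasks: seats ${seat:,.2f}  ACUs ${acu:,.2f}")
```

Under these assumed numbers the seat bill stays flat while the ACU bill grows with every delegated task. That is the incentive gap worth articulating: a consumption customer notices each failed or wasted run, while a seat customer does not.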

Culture. No career ladders. You advance by shipping things that matter. Behavioral answers heavy on process management or committee-driven prioritization will not resonate at a 305-person company with direct founder exposure.

The question that surfaces everything

“How would you improve Devin for enterprise engineering customers?”

Strong answer

"The highest-friction moment is when Devin reaches an ambiguous decision point and either halts or proceeds on a wrong assumption that burns an hour of compute. I'd surface the three assumptions Devin is least certain about before execution begins: not as a confidence percentage, but as a list the engineer can validate or correct. This collapses a multi-hour recovery into a 30-second pre-flight check, reducing ACU waste and increasing merge rate, which is what enterprise buyers use to justify renewal." This names a real failure mode, proposes a concrete interaction model (not "more transparency"), and connects directly to ACU cost-awareness.

Weak answer

"I'd add integrations with Asana and Monday.com, and improve onboarding." This treats Devin as a SaaS workflow tool. Interviewers from a competitive programming background recognize generic PM pattern-matching immediately. The other common failure is a roadmap with no viability reasoning: "I'd prioritize code review automation because developers care about quality." No ACU economics, no enterprise buyer framing. Every feature conversation at a company with consumption-based billing must connect to compute cost per successful task.
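One way to make "compute cost per successful task" concrete is to divide the expected spend per attempt by the probability that the attempt succeeds. The sketch below assumes a simplified model in which every attempt consumes the same number of ACUs and a failed attempt produces nothing of value; the function name and all figures are hypothetical.

```python
# A minimal sketch under stated assumptions: uniform ACU use per attempt,
# failed attempts yield nothing, and all numbers are invented for illustration.

def cost_per_successful_task(
    acus_per_attempt: float,
    price_per_acu: float,
    success_rate: float,
) -> float:
    """Expected spend needed to obtain one successful (e.g. merged) task."""
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in the interval (0, 1]")
    spend_per_attempt = acus_per_attempt * price_per_acu
    # On average 1 / success_rate attempts are needed for one success.
    return spend_per_attempt / success_rate


if __name__ == "__main__":
    baseline = cost_per_successful_task(4.0, 2.0, success_rate=0.5)
    # Hypothetical effect of a feature that prevents some wrong-assumption runs.
    improved = cost_per_successful_task(4.0, 2.0, success_rate=0.8)
    print(f"baseline: ${baseline:.2f} per success")
    print(f"improved: ${improved:.2f} per success")
```

Framed this way, a feature proposal becomes a claim about one of three inputs: it lowers ACUs per attempt, raises the success rate, or both. An answer that cannot say which input it moves has no viability argument.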

What clears the bar

In 2026, feasibility at Cognition is effectively solved. The PM job is viability and lovability. Viable: which engineering workflows have a toil-to-value ratio that clears ACU pricing before reaching a budget committee? Enterprise customers include Nubank, Ramp, OpenSea, and Eight Sleep (which uses Devin as a data analyst, not just a coder). Lovable: the agent must feel like a capable collaborator, not a non-deterministic black box. That means confidence surfaces, observable planning, and override affordances, not UI polish. The bar is human trust in autonomous action.

Know the numbers: $492M ARR (up from $73M a year prior), 89% internal code share, SWE-Bench PR merge rate around 67%. Any metrics or strategy question tests whether you build from these or from the abstract. For the 2026 frame on why feasibility is no longer the hard part, see “feasibility is free” and “lovable, not just usable.” For building the commercial case that an autonomous agent is worth its pricing, see “proving viability.”
