How to pass the vibe-coding round when you can't code
As of 2026, over 30% of PM loops at top-50 tech companies include a live prototyping round, and demand for “AI fluency” in PM job postings grew seven-fold from 2024 to 2026. Andrej Karpathy coined “vibe coding” in 2025; by 2026 it had become a formal interview format at Google, Stripe, Netflix, Adobe, and Meta. Candidates panic because they can’t code. That misreads what is scored.
In 2026, feasibility is free. Any PM can ship a working prototype in 45 minutes. The round exists precisely because of that. What it scores is not “can you build” but “what you choose to build, what you choose to cut, and whether the thing you shipped is viable and lovable rather than technically impressive and useless.” The prototype is evidence. The judgment is the product.
What it actually tests (the five-dimension rubric)
Interviewers score across five dimensions, not one:
- Problem framing and scope control: did you define the user, the context, and the explicit cut list before touching the tool?
- Prompting and tool fluency: do you direct the AI effectively, or do you wait passively and accept whatever it generates?
- Tradeoff reasoning: can you name why you made each decision (mock data vs. real, this flow vs. that one)?
- User-centered decisions: are your choices tied to a real user need, or to what was easiest to generate?
- Communication under pressure: are you narrating something substantive every 60 seconds, even while the tool is running?
Silence is a scoring event, not a neutral pause.
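One way to make the rubric usable in practice is a small self-scoring sheet. The sketch below is a hypothetical format, not how any company records scores: the five dimension names come from the list above, while the 1-4 scale, the class name, and the "weakest dimension" summary are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional

# Dimension names mirror the five-dimension rubric in the article.
DIMENSIONS = (
    "problem_framing_and_scope",
    "prompting_and_tool_fluency",
    "tradeoff_reasoning",
    "user_centered_decisions",
    "communication_under_pressure",
)


@dataclass
class PracticeScore:
    """Self-assessment for one timed practice session (hypothetical format)."""

    scores: dict = field(default_factory=dict)

    def rate(self, dimension: str, score: int) -> None:
        """Record a 1-4 rating (assumed scale) for one rubric dimension."""
        if dimension not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {dimension}")
        if not 1 <= score <= 4:
            raise ValueError("score must be between 1 and 4")
        self.scores[dimension] = score

    def unscored(self) -> list:
        """Dimensions with no rating yet, in rubric order."""
        return [d for d in DIMENSIONS if d not in self.scores]

    def weakest(self) -> Optional[str]:
        """Lowest-rated dimension so far, or None if nothing is rated."""
        if not self.scores:
            return None
        return min(self.scores, key=self.scores.get)
```

After a practice run, `unscored()` shows which dimensions you produced no evidence for, which is usually more useful than the ratings themselves.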
Pick the right tool for the problem type
“Get fluent in one tool” is correct but incomplete. Which tool depends on what you’re building:
- Lovable: consumer-facing, UI-heavy flows where visual quality matters. Generates polished UI from a prompt with almost no setup.
- Cursor: multi-file or API-heavy work, developer tools, anything where you need to reason about code structure. Best if you can read code.
- Bolt: fastest path to a full-stack working prototype, especially when you need a database-backed interaction.
- Replit: zero local setup, shareable URL immediately. Strong for remote sessions where the interviewer will click through your prototype.
Know your chosen tool cold before the session. The tool should not be the bottleneck.
The 45-minute framework
Minutes 0-10: scope ruthlessly. Name the user, the one job they’re doing, and exactly what you will not build. A concrete scope statement sounds like: “I’ll focus on first-time setup for solo owners with fewer than 100 SKUs. Entry, stock view, and low-stock alerts only. No multi-location, no supplier integrations, no reporting.” Write the cut list before you open the tool.
Minutes 10-35: build and narrate. The talking-to-building ratio should be roughly 50/50. Every time you write a prompt, say what you’re asking for and why. Every time output arrives, evaluate it aloud. “This modal is off, but the data structure is right, I’ll fix the UI and move on” is a scoring moment.
The pivot rule: if the same prompt keeps failing for three minutes straight, stop. Verbally acknowledge the constraint (“the AI keeps generating a date-picker here and I don’t need one”), cut the feature, and move on. Candidates who fight a broken prompt for five or more minutes burn at least a fifth of the 25-minute build window and signal poor scope judgment.
Minutes 35-45: debrief. This is its own scored section. You must cover: a specific success metric (not “I’d measure engagement”), the launch bar (what has to be true before this ships), a prioritized list of what you’d build next, and the shipping risks you’d mitigate. Skipping any of these leaves a scoring dimension empty.
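The time budget and the pivot rule are simple enough to write down as arithmetic. The sketch below is only a practice aid: the phase boundaries and the three-minute threshold come from the framework above, and the function names are hypothetical.

```python
# Phase boundaries in elapsed minutes, taken from the 45-minute framework.
PHASES = (
    ("scope", 0, 10),
    ("build_and_narrate", 10, 35),
    ("debrief", 35, 45),
)

# The pivot rule: cut the feature once a prompt has failed for 3 minutes.
PIVOT_AFTER_MINUTES = 3


def phase_at(minute: float) -> str:
    """Return the phase name for a given elapsed minute."""
    for name, start, end in PHASES:
        if start <= minute < end:
            return name
    raise ValueError("minute is outside the 45-minute session")


def should_pivot(first_failure_minute: float, now_minute: float) -> bool:
    """True once the same prompt has been failing for three minutes or more."""
    return now_minute - first_failure_minute >= PIVOT_AFTER_MINUTES


def build_window_share(minutes_fighting: float) -> float:
    """Fraction of the 25-minute build window spent on one stuck prompt."""
    _, start, end = PHASES[1]
    return minutes_fighting / (end - start)
```

For example, a prompt that first fails at minute 14 should be abandoned by minute 17, and fighting it for five minutes costs 5 / 25, or 20%, of the build window.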
Company-specific scoring differences
These are materially different loops, not the same round with different logos.
Google weights scope control heavily, plus accessibility and international considerations. Calling out a keyboard-navigation gap or a right-to-left layout consideration during the build signals the instinct they’re looking for.
Stripe interviewers are technical and will interrogate your backend data model. They’ll ask about your schema decisions, what’s indexed, and what breaks at scale. Narrate your data modeling choices even when you’re mocking the backend.
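To make "narrate your data model" concrete, here is a minimal sketch of the kind of schema you might talk through for the inventory example used earlier, assuming an in-memory SQLite database. The table names, columns, and the event-log design are illustrative assumptions, not anything Stripe prescribes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE products (
        sku TEXT PRIMARY KEY,
        name TEXT NOT NULL,
        reorder_point INTEGER NOT NULL
    );
    -- Stock is stored as an append-only log of changes (assumed design).
    -- Note: SQLite only enforces REFERENCES when PRAGMA foreign_keys = ON.
    CREATE TABLE stock_events (
        id INTEGER PRIMARY KEY,
        sku TEXT NOT NULL REFERENCES products(sku),
        delta INTEGER NOT NULL,
        created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    );
    -- Indexed because every stock lookup filters by sku.
    CREATE INDEX idx_stock_events_sku ON stock_events (sku);
    """
)

# Made-up sample rows for illustration.
conn.execute("INSERT INTO products VALUES ('MUG-001', 'Ceramic mug', 10)")
conn.executemany(
    "INSERT INTO stock_events (sku, delta) VALUES (?, ?)",
    [("MUG-001", 12), ("MUG-001", -8)],
)


def on_hand(db: sqlite3.Connection, sku: str) -> int:
    """Current stock for a SKU, computed by summing its event log."""
    row = db.execute(
        "SELECT COALESCE(SUM(delta), 0) FROM stock_events WHERE sku = ?",
        (sku,),
    ).fetchone()
    return row[0]
```

The tradeoff worth saying aloud: summing the log on every read is simple and auditable, but the cost grows with the number of events per SKU, so at scale you would likely keep a running balance or periodic snapshot.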
Netflix is design-focused. Visual quality and interaction pattern reasoning are emphasized. Be ready to defend why you chose a specific navigation pattern and what interaction research supports it.
Meta uses an internal Llama-based tool that is less polished than v0 or Lovable. Interviewers probe token usage, latency, and retrieval strategy during the prototype debrief. You should acknowledge the cost and latency tradeoffs of any AI-powered feature you build, not just the UX.
Startups run 30-minute sessions and jump straight to GTM in the debrief: “Would you ship this today? What’s the first customer segment? What’s the launch bar?” Have a go-to-market answer ready before the build starts.
What a passing narration sounds like
The difference between passing and failing narration is specificity, not volume.
Passing: “I’m mocking the inventory sync here with static JSON rather than building a real API call. In production this would be a webhook from the supplier feed, but for this session I need to prove the alert logic works, not the data pipe.”
Failing: silence while the tool runs, then “OK here’s what I built.”
The first tells the interviewer you understand tradeoffs and are making deliberate calls. The second tells them nothing except that you can wait for output.
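The mock in the passing narration can be very small. The sketch below shows the shape of it under stated assumptions: static JSON standing in for the supplier feed, and a low-stock rule of "on hand at or below the reorder point". The SKUs, field names, and thresholds are invented for illustration.

```python
import json

# Static mock standing in for the supplier feed (all values are made up).
MOCK_INVENTORY_JSON = """
[
  {"sku": "MUG-001", "name": "Ceramic mug", "on_hand": 4, "reorder_point": 10},
  {"sku": "TEE-002", "name": "Logo tee", "on_hand": 25, "reorder_point": 10},
  {"sku": "CAP-003", "name": "Dad cap", "on_hand": 10, "reorder_point": 10}
]
"""


def low_stock_alerts(items: list) -> list:
    """Return SKUs whose on-hand count is at or below the reorder point."""
    return [
        item["sku"]
        for item in items
        if item["on_hand"] <= item["reorder_point"]
    ]


inventory = json.loads(MOCK_INVENTORY_JSON)
alerts = low_stock_alerts(inventory)
print(alerts)  # ['MUG-001', 'CAP-003']
```

Because the alert logic only takes a list of items, swapping the static JSON for a real feed later would not change the function being demonstrated, which is the point of the narration.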
The trap
The trap is building too many features and delivering nothing that works end-to-end. A senior candidate delivers one lovable core flow, narrates every cut in real time, pivots as soon as a prompt has stalled for three minutes, and closes the debrief with a specific success metric, a clear launch bar, a ranked next-build list, and the shipping risks they’d address first. The code is almost irrelevant. The judgment is the entire round.
strong
"The winner delivers one lovable core flow that works, narrates every tradeoff in real time (something substantive every 60 seconds), hits the pivot signal the moment a prompt fails for three minutes straight, and closes the debrief with a defined success metric, a clear launch bar, and a ranked list of what to build next. They scope ruthlessly in the first 10 minutes, naming exactly what they will not build and why, and tie every UI decision to a user need, not technical convenience. At Google they call out accessibility. At Stripe they speak to the data model. At Netflix they defend the interaction pattern. At Meta they acknowledge token cost and latency tradeoffs in the Llama-based tool. The code is almost irrelevant. The judgment is the entire round."
weak
"The common failure: the candidate goes quiet while waiting for AI output (silence kills scores regardless of prototype quality), builds too many features and delivers nothing that works end-to-end, treats the prototype as the deliverable rather than evidence of judgment, and skips the debrief or gives a vague 'I'd measure engagement' answer. Some candidates fight a broken prompt for five or more minutes instead of pivoting, burning a fifth or more of the 25-minute build window. Others ship a technically impressive prototype that solves the wrong problem for the wrong user. The prototype passes; the PM judgment does not."
Minimum viable prep
Plan on ten hours of hands-on practice, including timed sessions where you record yourself narrating. You cannot skip the narration practice. A prototype you built in silence tells you nothing about whether you can narrate under pressure. Run at least three full 45-minute timed sessions before your actual interview, use the five-dimension rubric above as your scoring sheet, and debrief each attempt against every dimension.