ai pm · thesis

Proving viability when anyone can build it

Updated Jun 2026 · Calibrated to the strong-hire bar

“Should we build this?” is now almost entirely a viability question. Feasibility stopped being the gate: Cursor, Bolt, or a frontier API can validate any build hypothesis in an afternoon, at near-zero marginal cost. That shift makes viability harder to fake. Interviewers at frontier labs have already watched dozens of “our AI is better” answers collapse on follow-up, and they probe viability more aggressively than any other dimension. If you walk into a go/no-go question and lead with TAM, you are already behind.

Why the standard answer fails

The junior answer has three moves: market size (“it’s a $50B market and we only need 1%”), a few user interviews as willingness-to-pay evidence, and a claim that the AI quality is the moat. Each move fails on its own.

TAM alone doesn’t show that the paying segment is accessible. Users overstating willingness to pay in interviews is a well-documented bias; interviewers know this and will press for revealed preference. And “our model is better” is a 12-to-18-month argument at most. Token costs fell from roughly $30 per million tokens in 2022 to around $0.40 per million tokens in 2026, a 75x collapse. Model quality converges to near-parity quickly. Approximately 79% of OpenAI enterprise customers also pay Anthropic, which makes “model quality is the moat” a provably weak answer in any frontier-lab loop.

The junior answer also skips cost-to-serve entirely. That omission is the clearest junior signal: inference costs still account for 20 to 23% of total AI product costs, and a team that achieves product-market fit before unit economics work has locked itself into a cost structure it cannot exit without churning its highest-usage customers.

Two distinct failure modes

Interviewers distinguish between two types of viability failure, and each gets a different probe.

Wrong market. The pain exists but it isn’t paid pain. Users suffer the problem but resolve it with muscle memory, workarounds, or tolerance rather than spending. The probe here is: “What do people currently spend to solve this?” If the answer is nothing, the path to revenue is an uphill behavior-change argument, not a product argument.

Right market, wrong moat. The pain is real and paid, but nothing prevents a competitor with the same frontier API access from shipping the same product in six weeks. The probe here is: “What specifically do you have that a well-funded team starting today doesn’t?” Generic answers (better UX, better prompting, faster iteration) don’t survive this question. Distribution, proprietary data, or workflow lock-in created by the product’s output are the only durable answers in 2026.

Knowing which failure mode your product risks shapes how you structure the answer.

The three tests, with evidential standards

Run these in sequence. Each requires specific evidence, not assertion.

1. Acute, paid pain. Name a proxy that reveals willingness to pay without asking directly: existing spend on workarounds, a budgeted line item in adjacent tools, or contract renewal rates on the closest substitutes. “Users told me they’d pay” is stated preference. An existing expense category or a waiting list with a deposit is revealed preference. Interviewers know the difference.

2. A paying segment, sized honestly. Not TAM. The segment that already spends money on this problem category, with a realistic penetration path. A $50B total addressable market is irrelevant if the paying segment is 2% of it and already committed to a competitor. Size what you can capture, not what exists.

3. Right-to-win that survives the commodity model question. Durable moats in 2026 come from proprietary data flywheels, embedded workflows, or switching costs created by the product’s output, not the underlying model. Harvey has a right-to-win in legal AI not because their model is better but because they trained on curated case law and deal documents that took years to license and ingest. Glean’s right-to-win in enterprise search is distribution: they are already connected to a company’s Slack, Drive, Notion, and GitHub, so the switching cost is organizational, not technical. A generic legal AI or enterprise search tool building on the same frontier model has neither of those. That contrast, a specific structural advantage versus “we move fast and the model is good,” is what separates a strong answer from a weak one.
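The honest-sizing step in test 2 is simple arithmetic, and it helps to see how quickly the number shrinks. A minimal sketch in Python: the $50B market and the 2% paying share come from the example in test 2, while the uncommitted and reachable shares (30% and 10%) and the function name are illustrative assumptions, not data.

```python
def capturable_revenue(tam: float, paying_share: float,
                       uncommitted_share: float, reachable_share: float) -> float:
    """Narrow a TAM figure down to the revenue realistically in reach.

    All shares are fractions between 0 and 1.
    """
    for name, share in (("paying_share", paying_share),
                        ("uncommitted_share", uncommitted_share),
                        ("reachable_share", reachable_share)):
        if not 0.0 <= share <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {share}")

    paying_segment = tam * paying_share               # already spends on this problem
    contestable = paying_segment * uncommitted_share  # not locked in to a competitor
    return contestable * reachable_share              # share a realistic path can win


tam = 50e9  # the $50B headline figure

# Assumed inputs: 2% paying (from the text), 30% uncommitted and 10% reachable (hypothetical).
paying_segment = tam * 0.02
capturable = capturable_revenue(tam, 0.02, 0.30, 0.10)

print(f"Paying segment: ${paying_segment / 1e9:.1f}B")
print(f"Capturable:     ${capturable / 1e6:.0f}M")
```

Under these assumed shares the $50B headline narrows to a paying segment of about $1B and a capturable figure of about $30M. The specific shares matter less than being able to defend each narrowing step with evidence.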

Cost-to-serve: the section most candidates skip

At what per-query cost do the unit economics work? Senior candidates name a number. A worked version: at $0.02 per query and a $30 per seat per month plan, inference alone consumes the entire seat price at 1,500 queries per month, so the average user needs to run fewer than 600 queries per month for the business to stay at 60% gross margin (treating inference as the only cost of goods sold). If your power users run 5,000 queries, each one costs about $100 a month to serve against $30 of revenue. They are unprofitable, and they are also the customers driving your growth metrics. That is the inference cost trap: you cannot reprice them out without churning the people your retention numbers depend on.
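The threshold arithmetic can be checked in a few lines. A minimal sketch in Python, assuming inference is the only cost of goods sold and using the $0.02 per query and $30 per seat figures from the worked example; the function names are hypothetical.

```python
def max_queries_per_seat(seat_price: float, cost_per_query: float,
                         target_margin: float) -> float:
    """Largest monthly query count per seat that still holds target_margin."""
    if seat_price <= 0 or cost_per_query <= 0:
        raise ValueError("seat_price and cost_per_query must be positive")
    if not 0.0 <= target_margin < 1.0:
        raise ValueError("target_margin must be in [0, 1)")
    cogs_budget = seat_price * (1.0 - target_margin)  # dollars left for inference
    return cogs_budget / cost_per_query


def gross_margin(seat_price: float, cost_per_query: float, queries: float) -> float:
    """Gross margin for one seat at a given monthly query volume."""
    return (seat_price - cost_per_query * queries) / seat_price


SEAT_PRICE = 30.00      # dollars per seat per month
COST_PER_QUERY = 0.02   # dollars per query

cap_60 = max_queries_per_seat(SEAT_PRICE, COST_PER_QUERY, 0.60)
break_even = max_queries_per_seat(SEAT_PRICE, COST_PER_QUERY, 0.0)
power_user = gross_margin(SEAT_PRICE, COST_PER_QUERY, 5_000)

print(f"Queries allowed at 60% margin: {cap_60:.0f}")
print(f"Break-even queries:            {break_even:.0f}")
print(f"Margin on a 5,000-query user:  {power_user:.0%}")
```

Under these assumptions the 60% margin cap is about 600 queries per seat, break-even is about 1,500, and a 5,000-query user runs at a deeply negative margin. Swapping in a real blended cost per query and any non-inference cost of goods sold will move all three numbers.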

AI product gross margins are projected at 52% in 2026, up from 41% in 2024, but only for teams that modeled the cost-to-serve accurately from the start. OpenAI has been cited as losing roughly $1.35 per dollar earned on inference at current pricing. Citing that example shows the interviewer you understand why viability isn’t just a demand question.

One more dimension worth naming: viability and lovability now have to work together in a way they didn’t before. The floor on usability is much higher in 2026 because every competitor can also ship something functional quickly. A product that is viable but not genuinely lovable (meeting people where they work, anticipating needs without being obtrusive) will see usage that doesn’t convert to renewal. Willingness to pay a second time is a tighter bar than willingness to pay the first time.

strong

"I'd run three tests before recommending we build. First: is the pain acute enough that people already pay for something worse? For this product, the proxy I'd look for is existing spend on manual workarounds or adjacent SaaS tools, not user interview quotes. Second: is the paying segment large enough, not the total market? I want to know how many companies or individuals already budget for this problem category and whether we have a credible path to 10% of them. Third: do we have a right-to-win that survives 18 months of model commoditization? I'd argue our right-to-win is [specific: proprietary data / embedded workflow / distribution already in the customer's environment]. On cost-to-serve: at our expected query volume and a $X seat price, we need average usage below Y queries per month to hold 60% gross margin. Here's how the pricing tier or throttle handles heavy users without churning them."

weak

"This is a $50B market and we only need 1% of it. I talked to 10 users and they all said they'd pay. Our AI is better than what's out there right now." No revealed preference, no cost-to-serve, and a moat that erodes within a year. An interviewer at a frontier lab will surface the customer overlap data and the answer collapses.

In the loop

This framing applies directly to strategy questions and go/no-go scenarios at OpenAI and Anthropic, where “what’s the moat?” is a standard probe. The right-to-win answer has to name something specific: data, workflow lock-in, or distribution already inside the customer’s working environment. Generic claims about model quality, speed, or UI will not clear the bar.

For the underlying economics behind the cost-to-serve argument, see LLM unit economics. For the feasibility-is-free context that explains why viability is now the primary gate, see feasibility is free. For how lovability intersects with viability at the renewal layer, see lovable, not just usable.