analytical · standard
Which metric would you never optimize?
Which metric would you never optimize?
This question is not a trick and it is not asking for your vocabulary. It is asking whether you understand what happens when a measurement becomes a goal. Charles Goodhart named it in 1975: when a measure becomes a target, it ceases to be a good measure. The interviewer wants to know if you have spotted this trap in real product work and can reason about it under pressure.
Structure a strong answer
strong
"Time-on-site. And I'd extend that to session length for any product where the core value is completing a task or getting an answer. The structural problem: once you optimize for time-on-site, every decision that makes the product faster or more efficient looks like a regression. You end up rewarding the team for adding friction. The canonical example is YouTube watch time before 2019. Maximizing it worked in the short term, but the recommendation engine learned that extreme content retains longer, which created documented radicalization pathways and eventually drew congressional scrutiny. The metric was proxying for engagement, but the behavior it selected for was manipulation. For an AI assistant in 2026 the trap is sharper: if I optimize for session length on a model whose job is to help someone finish a task, I'm punishing the product for being good. The right pairing is task completion rate held alongside return rate. Neither one alone is trustworthy. Task completion rate without return rate tells you nothing about whether the task was actually done well. Return rate without completion rate just measures stickiness, which you can buy with dark patterns. I look for that Goodhart trap whenever I'm setting a north star or a guardrail metric."
weak
"I would never optimize for vanity metrics like page views or monthly active users because they don't reflect real value." This fails on three counts: MAU is a legitimate core metric at growth-stage companies, not a vanity metric. Using the phrase "vanity metric" without explaining the mechanism of harm signals pattern-matching, not reasoning. And it completely misses what the question is testing: not metric vocabulary, but awareness of second-order effects. Interviewers hear this answer constantly and score it below the bar.
What the interviewer is scoring
Three things, in order:
- Second-order reasoning. Can you trace what a team actually does when a metric becomes a target? Optimizing time-on-site produces notification spam, manufactured friction in exit flows, and infinite scroll. Optimizing NPS produces survey timing games and detractor recovery theater instead of product improvement. You need to name the mechanism, not just the metric.
- A genuine opinion. The question is a values probe. Candidates who hedge with “it depends on the context” are not being rigorous; they are being evasive. Pick a specific metric and defend it. The interviewer will push; hold the position if it is defensible.
- Current relevance. In 2026, agentic products make Goodhart’s Law faster and larger in scale. When a model or a fleet of agents optimizes for a proxy metric, the feedback loop is tighter and harder to audit than a human team gaming a dashboard. Task completion rate, for example, can be gamed by having the agent declare a task done before outcome verification. Reducing hallucination rate in isolation pushes models toward over-refusal, which trades one failure mode for another. A PM who cannot name which AI metrics become traps is a liability in an agentic product environment.
The 2026 reframe
Feasibility is largely free. The question of whether a product is viable long-term (whether users keep returning because the product genuinely served them) depends entirely on whether your metrics are measuring value or measuring a proxy that has drifted from value. A product that maximizes session length at the cost of user trust is not viable and not lovable. The “never optimize” question is really asking: do you understand what your metric is a proxy for, and where the proxy breaks down?
Pick a metric. Name the mechanism. Know the real-world case where it went wrong. That is what clears the bar.