rca · hard
Notifications up, time on site flat: how to answer this PM interview question
Notification engagement is up six weeks running, but time on site is flat. What do you do?
This question is not a pure root-cause exercise. It is a test of metric design literacy: can you recognize that two metrics diverging is not automatically a problem, and that the shape of the answer depends entirely on whether time-on-site is the right thing to be tracking at all? Most candidates treat it as a debugging drill. Strong candidates name the conceptual tension first, then diagnose.
What the interviewer is testing
Notification CTR is a leading indicator. Time-on-site is a lagging measure of session depth. These can diverge for legitimate reasons, and jumping to “we are spamming users” before checking which reason applies is the clearest junior signal in an analytical round. According to Exponent, naming this leading/lagging tradeoff without being prompted is one of the most reliable senior signals in an execution interview.
Three things are being evaluated: metric design literacy (does time-on-site even belong here?), hypothesis generation discipline (can you build a structured tree, not a list?), and business judgment (can you name when flat is fine?).
Clarify before diagnosing
Before forming any hypothesis, ask: “When you say notifications are up, do you mean send volume, open rate, or click-through? And is time-on-site measured per session or as a daily total?” The mechanism changes completely depending on which definition applies. Rising send volume with flat CTR is a reach problem. Rising CTR with flat session time is the interesting case this question is actually about.
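To make the distinction concrete, here is a minimal sketch of how the same "notifications are up" headline splits into different cases once the definitions are pinned down. The `NotificationStats` schema, the 2% tolerance, and the choice to define CTR as clicks per notification sent are all assumptions for illustration; many teams define CTR per open instead.

```python
from dataclasses import dataclass


@dataclass
class NotificationStats:
    """Aggregate counts for one period (hypothetical schema)."""
    sent: int
    opened: int
    clicked: int

    @property
    def open_rate(self) -> float:
        return self.opened / self.sent if self.sent else 0.0

    @property
    def ctr(self) -> float:
        # Assumption: CTR = clicks per notification sent.
        return self.clicked / self.sent if self.sent else 0.0


def relative_change(before: float, after: float) -> float:
    """Relative change from before to after; 0.0 when the baseline is zero."""
    return (after - before) / before if before else 0.0


def describe_trend(before: NotificationStats,
                   after: NotificationStats,
                   tolerance: float = 0.02) -> str:
    """Label which definition of 'notifications are up' actually applies."""
    volume_change = relative_change(before.sent, after.sent)
    ctr_change = relative_change(before.ctr, after.ctr)
    if ctr_change > tolerance:
        return "ctr_up"  # the case this question is about
    if volume_change > tolerance and abs(ctr_change) <= tolerance:
        return "volume_up_ctr_flat"  # a reach question, not an engagement one
    return "other"


baseline = NotificationStats(sent=1000, opened=300, clicked=100)
more_sends = NotificationStats(sent=2000, opened=600, clicked=200)
more_clicks = NotificationStats(sent=1000, opened=350, clicked=150)

trend_more_sends = describe_trend(baseline, more_sends)
trend_more_clicks = describe_trend(baseline, more_clicks)
```

In the two sample periods, doubling sends with proportional clicks lands in the reach bucket, while more clicks on the same send volume lands in the CTR bucket.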
The hypothesis tree
Work from most likely and cheapest to verify toward least likely and most expensive.
1. Quick-task completion (behavioral, most common). Notifications are driving high-intent actions: comment, react, accept a request, confirm a payment. Users complete the job in under 30 seconds and leave. The product is working. Time-on-site is simply not the right signal for this use case. Check task completion rate and D7 retention. If both are healthy, this is a north star problem, not a product problem.
2. Deep-link substitution. Notifications resolve intent on a single targeted screen, bypassing the feed. Users do not start a browsing session because they do not need one. Look for flat session-start events alongside rising pageviews on notification-destination screens.
3. AI-agent and digest effects (2026-specific). AI notification digests (Meta AI, Google AI summaries) can satisfy user intent without a full app session. Agentic completions, where AI auto-replies or marks tasks done, inflate engagement signals while suppressing human sessions. If the engagement spike tracks the rollout of a digest or agent feature, this becomes the leading hypothesis. It requires a bot-versus-human traffic audit to confirm.
4. Tracking artifact. A new SDK version, a new surface (watch, lock screen widget), or a change in how opens are counted can inflate the numerator. Segment by platform and notification type. If the spike is isolated to one surface or build version, this is a measurement problem.
5. Reflexive clicks with no follow-through (spam hypothesis). Only after ruling out the above: users are clicking out of habit, finding no value, and not browsing. This shows up as high CTR with low second-page depth and rising opt-out rates. It is the intuitive answer and the most commonly over-indexed one.
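The segmentation check in hypothesis 4 can be sketched in a few lines. This is an illustrative example only: the row shape `(segment, period, count)`, the function names, and the 80% concentration threshold are assumptions, not a standard method. The idea is simply to ask whether most of the growth in opens comes from a single platform, surface, or build.

```python
from collections import defaultdict


def growth_by_segment(rows):
    """Sum open counts per segment and return after-minus-before growth.

    rows: iterable of (segment, period, count), period is "before" or "after".
    """
    totals = defaultdict(lambda: {"before": 0, "after": 0})
    for segment, period, count in rows:
        totals[segment][period] += count
    return {seg: t["after"] - t["before"] for seg, t in totals.items()}


def concentrated_segment(rows, share_threshold=0.8):
    """Return the segment holding most of the positive growth, or None.

    share_threshold is a hypothetical cutoff; tune it to your own data.
    """
    growth = growth_by_segment(rows)
    total_positive = sum(g for g in growth.values() if g > 0)
    if total_positive <= 0:
        return None
    segment, top = max(growth.items(), key=lambda kv: kv[1])
    return segment if top / total_positive >= share_threshold else None


# Hypothetical open counts: almost all growth comes from one new surface.
skewed = [
    ("ios", "before", 100), ("ios", "after", 105),
    ("android", "before", 100), ("android", "after", 104),
    ("watch", "before", 10), ("watch", "after", 110),
]
# Hypothetical open counts: growth is spread evenly across platforms.
even = [
    ("ios", "before", 100), ("ios", "after", 150),
    ("android", "before", 100), ("android", "after", 150),
]

skewed_result = concentrated_segment(skewed)
even_result = concentrated_segment(even)
```

A concentrated result points toward a measurement change on that surface; an even spread makes the tracking-artifact hypothesis less likely and sends you back to the behavioral ones.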
Structure a strong answer
strong
"First I want to clarify: are notifications up in send volume, open rate, or CTR? And is time-on-site per session or daily? Those definitions change the mechanism. Assuming CTR is up and per-session time is flat, I would frame the core issue: CTR is a leading indicator, session depth is lagging. These can diverge for reasons that range from 'the product is working perfectly' to 'we have a churn problem.' I want to rule them out in order of likelihood. Most likely: notifications are driving quick-task completions, users are satisfied, and time-on-site is the wrong metric to judge success. I would check task completion rate and D7 retention before treating this as a problem at all. If those are healthy, I would recommend updating the north star rather than changing notification behavior. If retention is declining, I move to the spam hypothesis and look at opt-out rates and second-page depth. One 2026-specific hypothesis worth checking: if a notification digest or AI agent feature shipped recently, it may be satisfying intent without triggering a human session. That would inflate engagement signals while suppressing time-on-site, and would actually be a sign the product is getting more efficient, not less valuable."
weak
"Notifications might be too frequent and users are getting fatigued. We should reduce send volume and A/B test." This fails on three counts: it assumes the pattern is bad before checking whether time-on-site is the right metric, it skips straight to a solution without building a hypothesis tree, and it misses the leading/lagging insight entirely. Interviewers flag this as a junior signal because it treats a metric divergence as a single-cause problem with an obvious fix.
Prioritize by evidence cost
| Hypothesis | Likelihood | Evidence required |
|---|---|---|
| Quick-task completion | High | Funnel query: entry point to exit |
| Deep-link substitution | High | Session depth by entry point |
| AI/agent completions | Medium | Bot-traffic audit |
| Tracking artifact | Medium | Platform-segmented event counts |
| Spam and fatigue | Low | Retention cohort + opt-out trend |
Naming this prioritization out loud, rather than listing hypotheses in no particular order, is what separates a structured answer from a brain dump.
The senior move: name when flat is fine
The highest-leverage point in this answer is one most candidates skip. Flat time-on-site is not alarming if task completion is high and retention is stable. In a world where AI agents act on notifications on a user’s behalf and digest summaries resolve intent without a session, your north star may simply need updating. The real question is whether time-on-site still maps to the value users pay for. If it does not, the finding here is a metric design flaw, not a product failure. Name that possibility explicitly, then propose the guardrail metrics (task completion rate, return rate, D7 retention) that distinguish a healthy efficiency gain from a slow churn signal.
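One way to picture the guardrail logic is as a small decision rule. The sketch below is a hypothetical illustration: the `Guardrails` fields mirror the three metrics named above, but the thresholds (60% completion, a 2% tolerated drop) are invented placeholders that a real team would set from its own baselines.

```python
from dataclasses import dataclass


@dataclass
class Guardrails:
    """Guardrail readings for the period under review (hypothetical schema)."""
    task_completion_rate: float   # share of notification-driven tasks completed
    return_rate_change: float     # relative change vs. baseline, e.g. -0.05
    d7_retention_change: float    # relative change vs. baseline


def read_divergence(g: Guardrails,
                    min_completion: float = 0.6,
                    max_drop: float = 0.02) -> str:
    """Classify flat time-on-site using the guardrail metrics.

    Thresholds are illustrative assumptions, not recommended values.
    """
    stable = (g.return_rate_change >= -max_drop
              and g.d7_retention_change >= -max_drop)
    if not stable:
        return "possible_churn_signal"
    if g.task_completion_rate >= min_completion:
        return "healthy_efficiency_gain"
    return "inconclusive"


healthy = read_divergence(Guardrails(0.8, 0.0, 0.01))
churning = read_divergence(Guardrails(0.8, -0.10, 0.0))
unclear = read_divergence(Guardrails(0.3, 0.0, 0.0))
```

The ordering is deliberate: a drop in return rate or retention overrides a high completion rate, because that is the pattern that separates a slow churn signal from an efficiency gain.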