How would you measure success for Uber Eats?

The three-sided marketplace metric question. Name a north star, defend it against alternatives, and map the decision vs. diagnostic split interviewers test for.

Most candidates answer this question with a single-sided metric (completed orders, GMV) and then move on. That’s the tell. The interviewer is checking whether you know that Uber Eats has three principals with different utility functions, and that a metric which looks healthy on one side can be actively destroying value on another. Name a north star, defend it against the obvious alternatives, and map the diagnostic tree. That’s the answer.

The north star: Gross Bookings per Active Courier Hour

The right north star for Uber Eats is Gross Bookings per Active Courier Hour. It’s a ratio, not a count, and that matters.

When this number rises, consumers are spending more per delivery slot, restaurants are earning more per order routed to them, and couriers are being used efficiently. When it falls, you know supply is idle or demand is thin, and you can branch from there. A raw count like “completed orders” cannot make that distinction. In a count, a $10 order and a $60 order look identical, and an order placed while couriers are sitting idle looks identical to one placed at perfect utilization.
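The count-versus-ratio point can be made concrete with a little arithmetic. The sketch below is illustrative only: the function names, the two periods, and every number in them are invented for the example, and "active courier hours" is assumed to mean total hours couriers were online.

```python
from typing import List


def completed_orders(order_values: List[float]) -> int:
    """Raw activity count: every order weighs the same regardless of value."""
    return len(order_values)


def gross_bookings_per_courier_hour(order_values: List[float],
                                    active_courier_hours: float) -> float:
    """Total order value divided by courier online hours (assumed definition)."""
    if active_courier_hours <= 0:
        raise ValueError("active_courier_hours must be positive")
    return sum(order_values) / active_courier_hours


# Two hypothetical periods with the same number of completed orders.
thin_period = {"order_values": [10.0] * 100, "active_courier_hours": 100.0}
dense_period = {"order_values": [60.0] * 100, "active_courier_hours": 50.0}

for name, period in (("thin", thin_period), ("dense", dense_period)):
    count = completed_orders(period["order_values"])
    ratio = gross_bookings_per_courier_hour(**period)
    print(f"{name}: {count} orders, ${ratio:.2f} gross bookings per courier hour")
```

Both periods report 100 completed orders, yet the ratio differs by a factor of twelve ($10 versus $120 per courier hour). That gap is exactly the information the raw count hides.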

Gross Bookings (the total value of orders placed) is also a headline figure in Uber’s public financial reporting for its Delivery segment, which includes Eats. By anchoring the north star to Gross Bookings per unit of courier supply, you connect the PM’s operational lever directly to what the company reports and optimizes for, while encoding supply sustainability in the denominator.

The reason not to use pure eater conversion rate: Uber’s own engineering work on its recommendation system documented that optimizing for eater conversion alone systematically starved new and unfamiliar restaurants of order volume, because the algorithm over-indexed on established performers. Conversion rate is a real decision metric, but it cannot be the north star because it doesn’t capture restaurant viability or courier efficiency.

Decision and diagnostic metrics, by side

Interviewers at Uber specifically distinguish decision metrics (which trigger action) from diagnostic metrics (which explain movement after the fact). Walking through both for each side makes that distinction concrete.

Consumer

Decision: Weekly Reorder Rate. If this drops below a set threshold, something in the experience broke.
Diagnostic: First-to-second order time. A short gap tells you the first experience created pull; a long gap tells you the first experience was completed but not compelling.
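
As a rough sketch of how the two consumer metrics could be computed: the definitions here are assumptions, since the text does not pin them down. Weekly Reorder Rate is taken to mean the share of last week's eaters who order again this week, and first-to-second order time is summarized as a median gap in days. The eater IDs and dates are made up.

```python
from datetime import date
from statistics import median
from typing import Dict, List, Optional, Set


def weekly_reorder_rate(prev_week_eaters: Set[str],
                        this_week_eaters: Set[str]) -> float:
    """Share of last week's eaters who ordered again this week (assumed definition)."""
    if not prev_week_eaters:
        return 0.0
    return len(prev_week_eaters & this_week_eaters) / len(prev_week_eaters)


def median_first_to_second_gap_days(
        orders_by_eater: Dict[str, List[date]]) -> Optional[float]:
    """Median days between each eater's first and second order.

    Eaters with fewer than two orders are skipped; returns None if nobody
    has reordered yet.
    """
    gaps = []
    for order_dates in orders_by_eater.values():
        ordered = sorted(order_dates)
        if len(ordered) >= 2:
            gaps.append((ordered[1] - ordered[0]).days)
    return median(gaps) if gaps else None


# Hypothetical data for illustration.
rate = weekly_reorder_rate({"a", "b", "c", "d"}, {"b", "d", "e"})
gap = median_first_to_second_gap_days({
    "a": [date(2026, 1, 1), date(2026, 1, 4)],
    "b": [date(2026, 1, 2)],
    "c": [date(2026, 1, 10), date(2026, 1, 3), date(2026, 1, 20)],
})
print(rate, gap)
```

The decision metric is a single rate that can be compared against a threshold. The diagnostic is a distribution summary that is only worth reading once the rate has moved.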

Restaurant-partner

Decision: Gross Bookings per Active Restaurant. Restaurants below a minimum threshold churn off the platform.
Diagnostic: Impression-to-order conversion rate. Tells you whether the problem is discovery (restaurant isn’t being surfaced) or menu quality (it’s being surfaced but not converting).
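
The discovery-versus-menu split can be sketched as a simple two-step check. The thresholds below (`min_impressions`, `min_conversion`) are hypothetical placeholders, not Uber figures; a real team would calibrate them per market and cuisine.

```python
def diagnose_restaurant(impressions: int,
                        orders: int,
                        min_impressions: int = 1000,
                        min_conversion: float = 0.02) -> str:
    """Classify a struggling restaurant using impression-to-order conversion.

    Thresholds are illustrative assumptions only.
    """
    # Too few impressions: the restaurant is not being surfaced.
    if impressions == 0 or impressions < min_impressions:
        return "discovery"
    # Surfaced often enough, but eaters are not converting.
    if orders / impressions < min_conversion:
        return "menu"
    return "healthy"


print(diagnose_restaurant(impressions=200, orders=10))    # low exposure
print(diagnose_restaurant(impressions=5000, orders=50))   # exposed, 1% conversion
print(diagnose_restaurant(impressions=5000, orders=200))  # exposed, 4% conversion
```

Checking exposure before conversion matters: a conversion rate computed on a handful of impressions is too noisy to say anything about the menu.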

Delivery-partner

Decision: Deliveries per Online Hour (utilization). Couriers below a minimum utilization threshold leave for competitors or other gig work.
Diagnostic: Earnings per Hour vs. local wage floor. The relevant comparison is what a courier earns elsewhere, not an internal average.
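
A minimal sketch of the two courier metrics, assuming per-courier totals for deliveries, earnings, and online hours are available. The function names and the sample figures (including the $18 wage floor) are invented for illustration.

```python
def deliveries_per_online_hour(deliveries: int, online_hours: float) -> float:
    """Utilization: completed deliveries per hour the courier was online."""
    if online_hours <= 0:
        raise ValueError("online_hours must be positive")
    return deliveries / online_hours


def earnings_gap_vs_wage_floor(earnings: float,
                               online_hours: float,
                               local_wage_floor: float) -> float:
    """Hourly earnings minus the local wage floor.

    A negative result means the courier earns less per hour than the
    external benchmark, whatever the platform-wide average says.
    """
    if online_hours <= 0:
        raise ValueError("online_hours must be positive")
    return earnings / online_hours - local_wage_floor


# Hypothetical courier shift: 12 deliveries and $96 earned over 6 online hours.
utilization = deliveries_per_online_hour(deliveries=12, online_hours=6.0)
gap = earnings_gap_vs_wage_floor(earnings=96.0, online_hours=6.0,
                                 local_wage_floor=18.0)
print(utilization, gap)
```

Here utilization is 2 deliveries per hour, but the courier is still $2 per hour under the assumed local floor. That is the case the diagnostic exists to catch.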

The 2026 context

Delivery time has been commoditized. AI routing and demand forecasting are solved problems for any operator at Uber’s scale. The differentiation lever has shifted to two things: restaurant catalog quality (can you find something worth ordering?) and the reorder reflex (did the experience make coming back feel obvious?).

This is why the diagnostic metric I’d watch most closely under the north star is week-2 reorder rate. A completed order is a signal of viability. A second order is the signal that the experience was lovable, not just functional. A courier who stays on platform at high utilization is the supply signal that makes both of those things possible.

The PM’s job in 2026 is not to optimize delivery speed. It’s to manage joint optimization across three principals so that each side finds enough value to keep showing up.

Strong answer

"My north star would be Gross Bookings per Active Courier Hour. It's a ratio that encodes consumer spend, restaurant revenue, and courier productivity in a single number. When it rises, all three sides are winning. When it falls, the diagnostic branches: is demand thin, or is supply idle? I would not use eater conversion rate as the north star because Uber's own recommendation system work shows that optimizing conversion alone causes new restaurants to be starved of volume. Conversion is a real decision metric for the consumer side, but it cannot carry the weight of the full marketplace. Under the north star, I'd track one decision metric and one diagnostic for each side: Weekly Reorder Rate plus first-to-second order time for consumers; Gross Bookings per Active Restaurant plus impression-to-order conversion for restaurant-partners; and Deliveries per Online Hour plus Earnings per Hour vs. local wage floor for couriers. In 2026, delivery speed is table stakes. The differentiation is catalog quality and the reorder reflex, so the diagnostic I'd watch most closely is week-2 reorder rate."

Weak answer

"The most important metric is completed orders per week, because it shows the platform is growing and users are engaged."

This is the modal answer. It treats a $10 McDonald's order and a $60 sushi order as equivalent. It doesn't encode courier efficiency, restaurant growth, or whether the consumer will return. It's an activity metric, not a value metric. Interviewers at Uber hear this from most candidates; it shows surface-level marketplace awareness but not the PM judgment to choose a metric that reveals the health of all three sides at once.

Why this question is hard

The structural difficulty is joint optimization. Uber’s recommendation system uses quadratic programming to simultaneously balance eater conversion, restaurant gross bookings, and marketplace fairness. No single metric is truly standalone. A PM who proposes a single-side metric as a north star is implicitly saying the other two sides don’t matter, which is wrong and which an Uber interviewer will probe immediately.

Name the tradeoff explicitly. Show you understand why the conflict exists. Then name the metric that best captures all three sides and defend it. That’s what clears the bar.
