ICE scoring framework

ICE is a prioritization scoring method built for speed: ICE Score = Impact × Confidence × Ease, with each dimension rated 1 to 10. Sean Ellis, who also coined the term “growth hacking,” introduced it at GrowthHackers.com as a way for high-velocity experimentation teams to rank the multiple tests they run each week. That origin is load-bearing. Ellis described ICE as a tool to “score experiments.” The word experiment matters: ICE was calibrated for reversible, fast-cycle tests, not for multi-quarter platform investments or technical debt decisions, where multiplying by Ease systematically biases the ranking toward tactical quick wins over strategic bets.

The formula and what each dimension actually means

Impact is expected lift on the specific metric you named as your north star before scoring anything. “Impact in general” is not a number. Impact on 30-day retention, or on checkout conversion, or on weekly active users is a number. Without a stated goal, every Impact score is arbitrary.

Confidence is about evidence quality, not feelings. Most candidates treat it as “how sure am I,” which produces gut feelings wearing a number. Calibrate it instead:

A/B data from a comparable prior change: 8 to 9
User research with clear directional signal: 6 to 7
Qualitative interviews, no prior quantitative data: 4 to 5
Gut feel or analogue from an adjacent market: 2 to 3
No data, no analogues, pure hypothesis: 1

Confidence is the most powerful lever to stress-test in an interview because it is the one dimension candidates reliably underspecify.
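The calibration bands above can be written down as a lookup, which makes it easy to catch a Confidence score that outruns its evidence. This is a minimal sketch: the tier keys and the helper names confidence_band and is_calibrated are hypothetical labels chosen for illustration, and the numeric ranges are copied from the list.

```python
# Evidence tiers and their Confidence ranges, copied from the calibration list.
# The dictionary keys are hypothetical labels, not standard terminology.
CONFIDENCE_BANDS = {
    "ab_data_comparable_change": (8, 9),
    "user_research_directional": (6, 7),
    "qualitative_interviews_only": (4, 5),
    "gut_feel_or_adjacent_analogue": (2, 3),
    "pure_hypothesis": (1, 1),
}


def confidence_band(evidence: str) -> tuple[int, int]:
    """Return the (low, high) Confidence range for a named evidence tier."""
    try:
        return CONFIDENCE_BANDS[evidence]
    except KeyError:
        raise ValueError(f"unknown evidence tier: {evidence!r}") from None


def is_calibrated(evidence: str, score: int) -> bool:
    """True when a proposed Confidence score falls inside its evidence band."""
    low, high = confidence_band(evidence)
    return low <= score <= high


# An 8 backed only by qualitative interviews is outside the 4-to-5 band.
print(is_calibrated("qualitative_interviews_only", 8))
```

In an interview the same check is verbal: state the evidence tier first, then the number, so the score can be challenged on its evidence instead of on instinct.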

Ease is implementation lightness. Lower effort means higher score. A copy change might score a 9; a feature requiring two engineering weeks of integration work might score a 3. Ease does not measure business value; it measures how fast you can run the test.
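The formula itself is a three-way multiplication over ratings from 1 to 10. A minimal sketch, assuming integer ratings; the function name ice_score and the example ratings for a copy change are illustrative, not taken from any tool:

```python
def ice_score(impact: int, confidence: int, ease: int) -> int:
    """Multiply the three ICE dimensions, each an integer rating from 1 to 10."""
    for name, value in (("impact", impact), ("confidence", confidence), ("ease", ease)):
        if not 1 <= value <= 10:
            raise ValueError(f"{name} must be between 1 and 10, got {value}")
    return impact * confidence * ease


# Illustrative ratings for a copy change: modest Impact, decent evidence, little effort.
print(ice_score(impact=4, confidence=6, ease=9))  # 216
```

Because the dimensions multiply, a single low rating pulls the whole score down sharply, which is why Ease can dominate a ranking.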

A worked example

Goal: increase 30-day retention by 5 percentage points for a B2B analytics tool.

Feature	Impact	Confidence	Ease	ICE Score
Onboarding checklist	7	6	8	336
Weekly digest email	6	7	7	294
In-app contextual tips	5	8	6	240
In-app messaging channel	9	7	2	126

In-app messaging has the highest Impact and solid Confidence but ranks last because its Ease of 2 drags the multiplied score down: 9 × 7 × 2 = 126 loses to 7 × 6 × 8 = 336. That counterintuitive result is the single most useful teaching moment in a prioritization interview. If in-app messaging is the strategically correct bet, the raw ICE score is misleading, and saying so out loud is what separates a strong answer from a recitation. Onboarding checklist wins on Ease; if the team’s situation makes speed critical (early stage, high churn, limited runway), that ranking holds. If the team has capacity and a retention problem with real depth, you might override the score and say so explicitly.
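The ranking in the table can be reproduced in a few lines. This sketch uses the ratings from the worked example; the Feature class and its field names are illustrative, and the list is deliberately unsorted so the sort does the work.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    name: str
    impact: int
    confidence: int
    ease: int

    @property
    def ice(self) -> int:
        """ICE Score = Impact x Confidence x Ease."""
        return self.impact * self.confidence * self.ease


# Ratings taken from the worked example, listed in no particular order.
FEATURES = [
    Feature("In-app messaging channel", 9, 7, 2),
    Feature("Onboarding checklist", 7, 6, 8),
    Feature("In-app contextual tips", 5, 8, 6),
    Feature("Weekly digest email", 6, 7, 7),
]

# Highest ICE score first.
ranked = sorted(FEATURES, key=lambda f: f.ice, reverse=True)
for f in ranked:
    print(f"{f.name}: {f.impact} x {f.confidence} x {f.ease} = {f.ice}")
```

The highest-Impact feature lands at the bottom purely because of its Ease of 2, which is the point to say out loud.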

How to use ICE in a live interview

Strong answer

"Before I score anything, I want to anchor the scoring to a goal. Our north star is 30-day retention, so Impact means expected lift in 30-day retention for the affected cohort. Once I have that anchor, I'll note that reach across these features is roughly comparable since they all target the same activated-user segment, which means ICE works here. If reach varied significantly, I'd switch to RICE. For Confidence, I'm not using gut feel. The onboarding checklist has a confidence of 6 because we have qualitative research showing new users feel lost in week one, but no prior A/B data. The digest email gets a 7 because we have open-rate data from a pilot. I'll score Ease honestly, and I'll flag upfront that in 2026, with AI-assisted development, Ease has inflated across the board: most things score an 8 or 9. So I'm treating Ease as a tiebreaker, not an equal weight. The real signal lives in Impact and Confidence. Running the numbers, the onboarding checklist comes out first at 336. My recommendation is to ship that, and in parallel design a quick experiment to raise confidence on in-app messaging before committing the full build, because its impact ceiling is the highest of the group."

Weak answer

"I'd use ICE. The onboarding checklist has an Impact of 7, Confidence of 6, Ease of 8, so ICE is 336. The digest email is 294. I'd build the checklist first."

This recites the formula and assigns numbers without defending them. The interviewer's next question will be: what does a 7 for Impact mean? Why is Confidence a 6? Did you consider that reach differs across these features? The candidate has no answer because they treated ICE as a calculator rather than a reasoning scaffold. Scores without a north star are laundered intuition.

ICE vs. RICE, MoSCoW, and Value vs. Effort

ICE has no Reach dimension. A high-Impact change on a page with 0.1% of your traffic scores identically to the same change on your highest-traffic page. RICE fixes this by adding Reach and dividing by Effort rather than multiplying by Ease. The decision rule: if reach varies significantly across the features you are comparing, use RICE. If reach is roughly comparable or you do not have reach data yet, ICE is faster and accurate enough.
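A small sketch makes the Reach blind spot concrete. It assumes RICE is computed as Reach × Impact × Confidence ÷ Effort, reuses the 1-to-10 ratings from ICE for simplicity (teams using RICE often put Impact and Confidence on different scales), and uses hypothetical reach and effort figures.

```python
def ice(impact: float, confidence: float, ease: float) -> float:
    """ICE: no Reach term, multiplied by Ease."""
    return impact * confidence * ease


def rice(reach: float, impact: float, confidence: float, effort: float) -> float:
    """RICE: adds Reach and divides by Effort instead of multiplying by Ease."""
    if effort <= 0:
        raise ValueError("effort must be positive")
    return reach * impact * confidence / effort


# The same change on two pages; only the audience size differs.
same_change = {"impact": 7, "confidence": 6}

ice_low = ice(ease=8, **same_change)
ice_high = ice(ease=8, **same_change)

# Hypothetical reach (users per quarter) and effort (person-weeks).
rice_low = rice(reach=100, effort=2, **same_change)
rice_high = rice(reach=50_000, effort=2, **same_change)

print(ice_low == ice_high)   # ICE cannot tell the two pages apart
print(rice_high / rice_low)  # RICE separates them by the ratio of their reach
```

When reach is roughly the same across candidates, the Reach term scales every score by the same factor and the ordering matches what ICE would give, provided Effort and Ease tell the same story.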

MoSCoW is a categorization method, not a scoring method. It tells you which bucket a feature belongs to (Must/Should/Could/Won’t) but says nothing about order within each bucket. ICE fills that gap inside the “Must” category when you need to sequence.

Value vs. Effort is a 2x2, not a scored stack rank. It is useful for stakeholder alignment conversations and for quickly spotting obvious quick wins. Use it when you need to communicate options visually; use ICE when you need an actual ranked list.

The fatal mistake: scoring without a north star

If you assign Impact scores before naming what Impact means in your context, the scores are arbitrary. Interviewers recognize this pattern immediately. Setting the goal first takes fifteen seconds in an interview and changes the entire register of the answer from “I know a formula” to “I know how to prioritize.”

A subtler version of the same mistake: anchoring to the wrong goal. A team optimizing for retention that scores features on engagement will build the wrong things and produce an ICE ranking that looks rigorous but is misaligned with the business objective.

ICE theater: the failure mode worth naming

ICE theater is when a team runs the scoring exercise, produces a ranked list, and then overrides it based on executive preference or whoever argues loudest. The scores provide a veneer of rigor without any actual rigor. If you have witnessed this, naming it in an interview and explaining what you did about it signals maturity. The fix is not a better framework. It is making the scoring criteria and the evidence behind each Confidence rating explicit before the session, so any override becomes visible and requires a public justification rather than a quiet veto.

ICE in 2026: Ease has inflated

AI-assisted development has compressed implementation timelines dramatically. A feature that took three sprints in 2022 may take three days now. This means Ease scores have systematically inflated: most software features that once scored a 3 or 4 on Ease now score an 8 or 9. When everything is easy, Ease stops differentiating.

The practical adjustment: treat Ease as a tiebreaker rather than a full co-equal dimension. Pour the scoring energy into Impact (is this solving a problem people and companies will pay for, and is the market large enough to sustain continued investment?) and Confidence (what evidence do we actually have that users will adopt and value this?). In 2026, ICE is closer to a Viable × Lovable × Feasible check, where feasibility has a very high floor for most software features. Confidence now carries the discriminating weight that Ease used to carry. Teams that fail to recalibrate are running a 2019 framework on a 2026 backlog, and the scores will systematically favor whatever is easy to build over whatever is worth building.
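One way to read "Ease as a tiebreaker" is to rank on Impact × Confidence and consult Ease only when two features tie. That interpretation is an assumption, since no formula is given here; the sketch below applies it to the ratings from the worked example, and the name tiebreak_key is illustrative.

```python
# (name, impact, confidence, ease), ratings from the worked example.
features = [
    ("Onboarding checklist", 7, 6, 8),
    ("Weekly digest email", 6, 7, 7),
    ("In-app contextual tips", 5, 8, 6),
    ("In-app messaging channel", 9, 7, 2),
]


def tiebreak_key(feature):
    """Sort primarily on Impact x Confidence; Ease only separates ties."""
    _, impact, confidence, ease = feature
    return (impact * confidence, ease)


# Tuples compare element by element, so Ease matters only when the
# Impact x Confidence products are equal.
ranked = sorted(features, key=tiebreak_key, reverse=True)
for name, impact, confidence, ease in ranked:
    print(f"{name}: I x C = {impact * confidence}, Ease = {ease}")
```

Under this reading, in-app messaging moves from last to first (63), and Ease decides only the tie between the onboarding checklist and the digest email (both 42). Whether that reordering is right still depends on the team's situation, so the override reasoning has to be stated, not inferred from the sort.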

Use it, do not recite it

In a live prioritization question, skip the definition. Name the goal, name the metric, score two or three features with a real reason for each dimension rating, note the Reach blind spot and whether it applies, flag that Ease is a tiebreaker in 2026, and close by naming your top recommendation while acknowledging you would sanity-check it against strategic dependencies before committing. That arc, run in under three minutes, is what clears the bar.