GIST framework

GIST is a planning architecture created by Itamar Gilad, a former Google PM, first published in a 2016 Medium post and later expanded in his book Evidence-Guided. The name stands for Goals, Ideas, Step-Projects, Tasks. Each layer has a distinct planning horizon, and the structure’s entire value is in what sits between them: a mechanism that connects annual strategy to weekly delivery without a committed feature list in the middle.

The problem it solves is specific. Traditional roadmaps require you to pick winning ideas before you have evidence that they win. Gilad’s data on this is uncomfortable: at most 1 in 3 ideas delivers a positive result, and at Google and Bing only 10 to 20 percent of A/B tests come back positive. Locking twelve months of engineering capacity to a feature roadmap is structurally irrational given those odds. GIST’s answer is not better upfront prediction. It is a delivery model that assumes most ideas will fail and makes failure cheap.
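To make those odds concrete, here is a minimal back-of-the-envelope sketch comparing a committed roadmap with a test-first model. Only the 1-in-3 success rate comes from the text above. The idea count, build cost, and test cost are hypothetical, and the model assumes (optimistically) that a cheap test always identifies the winners.

```python
from fractions import Fraction

# Hypothetical inputs for illustration; only SUCCESS_RATE comes from the article.
IDEAS = 12                     # ideas under consideration for the year
SUCCESS_RATE = Fraction(1, 3)  # "at most 1 in 3 ideas" delivers a positive result
BUILD_WEEKS = 8                # assumed cost of a full feature build
TEST_WEEKS = 1                 # assumed cost of a cheap experiment


def committed_cost(ideas: int, build_weeks: int) -> int:
    """Committed roadmap: every idea gets a full build, winner or not."""
    return ideas * build_weeks


def staged_cost(ideas: int, success_rate: Fraction,
                test_weeks: int, build_weeks: int) -> Fraction:
    """Test-first model: every idea gets a cheap test, only winners get built.

    Simplifying assumption: the cheap test perfectly separates winners
    from losers. Real experiments are noisier than this.
    """
    expected_winners = ideas * success_rate
    return ideas * test_weeks + expected_winners * build_weeks


expected_winners = IDEAS * SUCCESS_RATE
committed = committed_cost(IDEAS, BUILD_WEEKS)
staged = staged_cost(IDEAS, SUCCESS_RATE, TEST_WEEKS, BUILD_WEEKS)

print(f"expected winners:          {expected_winners}")
print(f"committed roadmap cost:    {committed} weeks "
      f"({committed / expected_winners} weeks per winner)")
print(f"test-first cost:           {staged} weeks "
      f"({staged / expected_winners} weeks per winner)")
```

Under these assumed numbers the committed roadmap spends 96 weeks to find 4 expected winners, while the test-first model spends 44. The exact figures matter less than the shape: the cost of each failed idea drops from a full build to one cheap test.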

The four layers

Goals are set annually and reviewed quarterly. Each goal should carry two metrics: an impact metric (revenue, profit, retention) and an outcome or North Star metric that represents the behavior you are trying to shift. WhatsApp’s North Star is sent messages. Airbnb’s is nights booked. The pairing matters because an impact metric without a North Star metric can be gamed (cut support costs by reducing support channels), and a North Star metric without an impact metric can lead a team to optimize behavior that does not generate sustainable economics.
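One way to hold a team to the pairing is to make both metrics explicit fields on the goal record. The sketch below is an illustration, not part of GIST itself: the `Goal` and `Metric` classes, the `missing_metrics` helper, and all numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Metric:
    name: str
    baseline: float
    target: float


@dataclass(frozen=True)
class Goal:
    statement: str
    impact_metric: Optional[Metric] = None      # e.g. revenue, profit, retention
    north_star_metric: Optional[Metric] = None  # the behavior you want to shift

    def missing_metrics(self) -> List[str]:
        """Return the names of whichever of the two metrics is not set."""
        missing = []
        if self.impact_metric is None:
            missing.append("impact")
        if self.north_star_metric is None:
            missing.append("north_star")
        return missing


# Hypothetical goal with made-up numbers: a North Star but no impact metric.
goal = Goal(
    statement="Grow weekly messaging activity",
    north_star_metric=Metric("messages sent per user per week", 100.0, 120.0),
)
print(goal.missing_metrics())  # ['impact']
```

A goal that reports a missing metric is a candidate for the failure described above: it can be gamed, or it can optimize behavior that never pays for itself.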

Ideas are collected continuously in an Idea Bank. Crucially, ideas are not eliminated upfront. The bank can hold hundreds of ideas indefinitely, ranked but not culled. Ranking uses ICE scoring: Impact, Confidence, Effort, each rated 1 to 10. The ICE scores are not static. After a step-project completes, results feed back to update the scores of related ideas, so the bank stays calibrated rather than going stale. The strongest candidates rise over time based on evidence, not on who argued loudest in a planning meeting.
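The ranking-and-feedback loop described above can be sketched in a few lines. The formula and the update rule here are assumptions made for illustration: the text does not say how the three ratings combine, so this sketch divides by Effort so that expensive ideas rank lower, and it models feedback as a fixed nudge to Confidence. All idea names and ratings are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Idea:
    name: str
    impact: float      # 1 to 10
    confidence: float  # 1 to 10
    effort: float      # 1 to 10, higher means more expensive

    def ice_score(self) -> float:
        # Assumed formula: reward impact and confidence, penalize effort.
        return self.impact * self.confidence / self.effort


def rank(bank: List[Idea]) -> List[Idea]:
    """Return the bank sorted best-first. Nothing is removed."""
    return sorted(bank, key=lambda idea: idea.ice_score(), reverse=True)


def apply_result(related: List[Idea], positive: bool, step: float = 2.0) -> None:
    """Assumed update rule: nudge Confidence of related ideas, clamped to 1..10."""
    delta = step if positive else -step
    for idea in related:
        idea.confidence = min(10.0, max(1.0, idea.confidence + delta))


bank = [
    Idea("onboarding checklist", impact=6, confidence=4, effort=3),
    Idea("referral bonus", impact=8, confidence=5, effort=8),
    Idea("smart reminders", impact=5, confidence=7, effort=4),
]
print([idea.name for idea in rank(bank)])

# A step-project on "smart reminders" comes back negative.
apply_result([bank[2]], positive=False)
print([idea.name for idea in rank(bank)])
```

After the negative result, "smart reminders" drops below "onboarding checklist" but stays in the bank, which mirrors the ranked-but-not-culled behavior described above.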

Step-Projects are the structural innovation GIST is actually known for. Each is capped at ten weeks. Rather than committing to a full feature build, a step-project runs staged experiments: mockup, prototype, MVP, dogfood, beta, launch. Each stage is a Build-Measure-Learn cycle with an explicit go/no-go gate. The gate is a decision, not a formality. If a prototype shows no evidence of behavior change against the goal metric, you kill it and move to the next idea in the bank. The ten-week cap forces this discipline: there is no “we’ll figure out the metrics after we ship” option when every project has an expiry.
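The staged progression with kill gates can be sketched as a simple loop. This is an illustration under stated assumptions: the function name, the week allocations, the evidence values, and the idea of expressing the gate as a minimum lift in the goal metric are all hypothetical. Only the stage names and the ten-week cap come from the text.

```python
from typing import Dict, List, Tuple

MAX_WEEKS = 10  # the step-project cap described above


def run_step_project(plan: List[Tuple[str, int]],
                     evidence: Dict[str, float],
                     min_lift: float) -> Tuple[str, int]:
    """Walk the stages in order and stop at the first failed gate.

    plan:     (stage name, weeks budgeted) pairs, in execution order
    evidence: measured lift in the goal metric per stage (assumed input)
    min_lift: gate threshold; a stage with less lift than this is a no-go
    Returns (outcome, weeks spent).
    """
    total_weeks = sum(weeks for _, weeks in plan)
    if total_weeks > MAX_WEEKS:
        raise ValueError(f"plan is {total_weeks} weeks, cap is {MAX_WEEKS}")

    weeks_spent = 0
    for stage, weeks in plan:
        weeks_spent += weeks
        # A stage with no recorded evidence counts as no lift, so it fails.
        if evidence.get(stage, 0.0) < min_lift:
            return (f"killed after {stage}", weeks_spent)
    return ("launched", weeks_spent)


# Hypothetical plan and measurements.
plan = [("mockup", 1), ("prototype", 2), ("MVP", 3),
        ("dogfood", 1), ("beta", 2), ("launch", 1)]
evidence = {"mockup": 0.05, "prototype": 0.0}

print(run_step_project(plan, evidence, min_lift=0.02))
```

In this made-up run the prototype shows no lift, so the project stops after three weeks instead of ten. Treating missing evidence as a failed gate is a deliberate choice in the sketch: it encodes the rule that there is no "figure out the metrics after we ship" option.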

Tasks are one to two week iterations managed in whatever agile tooling the team already uses: Jira, Linear, or similar. This layer is intentionally not novel. GIST does not try to replace sprint tooling. It sits above it.

What GIST is not: the OKR distinction

Most explanations of GIST gloss over this, which is why candidates who have “read about it” often reveal that they haven’t understood the problem it solves.

OKRs cover the Goal layer and nothing below it. They tell you what you want to achieve (Objective) and how you’ll measure it (Key Results). They say nothing about which ideas to pursue, how to size experiments, or how to connect delivery iterations to the goal. The Ideas, Step-Projects, and Tasks layers are structurally absent from OKR methodology. That gap is GIST’s actual value-add.

The two frameworks are additive. OKRs handle Goals well. GIST provides the architecture for everything below. A team already running OKRs can adopt GIST by treating the Key Results as the outcome metrics inside their Goals layer and building an Idea Bank beneath them. The common mistake is treating them as competing: “we use OKRs, so we don’t need GIST.” That team likely has a committed roadmap sitting between their OKRs and their sprint board, which means they are prioritizing ideas upfront and absorbing the full cost of that 1-in-3 success rate.

Where GIST fails

GIST works poorly when Goals are written as outputs rather than outcomes. “Launch three features” or “complete platform migration” are not GIST Goals. They are milestones. A Goal that does not name a measurable change in user or business behavior gives the Idea Bank nothing to ICE against, so scores become arbitrary and the whole system degrades to a ranked to-do list.

The other failure mode is a shadow roadmap: leadership maintains a committed feature list in parallel with the official Idea Bank, and the bank becomes theater. This is common in organizations where executives are evaluated on shipping specific things rather than on outcome metrics. GIST cannot fix incentive structure by itself. If stakeholders are rewarded for feature delivery rather than goal achievement, the framework collapses into compliance.

The 2026 case for GIST

In 2026, AI development tooling has largely eliminated feasibility as a constraint. A reasonably scoped feature can be shipped in days with agentic coding. This makes the Goal layer more important, not less. If you can build almost anything quickly, “what are we trying to achieve and for whom” is the entire strategic question.

The Idea Bank also changes character under these conditions. AI generates idea volume trivially, so the bank fills fast. ICE scoring discipline becomes the real differentiator: teams that can write a rigorous Confidence score based on actual evidence beat teams drowning in AI-generated feature lists. Confidence now does the discriminating work that Effort used to do.

The Step-Project model maps directly onto how AI features need to be evaluated. Each step-project stage resembles a product-wrapped eval: you need real user data to know whether an AI feature actually helps. A ten-week step-project with a named outcome metric (task completion rate, user return rate) is structurally identical to running a model-in-the-loop experiment with a kill gate. The staged progression gives the team permission to stop before a full build, which is critical when AI features have high variance in real-world performance.

The viability problem is where most teams using GIST in AI contexts under-invest. “Users engage with the feature” is not a GIST Goal metric. Viability means companies are willing to pay and the market is large enough to sustain continued investment. Goals must encode both the North Star metric (the behavior signal for usability and lovability) and a revenue or retention proxy. GIST done well forces this encoding. GIST done poorly lets teams run dozens of step-projects optimizing for engagement on a product that no one will pay for.

Interview framing

When asked about roadmap philosophy or planning process, the failing pattern is: “GIST is basically OKRs but more detailed.” That signals you have not understood the problem GIST solves.

Strong answer

"Traditional roadmaps force you to pick winning ideas before you have evidence that they win. Gilad's data shows at most 1 in 3 ideas succeed, and Google and Bing A/B tests run at 10 to 20 percent positivity. Committing features twelve months out is structurally expensive given those odds. GIST addresses this by separating strategy from idea selection. Goals are annual and outcome-framed, using two metrics: an impact metric like revenue or retention and a North Star metric that captures the specific behavior you're shifting. Ideas live in a continuously scored bank using ICE, and importantly, the scores update after experiments complete so the bank stays calibrated. The key structural piece is the Step-Project: capped at ten weeks, run as staged experiments from mockup to MVP to beta, with a kill gate at each stage. Teams run several step-projects concurrently rather than one big feature build. Tasks are just normal sprint work inside a step. The OKR distinction matters here: OKRs cover Goals well but leave the Ideas-to-delivery architecture unspecified. GIST fills that gap, so they are additive. Where I've seen it fail is when Goals are written as outputs rather than outcomes, or when leadership runs a shadow roadmap alongside the bank. In either case, the problem is incentive structure, not the framework."

Weak answer

"GIST stands for Goals, Ideas, Steps, Tasks. It's like OKRs but with more detail. You set goals, then you have an idea bank, and you run small experiments. It's good for avoiding roadmap politics."

This names the acronym and gestures at the concept without explaining why the structure matters, what the step-project model actually does, how ICE scoring works inside it, or what distinguishes GIST's architecture from OKRs. An interviewer will follow with "how is that different from OKRs?" and the candidate has nowhere to go.

Use it, do not recite it

In a live question, skip the definition. State the problem: upfront idea selection under uncertainty is structurally irrational at industry-observed success rates. Name the architecture: Goals with dual metrics, an Idea Bank scored with ICE, Step-Projects with kill gates, Tasks in existing tooling. Place it correctly relative to OKRs: additive, not competing. Acknowledge the failure modes: output-framed Goals and shadow roadmaps. In 2026, note that the Goal layer is doing more work than ever because feasibility is no longer the binding constraint.