system design · hard

"Design Netflix" (system design PM interview)

Design Netflix's system.

Updated Jun 2026 · Calibrated to the strong-hire bar

This question is not asking you to spec a distributed video platform from scratch. It is testing whether you can reason about architecture choices in product terms: why Netflix built its own CDN instead of using a third party, how the recommendation engine connects to subscriber retention, and how you define “the system is working.” A PM who names microservices, Kafka, and object storage without connecting those components to business outcomes is giving an engineering answer to a PM question.

Scope before designing

The prompt is deliberately open. “Design Netflix’s system” could mean the streaming delivery pipeline, the recommendation engine, or the full platform. Before drawing anything, ask: “Which dimension is most useful to explore: content delivery, personalization, or the full request lifecycle?” If the interviewer says “your call,” pick one and state the reason. A strong framing: “I’ll focus on the recommendation and delivery system because those two subsystems are where Netflix’s core retention moat lives, and the tradeoffs between them are the product-interesting ones.”

Then anchor the system to a user contract. A subscriber opens Netflix wanting to find something worth watching and start playing it with zero friction. The system’s job: (1) present a ranked, personalized catalog in under 100ms, (2) begin streaming the selected title in under 2 seconds, (3) maintain playback quality across variable network conditions without a rebuffer. Every architectural decision should trace back to at least one of those three.

The two planes and why they are separated

Netflix runs on two deliberately decoupled infrastructures. The control plane lives on AWS and handles authentication, catalog metadata, recommendation serving, and the ISP handoff decision. The data plane is Open Connect, Netflix’s proprietary CDN, with appliances physically inside ISP networks at over 1,000 locations worldwide. They do not communicate directly by design: if the recommendation service has an outage, streaming continues.
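
The decoupling can be sketched as two call paths that share no dependency. This is a minimal Python sketch, not Netflix’s code: `build_home_page`, `start_playback`, the fallback row names, and the manifest URL shape are all hypothetical. It shows the property described above. A recommendation failure degrades the home page to generic rows, while the playback path never touches the recommender at all.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical generic rows, assumed to be precomputed and cached near the API layer.
CACHED_POPULAR_ROWS: List[str] = ["Trending Now", "Top 10 Today"]


@dataclass
class HomePage:
    rows: List[str]
    degraded: bool  # True when the page fell back to cached generic rows


def build_home_page(fetch_personalized_rows: Callable[[str], List[str]], user_id: str) -> HomePage:
    """Control plane: personalization is best-effort, never a hard dependency."""
    try:
        return HomePage(rows=fetch_personalized_rows(user_id), degraded=False)
    except Exception:
        # Recommendation outage: serve a generic page instead of an error page.
        return HomePage(rows=list(CACHED_POPULAR_ROWS), degraded=True)


def start_playback(title_id: str, steer_to_edge: Callable[[str], str]) -> str:
    """Playback path: needs only a steering decision, never the recommender."""
    edge_base_url = steer_to_edge(title_id)
    return f"{edge_base_url}/{title_id}/manifest"


def recommender_down(user_id: str) -> List[str]:
    raise TimeoutError("recommendation service unavailable")


page = build_home_page(recommender_down, "user-1")
url = start_playback("title-42", lambda t: "https://edge.isp.example")
```

The catch-all `except` is deliberate for the sketch: from the subscriber’s point of view, any failure in personalization should produce a usable page, not an error.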

The PM framing on Open Connect is the one most candidates miss. Netflix did not build its own CDN because it is technically elegant. It built Open Connect to own the reliability variable most directly correlated with subscriber churn. A rebuffer event on Akamai or CloudFront is opaque and unactionable for Netflix engineers. A rebuffer event on Open Connect is fully instrumented, root-causable, and improvable. Vertical integration of the CDN is a product decision with financial consequences: ISP partnership complexity and hardware depreciation are the cost; churn reduction and contribution margin per stream are the return. A pre-scale streaming startup should use CloudFront or Fastly and revisit this decision around 50 million MAU, when the unit economics shift.
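
The “revisit when unit economics shift” advice reduces to a break-even calculation: owning a CDN adds a fixed monthly cost but lowers the marginal cost per GB delivered. The sketch below assumes that simple linear cost model. Every input is a made-up placeholder, and the churn benefit is modeled crudely as a monthly offset against fixed cost. It does not derive the 50 million MAU figure; it only shows the shape of the reasoning.

```python
import math


def breakeven_monthly_gb(fixed_monthly_cost: float,
                         third_party_per_gb: float,
                         owned_per_gb: float,
                         monthly_retention_value: float = 0.0) -> float:
    """Monthly egress (GB) above which owning the CDN beats renting one.

    fixed_monthly_cost: amortized appliances plus ISP-partnership operations.
    monthly_retention_value: revenue kept through lower churn, netted against fixed cost.
    """
    marginal_saving = third_party_per_gb - owned_per_gb
    if marginal_saving <= 0:
        return math.inf  # owning is never cheaper per GB, so no volume threshold exists
    net_fixed = max(fixed_monthly_cost - monthly_retention_value, 0.0)
    return net_fixed / marginal_saving


def breakeven_subscribers(breakeven_gb: float, gb_per_subscriber_per_month: float) -> float:
    """Translate the volume threshold into a subscriber count."""
    return breakeven_gb / gb_per_subscriber_per_month


# Made-up inputs, chosen only to show the arithmetic.
gb = breakeven_monthly_gb(2_000_000, 0.01, 0.002)
subs = breakeven_subscribers(gb, 50)
gb_with_churn_benefit = breakeven_monthly_gb(2_000_000, 0.01, 0.002, monthly_retention_value=500_000)
```

The PM point is in the last line: once the churn reduction is counted as part of the return, the threshold moves down, which is why the decision is framed as a product decision rather than a pure infrastructure-cost one.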

Open Connect has two appliance tiers. Storage Appliances at internet exchange points hold nearly the full catalog and serve as the origin for edge nodes. Edge Appliances inside ISP networks cache regionally popular content and serve the majority of actual stream requests. Content is pushed to edge nodes during off-peak hours through a proactive prefetch model, not pulled on demand. This matters for a PM because popular titles start with low latency from the first play in a region: no viewer has to trigger a cache miss before the edge is warm.
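
Choosing which titles to push to an edge is a capacity-constrained selection problem. The sketch below assumes a per-region play forecast exists and uses a simple greedy fill by predicted plays per GB. That heuristic, the `Title` fields, and the 02:00 to 06:00 default fill window are illustrative assumptions, not Netflix’s actual fill algorithm.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Title:
    title_id: str
    size_gb: float
    predicted_regional_plays: int  # hypothetical output of a regional demand forecast


def plan_edge_fill(catalog: List[Title], capacity_gb: float) -> List[str]:
    """Greedy fill: most predicted plays per GB first, skipping titles that no longer fit."""
    ranked = sorted(catalog, key=lambda t: t.predicted_regional_plays / t.size_gb, reverse=True)
    plan: List[str] = []
    used_gb = 0.0
    for title in ranked:
        if used_gb + title.size_gb <= capacity_gb:
            plan.append(title.title_id)
            used_gb += title.size_gb
    return plan


def in_fill_window(hour: int, start: int = 2, end: int = 6) -> bool:
    """Assumed off-peak window in local hours; handles windows that cross midnight."""
    if start <= end:
        return start <= hour < end
    return hour >= start or hour < end


catalog = [
    Title("a", 10, 1000),
    Title("b", 20, 1000),
    Title("c", 5, 100),
    Title("d", 30, 3000),
]
edge_plan = plan_edge_fill(catalog, capacity_gb=45)
```

Titles left out of the plan are still playable; they are simply served from the storage tier instead of the edge.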

The recommendation engine as a retention mechanism

The recommendation engine drives roughly 80% of what subscribers watch on Netflix. It is not just a ranking algorithm. It is the primary mechanism by which Netflix converts content spend into retention. Framing it as infrastructure understates its business centrality.

Until 2024, Netflix ran siloed models per surface: the home page row ranker, the search ranker, and the artwork selection model each operated independently. In 2025, Netflix moved to a unified multi-task AI foundation model that learns shared user preferences across all surfaces simultaneously. The product consequence: a user who searches for “spy thriller” updates the same preference embedding that shapes their home page row ranking. The system is no longer fragmented by surface.
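
The product consequence of a unified model is shared state across surfaces. The toy below is a deliberately tiny stand-in for that idea, not a foundation model and not Netflix’s architecture. One preference vector is updated by an engagement event from any surface, and the home-page ranker reads the same vector. `SharedUserProfile`, the 2-D genre axes, and the learning rate are all hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


class SharedUserProfile:
    """A single preference embedding that every surface reads and writes."""

    def __init__(self, dim: int, learning_rate: float = 0.5) -> None:
        self.embedding: Vector = [0.0] * dim
        self.learning_rate = learning_rate

    def observe(self, item_embedding: Vector, weight: float = 1.0) -> None:
        # Nudge the shared embedding toward an item the user engaged with. The same
        # update runs whether the event came from search, a home row, or artwork.
        # weight: hypothetical signal strength, e.g. higher for a completed play.
        step = self.learning_rate * weight
        self.embedding = [u + step * (i - u) for u, i in zip(self.embedding, item_embedding)]


def rank_home_rows(profile: SharedUserProfile, rows: Dict[str, Vector]) -> List[str]:
    """Home-page ranking: scores rows against the shared embedding."""
    return sorted(rows, key=lambda name: dot(profile.embedding, rows[name]), reverse=True)


# Toy 2-D space: axis 0 = "spy thriller", axis 1 = "comedy" (hypothetical).
rows = {"Comedies": [0.0, 1.0], "Spy Thrillers": [1.0, 0.0]}
profile = SharedUserProfile(dim=2)
before = rank_home_rows(profile, rows)
profile.observe([1.0, 0.0])  # user plays a result from a "spy thriller" search
after = rank_home_rows(profile, rows)
```

Because the search event and the home ranking touch the same vector, the home page reorders within the same session, with no cross-surface sync job in between.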

For a cold-start subscriber (new account, no play history), the system falls back to popularity signals by genre, demographic cohort, and device type. Implicit signals accumulate quickly: play, pause, skip, and completion rate are all inputs, and the model updates within a session. A strong PM answer names the cold-start problem explicitly and proposes a diversity constraint to prevent filter-bubble formation in the first week of a new account.
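
The cold-start fallback and the diversity constraint can be sketched together. This assumes popularity lists are precomputed per (demographic cohort, device type) key with a global default. It uses a per-genre cap on the first k slots as the diversity constraint, one simple option among several (re-rankers such as maximal marginal relevance are another). All names and keys are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Tuple

Cohort = Tuple[str, str]  # (demographic cohort, device type), hypothetical keys


@dataclass(frozen=True)
class Candidate:
    title_id: str
    genre: str


def cold_start_candidates(popularity: Dict[Cohort, List[Candidate]], cohort: Cohort) -> List[Candidate]:
    """No play history yet: use the cohort's popularity list, else the global one."""
    return popularity.get(cohort, popularity[("global", "any")])


def diversify(ranked: List[Candidate], k: int, max_per_genre: int) -> List[Candidate]:
    """Keep ranking order but cap how many of the top k slots one genre can take."""
    genre_counts: Counter = Counter()
    picked: List[Candidate] = []
    deferred: List[Candidate] = []
    for cand in ranked:
        if len(picked) == k:
            break
        if genre_counts[cand.genre] < max_per_genre:
            picked.append(cand)
            genre_counts[cand.genre] += 1
        else:
            deferred.append(cand)
    # Backfill from deferred titles if the cap left slots empty (e.g. a one-genre list).
    for cand in deferred:
        if len(picked) == k:
            break
        picked.append(cand)
    return picked


popularity = {
    ("18-24", "mobile"): [
        Candidate("A", "action"), Candidate("B", "action"), Candidate("C", "action"),
        Candidate("D", "comedy"), Candidate("E", "drama"),
    ],
    ("global", "any"): [Candidate("G", "documentary")],
}
first_row = diversify(cold_start_candidates(popularity, ("18-24", "mobile")), k=3, max_per_genre=1)
```

Without the cap, this new subscriber’s first row would be three action titles, and every early play would push the profile further toward action before any real signal exists.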

The 2026-relevant addition is generative AI semantic search. A subscriber can now query “fun sci-fi with mystery vibes” and get semantically matched results rather than keyword-matched ones. This is a genuine usability improvement: it closes the gap between what a user wants and what they can articulate in a search box. It is also backed by an Entertainment Knowledge Graph that clusters talent, genre signals, and thematic similarity at a level that structured metadata cannot reach.
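
The difference between keyword and semantic matching is easy to show with precomputed vectors. The sketch assumes some encoder has already mapped the query and each title into the same embedding space. The three-axis vectors, titles, and descriptions are invented for illustration, and the knowledge graph is not modeled.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def keyword_search(query: str, descriptions: Dict[str, str]) -> List[str]:
    """Baseline: titles whose description contains every query word."""
    words = query.lower().split()
    return [t for t, desc in descriptions.items() if all(w in desc.lower() for w in words)]


def semantic_search(query_vec: Vector, title_vecs: Dict[str, Vector], top_k: int = 3) -> List[str]:
    """Rank titles by cosine similarity between query and title embeddings."""
    ranked = sorted(title_vecs, key=lambda t: cosine(query_vec, title_vecs[t]), reverse=True)
    return ranked[:top_k]


# Hypothetical axes: [sci-fi, mystery, humor].
descriptions = {
    "Orbit Detectives": "A crew investigates a disappearance aboard a space station",
    "Courtroom Drama": "A lawyer uncovers a secret during a murder trial",
    "Space Sitcom": "Roommates bicker on a cargo ship",
}
title_vecs = {
    "Orbit Detectives": [0.9, 0.7, 0.4],
    "Courtroom Drama": [0.0, 0.6, 0.0],
    "Space Sitcom": [0.9, 0.0, 0.9],
}
query = "fun sci-fi with mystery vibes"
query_vec = [0.8, 0.6, 0.5]  # assumed encoder output for the query above

keyword_hits = keyword_search(query, descriptions)
semantic_hits = semantic_search(query_vec, title_vecs, top_k=2)
```

The keyword baseline returns nothing because no description uses the subscriber’s words. The semantic ranking surfaces the sci-fi mystery first, which is the gap between what a user wants and what they can articulate.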

The encoding tradeoff as a margin decision

The AV1 codec, combined with AI film grain synthesis, reduced Netflix’s bandwidth per stream by 20 to 30% in 2025. For a PM, this is not a codec decision. It is a contribution margin and accessibility decision. Better quality at a lower bitrate means Netflix can serve subscribers on congested or mobile networks in emerging markets without downgrading the experience, which improves retention in lower-bandwidth markets. The cost: AV1 encoding is computationally more expensive than H.264, so the encoding pipeline requires more GPU time. The investment breaks even after roughly 8 to 12 months of reduced egress costs.
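
The break-even claim is simple payback arithmetic: one-time re-encode cost divided by net monthly savings. The sketch assumes egress cost scales linearly with bandwidth and treats any ongoing extra encoding spend as a deduction from savings. The dollar inputs below are made up to show the mechanics and are not Netflix figures.

```python
import math


def monthly_net_savings(baseline_monthly_egress_cost: float,
                        bandwidth_reduction: float,
                        extra_monthly_encoding_cost: float = 0.0) -> float:
    """Egress saved per month minus any ongoing extra encoding spend."""
    if not 0.0 <= bandwidth_reduction < 1.0:
        raise ValueError("bandwidth_reduction must be a fraction in [0, 1)")
    return baseline_monthly_egress_cost * bandwidth_reduction - extra_monthly_encoding_cost


def breakeven_months(upfront_encoding_cost: float,
                     baseline_monthly_egress_cost: float,
                     bandwidth_reduction: float,
                     extra_monthly_encoding_cost: float = 0.0) -> float:
    """Months until cumulative savings repay the one-time re-encode cost."""
    savings = monthly_net_savings(baseline_monthly_egress_cost, bandwidth_reduction,
                                  extra_monthly_encoding_cost)
    if savings <= 0:
        return math.inf  # the switch never pays back
    return upfront_encoding_cost / savings


# Made-up inputs: same upfront cost, low and high end of the 20 to 30% range.
low_end = breakeven_months(1_000_000, 500_000, 0.20)
high_end = breakeven_months(1_000_000, 500_000, 0.30)
```

The useful PM takeaway is the sensitivity: the payback period is inversely proportional to the bandwidth reduction, so the codec gain in high-traffic regions matters more than the headline average.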

Structure a strong answer

strong

"Before I design anything: is the focus delivery, recommendations, or the full platform? [Interviewer: your call.] I'll focus on recommendation plus delivery because those two systems are where Netflix's retention moat lives and the tradeoffs between them are product-interesting. The user contract I'm designing to: personalized catalog in under 100ms, playback started in under 2 seconds, no rebuffer.

Two planes. Control plane on AWS handles auth, catalog, and the ISP handoff decision. Data plane is Open Connect, Netflix's proprietary CDN, sitting physically inside ISP networks. They're decoupled by design so a recommendation outage doesn't take down streaming.

The PM insight on Open Connect: Netflix didn't build it for technical elegance. They built it to own the reliability variable most correlated with churn. A rebuffer on Akamai is opaque; a rebuffer on Open Connect is instrumented and fixable. That's a product decision with a financial return. A startup should use CloudFront and revisit around 50M MAU when unit economics shift.

On recommendation: until 2024, siloed models per surface. In 2025, unified multi-task foundation model across home page, search, and artwork. A user searching for 'spy thriller' now updates the same embedding that shapes their home page. Cold start falls back to popularity by genre and device; I'd add a diversity constraint in the first-week ranking to avoid filter-bubble formation before we have real signal. New in 2026: generative AI semantic search and an Entertainment Knowledge Graph. Both close the gap between what a user wants and what they can say out loud.

On encoding: AV1 plus AI film grain synthesis cut bandwidth 20 to 30% in 2025. That's a contribution margin decision, especially in emerging markets where average bandwidth is lower.

My north star metric: watch hours per subscriber (retention proxy). Leading indicators: time-to-play p99 under 2 seconds, rebuffer rate under 0.1% of sessions, recommendation-driven play rate. Each maps directly to a system decision."

weak

"You'd need a CDN, a database, an API layer, a recommendation engine, maybe microservices, and a search service." The candidate draws a generic three-tier architecture, mentions that Netflix uses AWS, and describes the recommendation engine as "collaborative filtering plus content-based filtering." When asked why Netflix built its own CDN instead of using Akamai, the candidate says "for better performance" without any business reasoning. When asked how they'd measure success: "watch time and subscriber count," with no connection between any system decision and a metric. This fails because it demonstrates memorized vocabulary rather than reasoned design. Netflix interviewers probe in real time: "Why not just use Akamai?" and "What breaks first at 300 million concurrent streams?" A list-level answer has nowhere to go. The candidate has not shown they understand how system architecture decisions connect to product strategy or financial outcomes, which is the entire PM-specific bar.

What the interviewer is probing

  • PM framing over architecture recitation: every component should connect to a user experience outcome or a cost tradeoff, not just a box in a diagram.
  • The Open Connect business rationale: “for better performance” is the shallow answer. The real answer is vertical integration to own the reliability variable, with churn and contribution margin as the return.
  • Recommendation as retention mechanism: knowing the 80% discovery figure and the 2025 shift to a unified foundation model signals current, calibrated knowledge.
  • 2026 architectural awareness: generative AI semantic search, the Entertainment Knowledge Graph, and AV1 film grain synthesis are all interview-relevant as of now. Candidates who treat the recommendation engine as a 2019-era CF model read as out of date.
  • Honest scope ownership: “engineering owns the exact cache tier thresholds; I’d define the latency SLA and instrument by region to find where the p99 degrades” is stronger than inventing a number.
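
The “instrument by region to find where the p99 degrades” step can be sketched as a grouping plus a tail percentile. This assumes time-to-play samples tagged with a region and uses the nearest-rank percentile definition. The region names and samples are hypothetical, and the 2-second threshold comes from the user contract rather than from engineering’s cache-tier tuning.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def percentile(values: List[float], p: float) -> float:
    """Nearest-rank percentile for p in (0, 100]."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def regions_breaching_sla(samples: Iterable[Tuple[str, float]],
                          sla_seconds: float,
                          p: float = 99.0) -> Dict[str, float]:
    """Group time-to-play samples by region; return regions whose tail exceeds the SLA."""
    by_region: Dict[str, List[float]] = defaultdict(list)
    for region, seconds in samples:
        by_region[region].append(seconds)
    breaches: Dict[str, float] = {}
    for region, values in by_region.items():
        tail = percentile(values, p)
        if tail > sla_seconds:
            breaches[region] = tail
    return breaches


# Hypothetical samples: (region, seconds from pressing play to first frame).
samples = [("region-a", 1.0)] * 100 + [("region-b", 1.0)] * 98 + [("region-b", 5.0)] * 2
breaches = regions_breaching_sla(samples, sla_seconds=2.0)
```

The PM owns the SLA and the breach report; which cache tier or ISP link explains `region-b` stays with engineering.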

Numbers worth knowing

  • 2 seconds: time-to-play target for p50.
  • Under 0.1% of sessions: rebuffer rate target.
  • Over 1,000: ISP locations hosting Open Connect appliances worldwide.
  • 80%: share of content discovered through the recommendation engine.
  • 20 to 30%: bandwidth reduction from AV1 plus film grain synthesis.
  • 300 million: approximate subscriber base at the time of writing, the baseline for any concurrency or reliability estimate.
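
A minimal scorecard that checks session logs against these targets might look like the sketch below. The `Session` record and its `from_recommendation` flag are hypothetical; the thresholds (p50 time-to-play under 2 seconds, rebuffers in under 0.1% of sessions) are the ones listed above, and the recommendation-driven play rate is a rough proxy for the 80% discovery share.

```python
from dataclasses import dataclass
from statistics import median
from typing import List


@dataclass
class Session:
    time_to_play_s: float
    rebuffered: bool           # at least one rebuffer during the session
    from_recommendation: bool  # play started from a recommended row (hypothetical flag)


def scorecard(sessions: List[Session]) -> dict:
    """Compare a batch of sessions against the time-to-play and rebuffer targets."""
    if not sessions:
        raise ValueError("no sessions to score")
    n = len(sessions)
    p50 = median(s.time_to_play_s for s in sessions)
    rebuffer_rate = sum(s.rebuffered for s in sessions) / n
    rec_play_rate = sum(s.from_recommendation for s in sessions) / n
    return {
        "p50_time_to_play_s": p50,
        "rebuffer_rate": rebuffer_rate,
        "rec_driven_play_rate": rec_play_rate,
        "meets_time_to_play": p50 < 2.0,      # under 2 seconds at p50
        "meets_rebuffer": rebuffer_rate < 0.001,  # under 0.1% of sessions
    }


report = scorecard([
    Session(1.0, False, True),
    Session(1.5, False, True),
    Session(3.0, True, False),
])
```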

In 2026, the interesting design question is not “how do you serve video at scale” (feasibility is largely solved). It is “how does the system surface the right title to a subscriber who has never articulated their taste in words, before they feel any friction at all.” That is a viable and lovable problem: viable because better personalization reduces content acquisition cost per retained subscriber, lovable because a cold-start experience that surfaces something genuinely right in the first session is what converts a trial to a committed subscriber.

See “Feasibility is free” for the viable/lovable frame that applies across all platform design decisions, and “Measure success: Netflix top metrics” for the metric tree this system is built to move.
