system design · hard
"Design YouTube" (system design PM interview)
Design YouTube.
This question is not asking you to be a distributed systems engineer. It is testing whether you can reason about infrastructure choices in product terms: what does the upload pipeline mean for creator trust, what does a CDN caching policy mean for new-creator distribution, and how do you define “the system is working”? A PM who lists S3, Kafka, and transcoding queues without connecting those components to user experience is giving an engineering answer to a PM question.
Scope before designing
The numbers are scoping inputs, not trivia to recite. Roughly 500 hours of video are uploaded to YouTube every minute, about 2.7 billion logged-in users visit each month, and about 1 billion hours are watched daily. The organizing constraint is a read-to-write ratio of at least 100:1: the system is overwhelmingly a delivery problem, not an upload problem. Name that ratio and every subsequent architectural choice becomes easier to defend.
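The scoping arithmetic can be checked in a few lines. This is a minimal sketch that uses only the figures quoted above and compares hours watched with hours uploaded. That hours-based comparison is an assumed proxy for read-to-write, since the text does not say whether the ratio counts hours or requests.

```python
# Figures quoted in the text; treat them as approximate public numbers.
UPLOAD_HOURS_PER_MINUTE = 500
WATCH_HOURS_PER_DAY = 1_000_000_000
MINUTES_PER_DAY = 24 * 60


def upload_hours_per_day(hours_per_minute: float) -> float:
    """Convert an upload rate in hours-of-video per minute to hours per day."""
    return hours_per_minute * MINUTES_PER_DAY


def watch_to_upload_ratio(watch_hours_per_day: float, hours_per_minute: float) -> float:
    """Hours watched per hour uploaded: an assumed proxy for read-to-write."""
    return watch_hours_per_day / upload_hours_per_day(hours_per_minute)


if __name__ == "__main__":
    uploaded = upload_hours_per_day(UPLOAD_HOURS_PER_MINUTE)
    ratio = watch_to_upload_ratio(WATCH_HOURS_PER_DAY, UPLOAD_HOURS_PER_MINUTE)
    print(f"Uploaded per day: {uploaded:,.0f} hours")
    print(f"Watch-to-upload ratio: about {ratio:,.0f}:1")
```

With the quoted figures this works out to 720,000 hours uploaded per day and roughly 1,389 hours watched per hour uploaded. By this proxy the ratio sits well above 100:1, which only strengthens the delivery-first conclusion.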
Then name the users and the tension. Viewers want a relevant, low-latency stream of content. Creators want their videos to reach an audience worth the effort of publishing. These two goals are not always aligned. Optimizing the recommendation system purely for viewer session length can actively suppress new-creator distribution, because new creators have sparse signal and the ranker has no basis to promote their videos. State this tension before drawing any boxes.
Your north star metric: weekly active creators who published at least one video in the last 30 days. Creator supply health drives viewer supply health. A platform that optimizes only for viewer metrics eventually runs out of content worth watching.
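A north star is only useful once its definition is unambiguous. The sketch below is one possible reading of the metric: a creator counts if they were active in the last 7 days and published at least one video in the last 30. The function name, the input shapes, and the choice to treat any creator-studio activity as "active" are illustrative assumptions, not a known YouTube definition.

```python
from datetime import date, timedelta


def weekly_active_publishing_creators(
    last_active: dict[str, date],
    publish_dates: dict[str, list[date]],
    today: date,
) -> int:
    """Count creators active in the last 7 days who published in the last 30.

    last_active: creator id -> most recent activity date (assumption: any
        creator-studio activity counts as "active").
    publish_dates: creator id -> dates on which they published a video.
    """
    active_cutoff = today - timedelta(days=7)
    publish_cutoff = today - timedelta(days=30)
    count = 0
    for creator_id, active_on in last_active.items():
        if active_on < active_cutoff:
            continue  # not weekly active
        published = publish_dates.get(creator_id, [])
        if any(publish_cutoff <= d <= today for d in published):
            count += 1
    return count
```

Writing the definition down this way forces the questions an interviewer may ask: what counts as "active", and whether the 30-day publishing window should be stricter.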
Three decision zones
Upload and processing. Pre-signed upload URLs send video bytes directly to object storage, bypassing app servers. Name this as a reliability decision, not a clever implementation detail: if the upload goes through your app server, every app server failure is a creator-facing failure. The transcoding pipeline is async and takes 10 to 30 minutes by design. That delay is a deliberate cost-vs-availability trade-off: processing is GPU-intensive, synchronous transcoding would bottleneck the entire pipeline, and viewers can wait for HD while processing completes. The PM’s job is to define the SLA: p99 transcoding time under 60 minutes, with a visible “processing” state in creator studio. If a creator’s video sits in processing for two hours, they don’t know whether they made a mistake or the platform is broken. That ambiguity destroys trust faster than the latency does.
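The p99 SLA above is easy to state and easy to mismeasure. A minimal sketch of the check follows, using the nearest-rank percentile method. The function names and the choice of nearest-rank are assumptions for illustration; a production monitoring stack may compute percentiles differently.

```python
SLA_P99_MINUTES = 60  # the p99 target proposed in the text


def percentile_nearest_rank(values: list[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of
    all samples at or below it."""
    if not values:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100 gives the 1-based rank.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]


def transcoding_sla_met(durations_minutes: list[float]) -> bool:
    """True when p99 transcoding time is under the SLA target."""
    return percentile_nearest_rank(durations_minutes, 99) < SLA_P99_MINUTES
```

The point of p99 rather than an average: one upload in a hundred stuck for two hours barely moves the mean, yet that creator is exactly the one who loses trust.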
Delivery. CDN is not a checkbox. Video traffic follows a long-tail distribution: roughly 20% of content drives 80% of CDN egress. A flat caching policy treats a 10-million-view video and a 12-view video identically at the edge, which is expensive and inefficient. The PM decision is a tiered caching policy: hot tier for the top 5% of content by 7-day view velocity, warm tier for anything else published in the last 30 days, cold origin pull for the long tail. Raising this as a cost-optimization decision with a real trade-off structure (cache hit rate vs. edge storage cost vs. cold-start latency for new creators) is what separates a PM answer from an engineering answer. The cold-start penalty is also a creator-equity issue: a new creator’s first video has no edge copies yet, so its first views are served from origin at higher latency, which can suppress engagement before the recommendation system has even evaluated it.
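The tiered policy described above can be sketched as a simple classifier. The thresholds come from the text; the `Video` type, the field names, and the rule that the hot tier always holds at least one video are assumptions made for illustration.

```python
from dataclasses import dataclass

HOT_PERCENT = 5          # top 5% by 7-day view velocity (from the text)
WARM_MAX_AGE_DAYS = 30   # published in the last 30 days (from the text)


@dataclass(frozen=True)
class Video:
    video_id: str
    views_last_7_days: int
    age_days: int


def assign_cache_tiers(videos: list[Video]) -> dict[str, str]:
    """Map each video id to 'hot', 'warm', or 'cold'."""
    if not videos:
        return {}
    by_velocity = sorted(videos, key=lambda v: v.views_last_7_days, reverse=True)
    # Assumption: keep at least one video hot, even in a tiny catalog.
    hot_count = max(1, len(videos) * HOT_PERCENT // 100)
    hot_ids = {v.video_id for v in by_velocity[:hot_count]}

    tiers: dict[str, str] = {}
    for v in videos:
        if v.video_id in hot_ids:
            tiers[v.video_id] = "hot"
        elif v.age_days <= WARM_MAX_AGE_DAYS:
            tiers[v.video_id] = "warm"
        else:
            tiers[v.video_id] = "cold"  # served by origin pull
    return tiers
```

A PM would not own these constants, but the sketch makes the trade-off concrete: raising `HOT_PERCENT` buys cache hit rate with edge storage, and `WARM_MAX_AGE_DAYS` decides how long a new creator's video gets favorable treatment.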
Recommendation. YouTube runs multiple separate ranking models by surface: Browse (homepage), Suggested (sidebar), Shorts, Search, and Notifications each have different signal weights and different optimization targets. A PM answer that names this is differentiated. For Browse, the dominant signals as of 2025 are session continuation and satisfaction surveys, ahead of raw average view duration. A 3-minute video watched to 100% completion with a like now outranks a 20-minute video watched 40% through. This is a deliberate product decision, not just an ML configuration: YouTube chose to de-prioritize raw watch time because it was promoting content that held attention through anxiety rather than satisfaction.
The architecture behind candidate generation and ranking is a two-stage pipeline: a lightweight retrieval model generates roughly 100,000 candidates using approximate nearest neighbor search over user and video embedding vectors, and a heavier transformer-based ranker scores those candidates on predicted watch time, completion rate, CTR, and return visit probability. A PM does not need to implement this, but needs to know it well enough to ask the right questions: What signals are we weighting? How do we measure whether the ranker is serving creators equitably, not just optimizing for top-10% content?
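The shape of the two-stage pipeline is easier to question once it is written down. This is a toy sketch, not YouTube's implementation. Exact dot-product search stands in for approximate nearest neighbor retrieval, a weighted sum stands in for the transformer-based ranker, and the weights in `RANK_WEIGHTS` are hypothetical values chosen only to make the example run.

```python
def dot(a: list[float], b: list[float]) -> float:
    """Dot product of two equal-length embedding vectors."""
    return sum(x * y for x, y in zip(a, b))


def retrieve(user_vec: list[float], video_vecs: dict[str, list[float]], k: int) -> list[str]:
    """Stage 1: top-k videos by embedding similarity.

    Exact search is used here for clarity; at scale this step would be an
    approximate nearest neighbor index.
    """
    by_similarity = sorted(
        video_vecs, key=lambda vid: dot(user_vec, video_vecs[vid]), reverse=True
    )
    return by_similarity[:k]


# Hypothetical signal weights: these are the numbers a PM should ask about.
RANK_WEIGHTS = {"watch_time": 0.4, "completion": 0.3, "ctr": 0.2, "return_visit": 0.1}


def rank(
    candidates: list[str],
    predictions: dict[str, dict[str, float]],
    slate_size: int,
) -> list[str]:
    """Stage 2: score each candidate as a weighted sum of predicted signals."""
    def score(vid: str) -> float:
        preds = predictions[vid]
        return sum(weight * preds[name] for name, weight in RANK_WEIGHTS.items())

    return sorted(candidates, key=score, reverse=True)[:slate_size]
```

The first question from the text, "What signals are we weighting?", maps directly onto the `RANK_WEIGHTS` table. The equity question maps onto what stage 1 never retrieves, because a video that misses retrieval cannot be rescued by the ranker.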
Cold start for new creators: extract multimodal features from thumbnail and auto-transcript, seed the video into a controlled exposure cohort matched to the video’s embedding cluster, and score initial engagement-per-view before broad recommendation. The PM’s SLA: within 24 hours, a new video must reach at least 200 users whose interest profile matches its topic cluster. If it hasn’t, the creator pipeline has failed, not the creator.
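The cold-start SLA above is concrete enough to instrument directly. A minimal sketch of the check, assuming an impression log of `(user_id, shown_at)` pairs already filtered to users whose interest profile matches the video's topic cluster; that log shape and the function name are hypothetical.

```python
from datetime import datetime, timedelta

MIN_MATCHED_USERS = 200               # exposure floor from the text
EXPOSURE_WINDOW = timedelta(hours=24)  # SLA window from the text


def cold_start_sla_met(
    published_at: datetime,
    matched_impressions: list[tuple[str, datetime]],
) -> bool:
    """True if at least MIN_MATCHED_USERS distinct matched users were shown
    the video within EXPOSURE_WINDOW of publication."""
    deadline = published_at + EXPOSURE_WINDOW
    reached = {
        user_id
        for user_id, shown_at in matched_impressions
        if published_at <= shown_at <= deadline
    }
    return len(reached) >= MIN_MATCHED_USERS
```

Counting distinct users rather than raw impressions matters: showing the video to the same 20 people ten times each would not give the ranker the signal the exposure cohort exists to collect.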
Structure a strong answer
strong
"Before I design anything: are we scoping for a specific surface or the full platform? And are we most constrained by the viewer at scale or the creator pipeline? [Interviewer confirms: full platform, both users matter.] My north star is weekly active creators who published in the last 30 days. Creator health is the leading indicator of viewer supply. The organizing constraint is a 100:1 read-to-write ratio, which tells me delivery is the hard problem, not upload. Three decision zones. Upload: pre-signed URLs send bytes directly to object storage, removing app server as a single point of failure. Transcoding is async at 10 to 30 minutes; I'd set a p99 SLA of 60 minutes and surface a visible processing state so creators know the system received their video. Delivery: CDN is a cost decision, not a checkbox. Long-tail distribution means 20% of content drives 80% of egress. I'd propose a tiered caching policy with a hot tier for the top 5% by 7-day view velocity, warm tier for recent content, and cold origin pull for the rest. The new-creator cold-start penalty is both a latency issue and a creator-equity issue I'd instrument separately. Recommendation: YouTube runs separate models for Browse, Suggested, Shorts, Search, and Notifications. For Browse, the dominant signals since 2025 are session continuation and satisfaction surveys, not raw watch time. This is a product decision with real trade-offs: satisfaction-weighted signals help viewers but disadvantage new creators with sparse signal volume. The counterweight I'd design is a diversity constraint at the slate level: no more than three videos from the same channel, and 10 to 15% of Browse slots reserved for emerging creators with high early engagement-per-view ratios. Cold start SLA: a new video must reach 200 matched users within 24 hours. 
My top three metrics: weekly active creators with at least one published video, p50 time-to-first-recommendation for new videos, and viewer satisfaction score from post-session surveys. I'd instrument all three before launch and set regression alerts, not just dashboards."
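The slate-level diversity constraint named in the strong answer can be sketched as a two-pass selection over an already ranked candidate list. This is one possible construction under stated assumptions: the candidate dict keys, the `emerging` flag, and the use of the lower bound (10%) of the reserved range are all illustrative.

```python
from collections import Counter

MAX_PER_CHANNEL = 3          # per-channel cap from the answer above
EMERGING_SHARE_PERCENT = 10  # lower bound of the 10 to 15% range


def build_slate(ranked: list[dict], slate_size: int) -> list[dict]:
    """Pick a slate from best-first candidates.

    Each candidate is {'video_id': str, 'channel_id': str, 'emerging': bool}.
    """
    # Integer ceiling of slate_size * percent / 100.
    reserved = (slate_size * EMERGING_SHARE_PERCENT + 99) // 100
    per_channel: Counter = Counter()
    chosen: set[int] = set()  # indexes into `ranked`

    def try_take(i: int) -> bool:
        channel = ranked[i]["channel_id"]
        if i in chosen or per_channel[channel] >= MAX_PER_CHANNEL:
            return False
        chosen.add(i)
        per_channel[channel] += 1
        return True

    # Pass 1: fill the reserved slots with the best-ranked emerging creators.
    taken = 0
    for i, video in enumerate(ranked):
        if taken == reserved:
            break
        if video["emerging"] and try_take(i):
            taken += 1

    # Pass 2: fill the remaining slots in rank order, respecting the cap.
    for i in range(len(ranked)):
        if len(chosen) == slate_size:
            break
        try_take(i)

    # Present the slate in original rank order.
    return [ranked[i] for i in sorted(chosen)]
```

The reserved slots are a floor, not a ceiling: emerging creators who rank well on their own still enter through the second pass.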
weak
"I'd design YouTube with an upload service that stores videos in S3, a transcoding service that converts them to different resolutions, and a CDN that serves them globally. For recommendations I'd use a machine learning model based on watch history and clicks. The database stores video metadata and user data. I'd use microservices and make it horizontally scalable." This fails on every dimension that matters for a PM interview: no clarifying questions were asked, so scope is undefined; CDN is treated as a checkbox rather than a cost-latency trade-off with a real number attached; the creator-viewer tension is invisible; recommendation is a black box with no acknowledgment of the two-stage pipeline, surface-specific models, or what signals the PM would prioritize and why; no north star metric is proposed; and nothing about what "the system is working" looks like.
What the interviewer is probing
- PM framing over architecture recitation: can you connect each component to a user experience decision or a cost trade-off?
- The creator-viewer tension: do you see that optimizing for viewer satisfaction metrics can suppress new creator distribution, and do you have a counterweight?
- Signal fluency in 2026: treating raw watch time as the primary recommendation signal reads as 2020. Satisfaction-weighted discovery, surface-specific models, and the cold-start SLA are current.
- Honest scope ownership: “engineering owns the exact cache tier thresholds; I’d define the SLA and instrument by content age bucket to find where latency degrades” is a stronger signal than pretending to know the number.
Numbers worth knowing
- 250 ms: recommendation latency target for a real-time homepage load.
- 10 to 30 minutes: transcoding time (a 100 MB video to multiple bitrate profiles).
- Approximately 200 users: the matched-cohort cold-start exposure floor for a new video.
- Roughly 100,000 candidates: what the two-tower retrieval model generates; the heavy ranker scores and re-ranks these to roughly 50 for a homepage slate.
In 2026, the upload pipeline also carries AI-native features: auto-dubbing for global reach, ML-generated thumbnail variants tested before broad rollout, and auto-chapters from transcripts. Each changes the creator-viewer trade-off in a specific way: auto-dubbing expands creator distribution to non-English audiences without extra work (viable for creators); ML thumbnails can optimize for CTR in ways that mislead viewers (a trust issue the PM must govern). A strong candidate names one of these and articulates the trade-off it introduces.
See "Feasibility is free" for the viable/lovable frame that applies across all platform design decisions, and "Improve YouTube recommendations" for a deeper look at the satisfaction-weighted signal shift.
Related
- "How would you improve YouTube's recommendations?" product-sense
- "Design Twitter's feed system" (system design PM interview) system-design
- "Design a news feed" (2026 version) product-sense