glossary · ai
Prompt engineering definition
Designing and iterating on the instructions given to an LLM to reliably produce outputs that meet quality, format, and cost targets.
Prompt engineering is the practice of designing and iterating on the instructions given to a large language model to reliably produce outputs that meet quality, format, and cost targets. It operates at two levels: the system prompt (developer-controlled, set once per feature) and the user prompt (shaped at runtime by user input or product logic). It is the first lever to pull when building an AI feature because it costs zero additional infrastructure and can be iterated in hours. You reach for RAG or fine-tuning only after you’ve confirmed that prompting alone cannot solve the problem.
Why this is a PM skill, not just an engineering one
Every system prompt runs on every request. A bloated 2,000-token prompt multiplied across a million daily requests is a direct line item in your LLM unit economics. The PM who writes or approves the system prompt owns that cost, whether they know it or not. The same PM also owns the failure modes: prompt brittleness (a slight change in user input format silently degrades output quality), prompt leakage (adversarial users extract system instructions via injection attacks), and output inconsistency (no structured format means downstream code breaks unpredictably). These are product concerns before they are engineering ones.
The techniques a PM must know by name
- Zero-shot: instruction only, no examples. The baseline. Works when the task is unambiguous and the model has strong priors.
- Few-shot: instruction plus 2 to 5 examples in-context. The single highest-leverage technique. Reduces ambiguity, anchors tone and format, and costs less than fine-tuning by orders of magnitude.
- Chain-of-thought: asking the model to reason step by step before giving a final answer (“think through this before responding”). Reliable quality improvement for multi-step reasoning tasks; adds tokens, so use it where accuracy matters more than latency.
- Role / persona assignment: setting a specific role in the system prompt (“you are a senior support analyst”) to anchor tone, vocabulary, and behavior. Useful but overused; a precise task description often beats a vague persona.
- Structured output / JSON mode: constraining the output format so downstream systems can parse it reliably. This is a product reliability concern: an inconsistent output format means engineer time on exception handling and user-visible errors on edge cases.
- Meta-prompting: using the model to critique and improve its own prompt. Describe what you want the output to achieve, paste the current prompt and a set of failure cases, and ask the model to rewrite the prompt to fix them. Faster than manual iteration when you have clear eval cases.
The decision sequence: prompt, then RAG, then fine-tune, then distill
The canonical order is not arbitrary. Each step adds infrastructure cost and iteration time.
Prompt first. No pipeline, no hosting cost. Iterate in an afternoon. If your knowledge base fits in the context window (roughly under 200k tokens for most models; Claude supports up to 1M, Gemini 1.5 up to 2M), full-context prompting is often cheaper and faster than building a retrieval pipeline. The break-even depends on request volume and latency tolerance.
Add RAG when you need fresh or private data the model was not trained on. RAG introduces retrieval latency (50 to 300ms per query) and ongoing infrastructure cost ($500 to $3,000 per month for a modest deployment). It is the right call when the knowledge changes frequently or when grounding in specific documents is a trust requirement.
Fine-tune only when you need stable behavioral change that prompting cannot hold: a specific output schema, a consistent refusal pattern, or a domain-specific tone that degrades whenever the prompt changes. Fine-tuning has significant upfront cost ($5,000 to $20,000 depending on model and data volume) and a slow iteration loop. It is not a quality upgrade over good prompting; it is a behavioral lock-in for cases where that lock-in is worth the cost.
Distill for cost optimization once you have a fine-tuned model that performs well. This is rarely a PM’s first three moves.
Prompt brittleness and why evals exist
A prompt that works on your test cases but breaks when user input shifts format is brittle. Brittleness is invisible without an eval harness: you will not see it in aggregate metrics until it has already cost you retention. The PM’s job is to insist on a set of regression evals before shipping any prompt-dependent feature, and to add failure cases to that set whenever production traffic surfaces a new pattern. Prompt engineering and eval discipline are inseparable.
What a strong answer sounds like in an interview
strong
"Prompt engineering is designing and iterating on the instructions given to an LLM at the system and user level to hit quality, format, and cost targets. It's the first lever I reach for because it costs nothing extra and I can iterate fast. The core techniques are zero-shot for simple tasks, few-shot when I need to anchor format or tone with examples, chain-of-thought when accuracy on multi-step reasoning matters more than latency, structured output mode to prevent downstream parse errors, and meta-prompting to use the model itself to fix a prompt against a set of failure cases. The decision sequence I follow is: prompt first, add RAG when I need fresh or private data the model wasn't trained on, fine-tune only when I need a stable behavioral change that prompting can't hold. On the cost side, every token in a system prompt runs on every request, so a bloated prompt is a unit economics problem. I also watch for prompt brittleness: if a slight format change in user input silently degrades output quality, that's a prompt coverage problem I can only catch with evals. At scale, prompt decisions and eval discipline are the same decision."
weak
"Prompt engineering is just writing better prompts, like being more specific or giving the AI more context." This treats prompting as a personal productivity skill rather than a product-layer engineering discipline. It misses the system prompt vs. user prompt distinction, says nothing about cost or reliability, and gives an interviewer nothing to probe. It cannot connect prompt choices to eval results or token economics, which are table-stakes topics at AI-native companies in 2026.
For the decision logic between prompting, RAG, and fine-tuning in detail, see RAG vs fine-tune vs prompt. For how to measure whether your prompts are working, see eval harness for PMs. For the unit economics framing, see LLM unit economics one-pager.