product sense · hard
Technology lets humans understand animals: what would you build?
This question is a viability trap. The technology is granted by the prompt, so you get no credit for saying it is possible. The entire interview turns on two questions: who is structurally willing to pay, and is the product actually useful to that person in the moment they need it? Candidates who jump to a consumer pet app expose both gaps at once.
This is a confirmed OpenAI PM interview question, and the topic is in the mainstream press right now: CNN ran a “Dolittle machine” feature in June 2026, which means interviewers at frontier labs are primed for it. Earth Species Project ($17M raised as of 2025), with its NatureLM-audio model (the first large-scale audio-language model for animal vocalizations), and Project CETI, with its analysis of 8,719 sperm whale codas, are live research context. If you are interviewing at OpenAI or Anthropic and do not know these projects exist, you are signaling that you treat this as a thought experiment rather than a product question.
Clarify the tech constraint first
This is the move that separates strong from weak. Before picking a segment, ask:
- Emotional state or semantic meaning? Emotional-state translation (calm, distressed, hungry) and semantic decoding (structured meaning, like a prairie dog alarm call encoding the size, shape, and speed of an approaching predator, or the clothing color of an approaching human) are wildly different products. The clarifying question is the hinge of the whole answer.
- Which species? NatureLM-audio identifies species, sex, and life stage from vocalizations. Project CETI has mapped sperm whale codas to the level of discovering rubato and ornamentation as additional communication dimensions beyond rhythm and tempo. The science is real but species-specific.
- Real-time or asynchronous? A livestock farm monitoring system and a vet exam-room decision aid have very different latency requirements.
- What form factor? Ambient mic, wearable on the animal, or existing hardware (e.g., an exam-room camera)?
Set your premise explicitly before moving on: “I’ll assume we can translate structured vocalizations from a defined species set in near-real-time with consumer-grade audio hardware, consistent with where NatureLM-audio and Project CETI are heading.”
Skip the consumer pet app as the primary segment
State this directly and explain why. Pet owner JTBD is diffuse (“I want to know if my dog loves me”), willingness-to-pay is emotional and low, churn is high once novelty fades, and the value delivered is hard to measure. It is not uninteresting as a later market, but leading with it tells the interviewer you have not interrogated who actually pays.
Choose veterinary diagnostics or livestock agriculture
Both have a named payer, a price point, and a measurable ROI.
Veterinary clinics: the clinic pays, not the pet owner. An unnecessary imaging or bloodwork workup costs $300 to $800 per incident. A clinical decision-support layer that reduces unnecessary workups by 25% pays for itself once the avoided cost exceeds its price, and that break-even point is calculable before you write a line of code.
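As a rough check on that claim, the break-even arithmetic can be sketched as below. The $300 to $800 cost range and the 25% reduction come from the text; the monthly count of unnecessary workups and the subscription price are hypothetical inputs chosen only to make the numbers concrete.

```python
def monthly_savings(unnecessary_workups, cost_per_workup, reduction_rate):
    """Dollars of workup cost avoided per month."""
    return unnecessary_workups * cost_per_workup * reduction_rate


def break_even_workups(subscription_price, cost_per_workup, reduction_rate):
    """Unnecessary workups per month at which savings equal the price."""
    return subscription_price / (cost_per_workup * reduction_rate)


# From the text: $300 to $800 per workup, 25% reduction.
COST_LOW, COST_HIGH = 300, 800
REDUCTION = 0.25

# Hypothetical inputs, not data from any clinic.
ASSUMED_WORKUPS_PER_MONTH = 40
ASSUMED_SUBSCRIPTION = 2000  # dollars per month

low = monthly_savings(ASSUMED_WORKUPS_PER_MONTH, COST_LOW, REDUCTION)
high = monthly_savings(ASSUMED_WORKUPS_PER_MONTH, COST_HIGH, REDUCTION)
needed = break_even_workups(ASSUMED_SUBSCRIPTION, COST_LOW, REDUCTION)

print(f"Monthly savings: ${low:,.0f} to ${high:,.0f}")
print(f"Break-even at the low cost: {needed:.1f} workups/month")
```

In an interview the point is the shape of the calculation, not the specific inputs: name the volume and price you are assuming, then show the payer comes out ahead.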
Livestock agriculture: a dairy farm with 2,000 cows loses roughly $200 to $400 per sick cow per day of delayed diagnosis. It is a B2B market with existing tech-adoption infrastructure (IDEXX, SCR by Allflex) and clear willingness to pay for outcomes that affect margin directly.
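The dairy case lends itself to the same back-of-envelope treatment. The $200 to $400 loss per sick cow per day comes from the text; the number of sick cows per month and the days of earlier detection are hypothetical assumptions.

```python
def avoided_loss(sick_cows, days_earlier, loss_per_cow_day):
    """Dollars of loss avoided by diagnosing sick cows earlier."""
    return sick_cows * days_earlier * loss_per_cow_day


# From the text: $200 to $400 per sick cow per day of delayed diagnosis.
LOSS_LOW, LOSS_HIGH = 200, 400

# Hypothetical inputs for a 2,000-cow herd, not measured data.
ASSUMED_SICK_COWS_PER_MONTH = 30
ASSUMED_DAYS_EARLIER = 2

low = avoided_loss(ASSUMED_SICK_COWS_PER_MONTH, ASSUMED_DAYS_EARLIER, LOSS_LOW)
high = avoided_loss(ASSUMED_SICK_COWS_PER_MONTH, ASSUMED_DAYS_EARLIER, LOSS_HIGH)

print(f"Avoided loss per month: ${low:,} to ${high:,}")
```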
Name the MVP precisely
A standalone app is the wrong form. The strong answer is a clinical decision-support layer integrated into the practice management system the vet already uses (IDEXX, Covetrus) that flags distress signals correlated with presenting symptoms at the intake assessment. Meet vets where they already work.
Scope ruthlessly: one species (dogs, the largest veterinary population with the most behavioral data), one signal type (vocalization and posture via exam-room camera), one workflow (intake assessment). Expand by species and signal type in later iterations.
Structure a strong answer
strong
"Before picking a segment I'd clarify what 'understand' means here: emotional-state translation or semantic decoding? Those produce completely different products. I'll assume near-real-time structured vocalization translation for a defined species set, consistent with where Earth Species Project's NatureLM-audio is heading. With that premise, I'm going to skip consumer pet apps as the primary segment. Willingness to pay is low and emotional, churn is high, and the value delivered is hard to measure. Instead I'd target veterinary clinics, specifically the intake assessment workflow for dogs. The clinic is the payer, not the pet owner. An unnecessary imaging or bloodwork workup costs $300 to $800. A tool that cuts unnecessary workups by 25% pays for itself in the first quarter. The MVP is a clinical decision-support layer inside the practice management system the vet already uses, not a new app. It surfaces distress signals correlated with presenting symptoms at intake. One species, one signal type, one workflow. My success metrics: 25% reduction in time-to-diagnosis for ambiguous presentations at 50 pilot clinics, vet weekly active use above 70% at 90 days, and net revenue retention above 120% as clinics add species modules. I will not cite 500 clinics in month twelve because that is a vanity anchor, not a business signal. The lovable version surfaces only the signal that changes the diagnostic path; it is silent when silence is the right answer. A vet who sees an alert for every animal sound has an unusable product even if the AI is perfect."
weak
"I'd build a pet translator app so owners can know what their dog is saying. It could also work for wildlife conservation, or maybe a farm monitoring system."
This answer hits every failure mode at once: no clarifying question, no payer identified, no ROI, and scope that keeps accumulating rather than cutting. "People love their pets" is not a willingness-to-pay argument. Conservation is high-meaning and low-revenue; NGOs and research nonprofits cannot sustain a product business. Listing every possible segment without ranking them tells the interviewer you have no prioritization instinct. Interviewers at frontier AI companies probe the viable case on the first follow-up, and this answer falls apart immediately.
The ethical surface is a viability question
If the product can decode that a factory-farmed pig is in chronic distress, what are your obligations? This is not ethics theater; it is a market sizing and go-to-market question. The livestock agriculture segment runs directly into this tension: your customer is also the source of the welfare problem the product surfaces. A strong candidate names this explicitly and makes a deliberate choice. Veterinary diagnostics sidesteps it entirely. Livestock wellness monitoring runs into it head-on and requires a different go-to-market, one that sells to animal welfare certification bodies or insurance underwriters rather than to the farm operator. The candidate who ignores this has not thought seriously about the product.
The 2026 PM lens
Feasibility is free in 2026. The prompt grants it. The entire interview is testing whether you can find a segment where someone is structurally willing to pay, articulate the unit economics, and describe a product that is lovable rather than merely usable: one that surfaces the right signal at the right moment and is silent otherwise. Proving viability before you scope the MVP is the skill the interviewer is watching for. The moonshot premise is a red herring if your answer cannot name a payer, a price, and a love metric.
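To make the metrics concrete, the three pilot targets from the strong answer (25% reduction in time-to-diagnosis, weekly active use above 70%, net revenue retention above 120%) can be checked with a few lines of arithmetic. This is a minimal sketch: the targets come from the text, the net revenue retention formula is the commonly used one, and every pilot number below is a hypothetical placeholder.

```python
def reduction(baseline, pilot):
    """Fractional reduction relative to the baseline value."""
    return (baseline - pilot) / baseline


def weekly_active_rate(active_vets, licensed_vets):
    """Share of licensed vets who used the tool in a given week."""
    return active_vets / licensed_vets


def net_revenue_retention(starting, expansion, contraction, churned):
    """Recurring revenue kept and grown from an existing customer cohort."""
    return (starting + expansion - contraction - churned) / starting


# Hypothetical pilot results, for illustration only.
ttd = reduction(baseline=48, pilot=33)            # hours to diagnosis
wau = weekly_active_rate(active_vets=150, licensed_vets=200)
nrr = net_revenue_retention(starting=100_000, expansion=30_000,
                            contraction=2_000, churned=5_000)

# Targets from the strong answer.
results = {
    "time-to-diagnosis reduction": (ttd, ttd >= 0.25),
    "weekly active use": (wau, wau > 0.70),
    "net revenue retention": (nrr, nrr > 1.20),
}

for name, (value, met) in results.items():
    print(f"{name}: {value:.1%} ({'met' if met else 'missed'})")
```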