AI Tools for Product Owners

Defining Requirements for AI-Powered Features

Writing requirements for AI features is fundamentally different from writing requirements for traditional software. AI behavior is probabilistic, not deterministic — and that changes everything about how you define success, acceptance criteria, and human oversight.

Why AI Feature Requirements Are Different

Traditional software requirements describe deterministic behavior: "When the user clicks Submit, the system validates the form and displays an error message if validation fails." The behavior is exact and verifiable.

AI feature requirements describe probabilistic behavior: "The AI should identify potentially fraudulent invoices with high accuracy." This statement contains no acceptance criterion that a QA engineer can verify. What is "high accuracy"? High compared to what? What happens when the AI is wrong?

Product owners who define AI features using traditional requirements formats consistently fail to account for:

  • Failure modes — AI fails differently from traditional software. It fails by being confidently wrong, not by crashing. Your requirements must specify what happens when the AI is wrong.
  • Accuracy thresholds — What level of accuracy is acceptable? What's the cost of a false positive vs. a false negative?
  • Human oversight — When must a human review the AI's output before it takes effect? Who reviews it? What interface do they need?
  • Feedback loops — How does the AI improve over time? What data does it need? How do users provide corrections?
  • Explainability — Can users understand why the AI made a decision? Do they need to?

Writing AI Feature Requirements

Example: AI-powered invoice fraud detection

Instead of: "The system should flag potentially fraudulent invoices."

Write this:

```text
Feature: AI-assisted invoice fraud detection

Behavior specification:
1. The system analyzes each invoice submission against historical patterns
   and rules and assigns a fraud risk score (0-100).

2. Scoring thresholds:
   - 0-39: Auto-process (no human review required)
   - 40-69: Flag for optional AP Manager review before processing
   - 70-100: Hold for mandatory AP Manager review before processing

3. When a flagged invoice is reviewed by an AP Manager:
   - Display the top 3 reasons for the flag (plain language)
   - Provide: Approve / Reject / Escalate to Finance Director options
   - Record the reviewer's decision and reason in the audit log

4. Accuracy requirements (to be validated at 60-day post-launch review):
   - True positive rate: >70% of fraudulent invoices flagged
   - False positive rate: <15% of legitimate invoices flagged

5. Explainability: Every flag must show at least one specific reason
   (e.g., "Vendor bank account changed within last 30 days",
   "Invoice amount 3x above vendor's 90-day average")

6. Model improvement: AP Manager override decisions feed back into
   model training data (every 90 days). PM and Data team responsible
   for initiating retraining reviews.

7. Fallback: If the AI service is unavailable, all invoices default
   to standard manual processing workflow (no AI dependency in the path).
```

This specification is verifiable. A QA engineer can test the threshold logic. A product manager can audit the accuracy at 60 days. A developer knows what to build.
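
The threshold logic in step 2 (including the step 7 fallback) is exactly the kind of thing a QA engineer can test directly. A minimal sketch, assuming hypothetical names like `route_invoice` and the action strings — these are illustrations, not the product's actual API:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Routing:
    action: str          # "auto_process", "optional_review", "mandatory_hold", "manual_processing"
    reasons: List[str]   # top reasons shown to the AP Manager

def route_invoice(score: Optional[int], reasons: List[str]) -> Routing:
    """Map a fraud risk score (0-100) to a review action.

    A score of None models the step 7 fallback: the AI service is
    unavailable, so the invoice defaults to standard manual processing.
    """
    if score is None:
        return Routing("manual_processing", [])
    if score <= 39:
        return Routing("auto_process", [])
    if score <= 69:
        return Routing("optional_review", reasons[:3])  # top 3 reasons, per step 3
    return Routing("mandatory_hold", reasons[:3])
```

Note that the boundary values (39/40, 69/70) are testable pass/fail cases, unlike "high accuracy."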

The Acceptance Criteria Problem for AI

For AI features, acceptance criteria require different thinking.

Traditional acceptance criteria: "When the user submits a form with an invalid email, the system displays 'Invalid email format.'" Deterministic. Pass/fail.

AI feature acceptance criteria options:

*Process-based criteria:*

```text
- The fraud detection model runs on 100% of submitted invoices
  before the invoice enters the approval queue
- All flagged invoices display at least one specific reason for the flag
- The review interface records the reviewer's decision and timestamp
```

*Accuracy-based criteria (tested on a validation dataset):*

```text
- Precision on validation dataset: >65% (of flagged invoices,
  65% are actually fraudulent based on retrospective review)
- Recall on validation dataset: >60% (of actually fraudulent
  invoices, 60% are flagged)
- Tested on a representative sample of 1,000 historical invoices
  with known outcomes
```
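
Precision and recall are simple to compute once you have a labeled validation set, which is what makes these criteria verifiable. A sketch (function and variable names are illustrative assumptions):

```python
def precision_recall(flagged: set, fraudulent: set) -> tuple:
    """Compute precision and recall for fraud flags.

    flagged:    invoice IDs the model flagged
    fraudulent: invoice IDs confirmed fraudulent in retrospective review
    """
    true_positives = len(flagged & fraudulent)
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / len(fraudulent) if fraudulent else 0.0
    return precision, recall

# Acceptance check against the thresholds above, on a toy labeled set:
p, r = precision_recall(flagged={"A", "B", "C"}, fraudulent={"A", "B", "D"})
assert p > 0.65 and r > 0.60  # both are 2/3 here, so the check passes
```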

*User experience criteria:*

```text
- Fraud reasons are comprehensible to AP Managers without
  training (validated in UAT with 3 AP Managers)
- Review workflow adds no more than 90 seconds per flagged invoice
  on average (timed in UAT)
```

Human-in-the-Loop Design

Every AI feature needs explicit human oversight design. The questions the PO must answer:

When is AI output used directly vs. reviewed first?

  • High-confidence, low-risk decisions: AI acts, human informed
  • Lower-confidence or higher-risk decisions: human reviews before action

Who reviews?

  • Define the reviewer role, qualifications, and authority
  • Define escalation path if the reviewer disagrees with the AI or is unavailable

What does the reviewer see?

  • The AI's output
  • The reasons for the AI's conclusion
  • The information the AI used to reach its conclusion
  • The ability to override

How are overrides captured?

  • For model improvement (all overrides should be logged)
  • For audit compliance (specific industries require this)
  • For quality monitoring (a high override rate signals model degradation)

A prompt for drafting this design:

```text
Define the human-in-the-loop design for the invoice fraud detection feature.

For each decision threshold (0-39, 40-69, 70-100):
1. What action does the system take automatically?
2. What notification goes to which human?
3. What does the human see when they review?
4. What actions can the human take?
5. How long does the human have to take action before a default applies?
6. How is the human's decision recorded?
```
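
Capturing the reviewer's decision as a structured record makes the override questions above concrete and testable. A hypothetical sketch; the field names and the `is_override` definition are assumptions for illustration:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ReviewRecord:
    invoice_id: str
    risk_score: int
    ai_recommendation: str   # e.g. "hold" for scores 70-100
    reviewer: str            # AP Manager identity (reviewer role)
    decision: str            # "approve" / "reject" / "escalate"
    reason: str              # free-text justification for the audit log
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

    @property
    def is_override(self) -> bool:
        # Approving an invoice the AI held is an override: logged for
        # retraining data and monitored as a quality signal.
        return self.ai_recommendation == "hold" and self.decision == "approve"
```

One record per review answers all three capture purposes at once: model improvement, audit compliance, and quality monitoring.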

Scoping AI Features Responsibly

AI features carry risk that traditional features don't. The PO must define:

Data requirements: What training data does this feature need? Does it exist? Is it clean? Is it representative of the actual use case?

Bias and fairness: Could the AI's training data contain biases that produce unfair outcomes for some users? What is the testing plan?

Privacy: Does the AI process personal data? What are the data retention and processing consent requirements?

Regulatory compliance: In regulated industries (healthcare, finance, legal), AI decision support has specific compliance requirements. The PO must understand these before the feature is scoped.

Key Takeaways

  • AI feature requirements must specify probabilistic behavior explicitly: accuracy thresholds, failure modes, false positive/negative cost, human oversight design
  • Acceptance criteria for AI features: process-based (always runs), accuracy-based (performance targets on validation data), and UX-based (comprehensible to reviewers)
  • Human-in-the-loop design is a first-class requirement, not an afterthought: define who reviews, what they see, what they can do, and how decisions are recorded
  • Feedback loops are requirements: how does the AI improve over time, and who is responsible for triggering retraining reviews?
  • Fallback behavior is a requirement: what happens when the AI service is unavailable? AI cannot be a single point of failure in a critical business process

---

Practice: Choose an AI feature you've shipped or seen in a product you use. Rewrite its requirements using the framework from this lesson. What was missing from the original specification? What would have changed in the implementation if these requirements had been explicit from the start?