IaaS Platform Guide
Intelligence-as-a-Service by IRONCODE
1. Introduction & Architecture
Purpose
IaaS (Intelligence-as-a-Service) is a platform where AI agents conduct structured qualitative research interviews via chat and voice, then analyze the results using Mayring's Qualitative Content Analysis (QCA). The core differentiator: every finding traces back to its source — zero hallucinated insights, 100% traceability.
Hypotheses Under Test
| ID | Hypothesis | Success Criteria |
|---|---|---|
| H1/H2 | AI agent can conduct structured interviews via chat AND voice | ≥ 80% question coverage |
| H3 | Agents are configurable (prompt, questions, model, temperature) and produce measurably different behavior | Measurable difference across configs |
| H4/H5 | Mayring QCA analysis with 100% traceability | Every finding → coding_unit → message → interview |
Tech Stack
| Layer | Technology |
|---|---|
| Backend | FastAPI (Python 3.12), async |
| Frontend | Next.js 14 + TypeScript |
| Database | PostgreSQL 16, Alembic migrations |
| LLM | Claude API (claude-sonnet-4-6 default, configurable per agent) |
| Speech-to-Text | Deepgram Nova-2 (streaming via WebSocket) |
| Text-to-Speech | ElevenLabs (eleven_turbo_v2_5, streaming) |
| Voice Transport | WebSocket (not WebRTC for PoC) |
| Deploy | Docker Compose (local), Azure Container Apps (production) |
Architecture
Three containers compose the system:
┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐
│ iaas-frontend │ │ iaas-backend │ │ iaas-db │
│ Next.js :3000 │──▶│ FastAPI :8000 │──▶│ PostgreSQL :5432│
└──────────────────┘ └──────────────────┘ └──────────────────┘
│
┌─────────┼─────────┐
▼ ▼ ▼
Claude API Deepgram ElevenLabs
(LLM) (STT) (TTS)
The Traceability Chain (Core Principle)
Every analytical finding links back through an unbroken evidence chain:
AnalysisFinding
└─▶ Category (AI-grouped theme)
└─▶ CodingUnit (verbatim extract + paraphrase + generalization)
└─▶ Message (original participant response)
└─▶ Interview (session with participant)
└─▶ Agent (configured interviewer)
No finding exists without traceable evidence. No quote is fabricated.
Database Tables (14 total)
| Table | Purpose |
|---|---|
| personas | Reusable personality profiles |
| agents | Executable AI interviewers with LLM config |
| question_guides | Ordered interview questions per agent |
| session_templates | Multi-agent session blueprints |
| session_template_agents | Agent–session mapping with roles |
| interviews | Individual interview instances |
| messages | All interview messages (participant + agent) |
| voice_sessions | Voice-specific session metadata |
| analysis_runs | QCA analysis instances with config |
| coding_units | Segmented message extracts |
| categories | Thematic groupings of coding units |
| analysis_findings | Synthesized research findings |
| interview_invitations | Shareable invitation tokens |
| audit_log | State-change audit trail |
2. Persona Library
What Personas Are
Personas are reusable personality profiles that serve as templates for creating agents. Each persona encapsulates a consistent interviewer personality — name, description, system prompt, voice settings, language, and behavioral rules. Instead of configuring each agent from scratch, admins pick a persona and "instantiate" it into a ready-to-configure agent.
Persona Fields
| Field | Type | Purpose |
|---|---|---|
| key | String (unique) | Slug identifier (e.g., "system_autopsy") |
| name | String | Display name (e.g., "Dr. Vera Kessler") |
| personality_description | Text | Rich personality profile |
| system_prompt_template | Text | Base system prompt instructions |
| language_default | String | Default interview language ("en", "de") |
| voice_description | Text | Description of voice tone/style |
| default_voice_id | String | ElevenLabs voice ID |
| best_for | JSON array | Tags like ["architecture", "technical"] |
| engagement_rules | JSON | Custom rules (max_follow_ups, probe_on_vague, etc.) |
Admin Workflow: Persona → Agent
- Browse Personas at `/personas` — grid view with personality previews, best_for tags, and language badges.
- Click "Instantiate" — opens a modal asking for welcome message and optional interview type.
- System creates an Agent pre-configured with the persona's personality, prompt, voice, and language.
- Redirects to agent edit page (`/agents/{id}/edit`) where the admin fine-tunes questions, model parameters, etc.
DIAGNOSE Phase Personas
The platform ships with 5 DIAGNOSE dimension personas:
| Persona | Key | Focus | Style |
|---|---|---|---|
| Dr. Vera Kessler | system_autopsy | Technical landscape, legacy systems, vendor lock-in | Methodical, precise, analytical |
| James Okafor | capacity_audit | Time allocation, automation potential, work-about-work | Data-driven, direct, efficient |
| Dr. Miriam Steinfeld | organizational_autopsy | Decision flows, org structure, culture patterns | Empathetic, insightful, structured |
| Raj Anand | data_intelligence | Data maturity, AI readiness, information architecture | Curious, systematic, technically deep |
| Ingrid Larsen | process_zero_test | Core processes, failure modes, handoff patterns | Pragmatic, thorough, structured |
3. Agent Configuration
What Agents Are
Agents are executable interview entities — they are what actually conducts the conversation with participants. Each agent has independently configurable LLM parameters, interview behavior, voice settings, and a set of ordered questions.
Configurable Parameters
| Parameter | Default | Purpose | Impact on Behavior |
|---|---|---|---|
| model_name | claude-sonnet-4-6 | Which Claude model to use | Sonnet: balanced. Opus: deeper reasoning. Haiku: faster, cheaper. |
| temperature | 0.7 | LLM sampling temperature (0.0–1.0) | Lower = more predictable/focused. Higher = more creative/varied. |
| max_tokens | 1024 | Response length limit | Constrains how long the agent's responses can be |
| system_prompt | (from persona) | Core personality and instructions | Defines the agent's entire conversational identity |
| welcome_message | (required) | First message to participant | Sets the tone for the entire interview |
| closing_message | "Thank you..." | Final message | Wraps up the interview gracefully |
| language | "en" | Interview language | Affects prompt modules (German/English specific rules) |
| voice_id | (optional) | ElevenLabs voice ID | Which voice speaks in voice interviews |
| turn_taking_profile | "balanced" | Voice conversation patience | responsive (fast) → balanced → patient → therapist (very patient) |
| interview_type | (optional) | Classification tag | E.g., "architecture", "culture", "technical" |
| dimension_config | (optional) | Custom tracking dimensions | Enables visual dimension tracking in the UI |
How Agent Settings Affect Behavior (H3)
The same interview questions asked by two differently configured agents produce measurably different results:
- Temperature 0.3 agent: Sticks closely to question guide, minimal creative probing, consistent responses across interviews.
- Temperature 0.9 agent: More varied follow-ups, creative analogies, sometimes surprising questions.
- Sonnet model: Balanced depth and speed, good for most interviews.
- Opus model: Deeper reasoning, better at detecting contradictions and nuance, slower.
- "therapist" turn-taking: Waits up to 8 seconds before speaking in voice, allows long pauses for reflection.
- "responsive" turn-taking: Takes over after 2 seconds of silence, keeps pace high.
Question Guides
Each agent has an ordered list of question guides — the interview curriculum.
| Field | Purpose |
|---|---|
| question_text | The question to ask |
| order_index | Position in sequence (0, 1, 2, ...) |
| probing_depth | How many follow-up rounds (0–5). Default: 2 |
| is_required | Must be asked before interview can complete |
| topic_label | Dimension/topic name for tracking (e.g., "Architecture") |
Probing Depth controls how the interview engine follows up:
| Depth | Behavior |
|---|---|
| 0 | No follow-ups. Move to next question after any response. |
| 1 | One round of clarification. |
| 2 | Two rounds of probing (default). Good for most questions. |
| 3–5 | Deep exploration. Agent probes extensively before advancing. |
Topic Labels connect questions to the agent's dimension_config for visual tracking. When a question has topic_label: "Architecture", the UI shows progress on the "Architecture" dimension as that topic is covered.
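The probing-depth rule reduces to a one-line decision per turn. A minimal sketch (the function name is an assumption; the real engine also weighs coverage and elapsed time):

```python
def next_action(followup_count: int, probing_depth: int) -> str:
    """Probe again until this question's probing depth is exhausted,
    then advance to the next question in the guide (sketch)."""
    return "follow_up" if followup_count < probing_depth else "advance"
```

With the default depth of 2, the agent probes twice before moving on; depth 0 advances immediately after any response.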
Admin Workflow: Configure Agent
- Navigate to `/agents/{id}/edit`
- Review/edit system prompt, welcome/closing messages
- Adjust LLM model, temperature, max tokens
- Set voice ID (for voice interviews)
- Set turn-taking profile
- Add/reorder/edit question guides (with topics and probing depths)
- Save
4. Session Templates & Multi-Agent Orchestration
What Sessions Are
Session Templates are multi-agent interview blueprints that combine two or more agents in a specific configuration. They define:
- Which agents participate and in what roles
- How agents switch (orchestration mode)
- What topics each agent covers
- How interview time is allocated
Orchestration Modes
| Mode | Description |
|---|---|
| tag_team | Agents dynamically switch based on topic relevance, sentiment, and coverage. The system scores all agents each turn and selects the best fit. |
| sequential | Agents take turns in order — Agent 1 covers their questions, then Agent 2, etc. |
Agent Roles
Each agent in a session has a role:
| Role | Purpose |
|---|---|
| lead | Primary interviewer. Starts the conversation. Gets majority of time. |
| support | Specialist. Takes over when their expertise is needed. |
Session Template Fields
| Field | Purpose |
|---|---|
| name | Template name (e.g., "DIAGNOSE: System Autopsy") |
| description | What this session investigates |
| orchestration_mode | tag_team or sequential |
| language | Interview language |
| target_duration_minutes | Expected duration (default: 30) |
| interview_type | Classification tag |
Agent Assignment in Sessions
Each agent entry in a session has:
| Field | Purpose |
|---|---|
| agent_id | Which agent |
| role | lead or support |
| order_index | Speaking order |
| assigned_topics | Topics this agent covers (e.g., ["Architecture", "Infrastructure"]) |
| time_allocation_pct | Percentage of interview time (e.g., 70%) |
Multi-Agent Handoff: How It Works
In tag_team mode, the system evaluates which agent should respond on every turn. The selection uses a 5-dimension weighted scoring algorithm:
| Dimension | Weight | What It Measures |
|---|---|---|
| Topic Match | 35% | Does this agent own the current topic? |
| Sentiment Fit | 25% | Is this agent's style appropriate for the participant's emotional state? |
| Continuation Bonus | 20% | Is the current agent still probing a topic? (Avoid disruptive handoffs mid-thread.) |
| Balance Penalty | 15% | Has this agent had too many turns relative to others? (Prevents monopolization.) |
| Question Readiness | 15% | Does this agent have uncovered topics in their domain? |
Scoring Details
Topic Match (0.0–1.0):
- Exact topic match → 1.0
- Generalist (no assigned topics) → 0.5
- No match → 0.0
Sentiment Fit (0.0–1.0):
- Negative participant sentiment + empathetic agent → up to 1.0
- Negative sentiment + analytical agent → as low as 0.1
- Neutral/positive → 0.5 (any style works)
Continuation Bonus (0.0–1.0):
- Same agent, still probing (follow-ups < probing_depth) → bonus (decays toward 0)
- Different agent or probing exhausted → 0.0
Balance Penalty (0.0 to -1.0):
- Agent has had fair share of turns → 0.0
- Agent dominating (e.g., 30%+ above fair share) → -0.5 to -1.0
Question Readiness (0.0–1.0):
- Agent has uncovered topics in their domain → proportional score
- All their topics asked → 0.0
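Taken together, per-turn selection is a weighted sum over the five dimension scores. A minimal sketch with the weights from the table (the function names and flat input dict are assumptions, not the actual implementation):

```python
WEIGHTS = {
    "topic_match": 0.35,
    "sentiment_fit": 0.25,
    "continuation_bonus": 0.20,
    "balance_penalty": 0.15,   # applied to a 0.0 to -1.0 value, so it subtracts
    "question_readiness": 0.15,
}

def score_agent(agent: dict) -> dict:
    """Combine the five pre-computed dimension scores into one total (sketch)."""
    scores = {dim: agent[dim] for dim in WEIGHTS}
    scores["total"] = sum(WEIGHTS[dim] * scores[dim] for dim in WEIGHTS)
    return scores

def select_agent(candidates: list[dict]) -> tuple[str, dict]:
    """Pick the highest-scoring agent; the caller logs a handoff if it changed."""
    scored = {a["id"]: score_agent(a) for a in candidates}
    best = max(scored, key=lambda aid: scored[aid]["total"])
    return best, scored
```

The per-agent `scored` dict is what would end up in the handoff log, so every switch decision stays auditable.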
Handoff Logging
Every agent switch is logged with full scores:
```json
{
  "from_agent": "uuid-of-previous-agent",
  "to_agent": "uuid-of-new-agent",
  "turn": 5,
  "scores": {
    "agent-1-uuid": {"total": 0.62, "topic_match": 1.0, "sentiment_fit": 0.5, "...": "..."},
    "agent-2-uuid": {"total": 0.58, "topic_match": 0.0, "sentiment_fit": 0.8, "...": "..."}
  }
}
```
This creates a complete audit trail of why each handoff decision was made.
Admin Workflow: Create Session
- Navigate to `/sessions/new`
- Enter session name, description, language, duration, interview type
- Select orchestration mode (tag_team or sequential)
- Add agents: select from available agents, assign roles, set assigned topics, allocate time %
- Save → session template is ready for launching interviews
5. Interview Lifecycle
Full Lifecycle
CREATE ──▶ INVITE ──▶ CONSENT ──▶ START ──▶ MESSAGES ──▶ COMPLETE
Step 1: Create Interview
From the session detail page (/sessions/{id}), click "+ Participant":
- Enter participant name, role, department (optional)
- Select modality: Text (chat) or Voice (real-time audio)
- Optionally override duration
- Click "Create Invitation"
The system creates an Interview record (status: created) and generates an invitation.
Step 2: Invite Participant
After creation, the modal shows:
- Copy Link: A shareable URL like `https://frontend.example.com/i/abc123token`
- Send Email: Enter email address → system sends branded HTML email via Azure Communication Services or SMTP
Invitation tokens expire after 30 days. Each access increments a counter for tracking.
Step 3: Participant Consent
When the participant opens the link:
- Token is validated (not expired, interview not completed)
- If name was provided during invitation, a personalized welcome is shown
- If no name, a name input field appears
- Consent text explains: modality (text/voice), expected duration, recording notice
- Participant clicks "Begin Interview"
Step 4: Start Interview
- Status changes to `in_progress`
- `started_at` timestamp is set
- Agent's welcome message is posted as the first message
- For voice: WebSocket connection established, welcome message synthesized via TTS
Step 5: Message Exchange
Each turn follows the 5-node pipeline (see Section 6):
- Participant sends message
- Pacing node determines interview phase (rapport/exploration/close)
- Route node selects best agent (in multi-agent sessions)
- Agent node generates response via Claude
- QA node validates quality and tracks coverage
- Response delivered to participant
The system tracks:
- Question coverage: which questions from the guide have been addressed
- Topic coverage: progress per dimension/topic
- Sentiment: per-message sentiment scoring
- Turn count: total exchanges
Step 6: Complete Interview
Interview completes when:
- The AI determines all required questions are covered and time is approaching, OR
- The participant has answered enough for meaningful analysis, OR
- The admin manually ends the interview
Status changes to completed, completed_at is set, duration_seconds calculated.
Three-Act Interview Structure
The interview follows a cinematic three-act structure driven by elapsed time and coverage:
| Act | Time Range | Focus |
|---|---|---|
| Rapport | 0–20% | Build trust. Focus on the person, not the topic. Light questions. |
| Exploration | 20–80% | Deep dive. Follow the participant's energy. Use the question guide as a map, not a script. |
| Close | 80–100% | Wrap up. Cover remaining required topics. Thank participant. |
Act transitions can be modified by:
- Negative sentiment early: Extends rapport phase to build trust
- High coverage early: Moves to close earlier
- Low coverage late: Stays in exploration past 80%
6. AI Decision-Making in Interviews
The 5-Node LangGraph Pipeline
Every participant message flows through a 5-node processing pipeline:
PARTICIPANT MESSAGE
│
▼
┌──────────────┐
│ 1. PACING │ Determines: rapport / exploration / close
│ NODE │ Inputs: elapsed time, coverage, sentiment
└──────┬───────┘
│
▼
┌──────────────┐
│ 2. ROUTE │ Selects: which agent responds
│ NODE │ Inputs: topic, sentiment, agent scores
└──────┬───────┘
│
▼
┌──────────────┐
│ 3. AGENT │ Generates: the actual response
│ NODE │ Inputs: system prompt, conversation history
└──────┬───────┘
│
▼
┌──────────────┐
│ 4. QA │ Validates: response quality
│ NODE │ Checks: anti-patterns, length, engagement
└──────┬───────┘
│
▼
┌──────────────┐
│ 5. OUTPUT │ Formats: API response
│ NODE │ Returns: message, coverage, metadata
└──────────────┘
Node 1: Pacing — Act Selection
What the AI decides: When to transition between rapport, exploration, and close.
The pacing node computes elapsed_ratio (0.0–1.0) based on (now - started_at) / target_duration_minutes, then applies rules:
- `elapsed < 0.20` → Rapport
- `elapsed < 0.25` AND negative sentiment → Extended Rapport (build trust first)
- `0.20 ≤ elapsed < 0.80` → Exploration (unless coverage > 90% → early Close)
- `elapsed ≥ 0.80` → Close (unless coverage < 40% → stay in Exploration)
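These rules are simple enough to express directly. A minimal sketch (function and argument names are assumptions):

```python
def select_act(elapsed_ratio: float, coverage_pct: float, sentiment: float) -> str:
    """Map elapsed time, coverage, and sentiment to an interview act (sketch)."""
    if elapsed_ratio < 0.20:
        return "rapport"
    if elapsed_ratio < 0.25 and sentiment < 0:
        return "rapport"                     # extended rapport: build trust first
    if elapsed_ratio < 0.80:
        return "close" if coverage_pct > 90 else "exploration"   # early close
    return "exploration" if coverage_pct < 40 else "close"       # late exploration
```

Note the two override paths: negative sentiment stretches rapport, and coverage can pull the close forward or push it back.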
Node 2: Route — Agent Selection
What the AI decides: Which agent should respond (in multi-agent sessions).
For single-agent interviews, this is a passthrough. For multi-agent sessions, the 5-dimension scoring algorithm described in Section 4 is applied. The agent with the highest weighted score is selected. If a different agent is selected, a handoff is logged.
Node 3: Agent — Response Generation
What the AI decides: What to say to the participant.
The agent node constructs a comprehensive system prompt (V2 Canonical) and calls Claude. The prompt includes:
7 Golden Rules
- This is a conversation: Acknowledge → Bridge → Advance. Reference what they said specifically.
- Be concise: 2–3 sentences max (never > 4). After a long answer: 1 sentence only. Questions ≤ 15 words.
- Be genuinely curious: React to surprises. Pull threads from passing remarks.
- Feel what they feel: Negative sentiment → acknowledge frustration first. Never respond to emotion with a factual question.
- Remember everything: Reference earlier conversation naturally. Spot contradictions.
- Use mirroring: Paraphrase core points. Max 1 mirror per 2–3 turns.
- Respect time and space: Long pauses → wait. Interesting tangent → follow it. Short answer → probe once, then move on.
Anti-Patterns ("What You Never Do")
The system prompt explicitly forbids:
- Reading questions verbatim from the guide
- Saying "That's a great point" or "Great question"
- Hinting the agent is artificial
- Lecturing or teaching
- Reusing the same probe
- Asking two questions per turn
- Using bullet points/lists while speaking
- Saying "Let's move on" or "Next topic"
- Front-loading with excessive context
- Ignoring emotional content for data
- Being sycophantic ("Wow!", "Brilliant!")
Response Format
The LLM returns structured JSON:
```json
{
  "agent_response": "The actual message to the participant",
  "sentiment": { "score": 0.3, "label": "positive" },
  "interview_complete": false
}
```
Node 4: QA — Quality Validation
What's checked (deterministic):
- Word count: > 30 words (text) or > 20 words (voice) → flagged as `too_long`
- Question count: > 1 question mark → flagged as `multiple_questions`
- Sentence count: > 6 (text) or > 4 (voice) → flagged as `too_many_sentences`
- Empty response → flagged as `empty_response`
- Coverage tracking: Updates which topics have been covered vs remaining
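Because these checks are deterministic, they amount to a few counts over the response text. A minimal sketch (function name and flag-list return shape are assumptions):

```python
import re

def qa_flags(response: str, modality: str = "text") -> list[str]:
    """Deterministic QA checks on an agent response (sketch of the rules above)."""
    if not response.strip():
        return ["empty_response"]
    flags = []
    words = response.split()
    sentences = [s for s in re.split(r"[.!?]+", response) if s.strip()]
    max_words = 30 if modality == "text" else 20
    max_sentences = 6 if modality == "text" else 4
    if len(words) > max_words:
        flags.append("too_long")
    if response.count("?") > 1:
        flags.append("multiple_questions")
    if len(sentences) > max_sentences:
        flags.append("too_many_sentences")
    return flags
```

A response can accumulate several flags at once; the QA node passes them downstream rather than rejecting the turn.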
AI vs Deterministic Decisions (Interviews)
| Decision | Made By | Evidence |
|---|---|---|
| Act transition (rapport/exploration/close) | Deterministic rules + AI override | Elapsed time, coverage %, sentiment score |
| Which agent responds | Weighted scoring algorithm | 5-dimension scores logged per handoff |
| Response content | AI (Claude) | System prompt + conversation history |
| When to ask follow-up vs advance | AI (Claude) | Question guide + probing depth config |
| Sentiment scoring | AI (Claude) | Per-message -1.0 to +1.0 score |
| Interview completion | AI (Claude) signals, deterministic check | Coverage + time + AI judgment |
| Word/sentence limit check | Deterministic | Character counting |
| Anti-pattern detection | Deterministic | Regex + counting |
| Coverage calculation | Deterministic | question_guide_id on messages |
| Turn counting | Deterministic | sequence_number |
7. Voice Pipeline
Architecture
Voice interviews use a real-time bidirectional pipeline:
Participant's Microphone
│
▼ (WebSocket: binary audio chunks)
┌──────────────┐
│ Deepgram │ STT: Audio → Text (streaming, ~500ms)
│ Nova-2 │ Interim + final transcripts
└──────┬───────┘
│
▼ (accumulated transcript)
┌──────────────┐
│ Turn-Taking │ 5-signal fusion → "participant done speaking?"
│ Engine │ Threshold-based decision
└──────┬───────┘
│ (when turn taken)
▼
┌──────────────┐
│ Claude LLM │ Same interview engine as text (~1500ms)
│ │ Returns structured JSON response
└──────┬───────┘
│
▼
┌──────────────┐
│ ElevenLabs │ TTS: Text → Audio (streaming MP3, ~1000ms)
│ │ Agent's voice
└──────┬───────┘
│
▼ (WebSocket: binary audio)
Participant's Speakers
Target latency: < 4 seconds round-trip (STT ~500ms + LLM ~1500ms + TTS ~1000ms + network ~500ms).
Turn-Taking Engine
The turn-taking engine decides when the participant has finished speaking. It fuses 5 signals to avoid premature interruption or excessive waiting:
Signal 1: VAD Silence Duration (35% weight)
Measures how long since the participant stopped speaking:
| Silence | Score | Interpretation |
|---|---|---|
| < 200ms | 0.0 | Breathing pause |
| 200–400ms | 0.1 | Normal inter-sentence |
| 400–700ms | 0.3 | Could be done |
| 700–1000ms | 0.5 | Likely done |
| 1000–1500ms | 0.7 | Probably done |
| 1500–2500ms | 0.85 | Almost certainly |
| > 2500ms | 0.95 | Certainly done |
Signal 2: Audio Energy Decay (15% weight)
Analyzes the trajectory of audio energy before silence:
| Pattern | Score | Meaning |
|---|---|---|
| Strong downward slope | 0.85 | Natural trailing off — yielding turn |
| Gentle decline | 0.65 | Winding down |
| Flat energy | 0.40 | Uncertain — mid-thought pause |
| Rising energy | 0.15 | Was building momentum — interrupted |
Signal 3: Transcript Completeness (25% weight)
Rule-based analysis of the transcript text:
- Sentence-ending punctuation (`.`, `!`, `?`): +0.30
- Trailing incomplete markers ("and", "but", "because"): -0.25
- Short known-complete answers ("yes", "no", "I agree"): +0.30
- Listing patterns without closure: -0.15
Signal 4: Semantic Sufficiency (25% weight)
Uses Claude Haiku (fast, cheap) to assess if the response is complete:
- Gated: Only called when silence > 500ms AND word count ≥ 3
- Prompt: "Is this response to the question complete?"
- Returns confidence 0.0–1.0
Signal 5: Context Adjustment (threshold modifier)
Modifies the decision threshold based on question type:
- Closed question (yes/no type): -0.10 threshold (trigger faster)
- Open question ("tell me about..."): +0.12 threshold (wait longer)
- Long response (> 30 seconds): -0.05 (person winding down)
Patience Profiles
| Profile | Base Threshold | Max Silence | Min Silence | Use Case |
|---|---|---|---|---|
| responsive | 0.48 | 2000ms | 200ms | Fast-paced interviews |
| balanced | 0.55 | 3000ms | 300ms | Default — most interviews |
| patient | 0.65 | 5000ms | 500ms | Reflective participants |
| therapist | 0.75 | 8000ms | 700ms | Deep emotional topics |
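The fused decision is a weighted sum of the four scored signals compared against the profile threshold, with signal 5 shifting that threshold. A minimal sketch (the signal weights and profile values come from the tables above; the min/max silence guard behavior and all names are assumptions):

```python
PROFILES = {
    "responsive": {"threshold": 0.48, "max_silence_ms": 2000, "min_silence_ms": 200},
    "balanced":   {"threshold": 0.55, "max_silence_ms": 3000, "min_silence_ms": 300},
    "patient":    {"threshold": 0.65, "max_silence_ms": 5000, "min_silence_ms": 500},
    "therapist":  {"threshold": 0.75, "max_silence_ms": 8000, "min_silence_ms": 700},
}

def should_take_turn(signals: dict, profile: str = "balanced",
                     context_adjust: float = 0.0, silence_ms: int = 0) -> bool:
    """Fuse the four scored signals; signal 5 shifts the threshold (sketch)."""
    p = PROFILES[profile]
    if silence_ms < p["min_silence_ms"]:
        return False                     # never interrupt a breathing pause
    if silence_ms >= p["max_silence_ms"]:
        return True                      # hard ceiling: always take over
    fused = (0.35 * signals["vad_silence"]
             + 0.15 * signals["energy_decay"]
             + 0.25 * signals["transcript_complete"]
             + 0.25 * signals["semantic_sufficiency"])
    return fused >= p["threshold"] + context_adjust
```

The same fused score triggers at different moments per profile: a value of ~0.72 takes the turn under `balanced` but keeps waiting under `therapist`.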
Barge-In Handling
When the participant starts speaking while the agent is talking:
- Client detects sustained speech energy (> 300ms above threshold)
- Sends `{"type": "barge_in"}` via WebSocket
- Server cancels ongoing turn evaluation and TTS playback
- Resets energy buffer
- Ready for new speech input
A 1200ms grace period after TTS starts prevents echo from triggering false barge-in.
8. Mayring QCA Analysis
Overview
The analysis engine implements Mayring's Qualitative Content Analysis — a systematic method for deriving categories and findings from qualitative data. The engine runs as a background task with three sequential phases.
Pipeline
PENDING ──▶ PHASE 1: CODING ──▶ PHASE 2: CATEGORIZING ──▶ PHASE 3: FINDINGS ──▶ COMPLETED
Phase 1: Coding (Segmentation)
Goal: Break every participant message into discrete coding units.
For each participant message, Claude:
- Segments the message into coding units (each = one distinct idea, 1–3 sentences)
- Extracts the verbatim text with character positions (`start_char`, `end_char`)
- Paraphrases the extract (removes filler, restates core meaning)
- Generalizes the meaning (abstracts to configurable level: low/medium/high)
- Scores sentiment (-1.0 to +1.0)
Example:
| Field | Value |
|---|---|
| Original message | "Well, our deployment pipeline is really slow, like 45 minutes per build. And honestly nobody trusts the test suite anymore because it's been broken for months." |
| Coding Unit 1 | |
| coded_text | "our deployment pipeline is really slow, like 45 minutes per build" |
| paraphrase | "Deployment pipeline takes 45 minutes per build" |
| generalization | "CI/CD pipeline performance bottleneck" |
| sentiment_score | -0.4 |
| Coding Unit 2 | |
| coded_text | "nobody trusts the test suite anymore because it's been broken for months" |
| paraphrase | "Test suite is untrusted due to months of failures" |
| generalization | "Test infrastructure credibility loss" |
| sentiment_score | -0.6 |
Progress: Commits after each message, so the polling endpoint shows real-time progress: "Coding messages 5/21 — 32 coding units extracted"
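The character positions make the traceability check deterministic: the claimed span of the source message must reproduce the verbatim extract exactly. A minimal sketch (the function name is an assumption; Section 8's decision table calls this the "coded_text substring check"):

```python
def verify_coding_unit(coded_text: str, start_char: int, end_char: int,
                       message_text: str) -> bool:
    """A coding unit is traceable only if its verbatim extract appears in the
    source message at the claimed character span (sketch)."""
    return message_text[start_char:end_char] == coded_text
```

Any unit failing this check would surface as a gap in the Traceability Health panel rather than silently entering the category system.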
Phase 2: Categorizing
Goal: Build a category system from the coding units.
Three modes are available:
Inductive Mode (bottom-up)
The AI discovers categories naturally from the data:
- First reduction: Groups similar generalizations, removes redundancy
- Second reduction: Builds umbrella categories from groups
- Assignment: Each coding unit assigned to exactly one category
- Target: 5–20 categories
For each category, the AI generates:
- Name: Descriptive category label
- Definition: What belongs in this category
- Coding rule: Include when... / Exclude when...
- Anchor example: The most prototypical quote
Deductive Mode (top-down)
Admin provides predefined categories. The AI assigns each coding unit to the best-fit category:
- Exact match → assigned to predefined category
- No fit → assigned to "Uncategorized"
Mixed Mode (hybrid)
Combines both approaches:
- Try to fit each coding unit to predefined categories
- Units that don't fit → group into emergent categories
- Small emergent groups (below `min_frequency`) → merge into "Other"
- Each category is marked `is_deductive: true/false`
Phase 3: Findings (Synthesis)
Goal: Generate analytical findings from the category system.
The AI identifies patterns across categories and generates findings of five types:
| Finding Type | Icon | Description |
|---|---|---|
| theme | ◆ | A recurring pattern across multiple participants |
| contradiction | ⇄ | Conflicting positions from different participants |
| consensus | ● | Strong broad agreement across participants |
| outlier | ◎ | A unique or surprising finding from one participant |
| recommendation | → | An actionable suggestion grounded in the data |
Evidence Strength Rules
| Strength | Criteria |
|---|---|
| strong | ≥ 5 supporting coding units from ≥ 3 different interviews |
| moderate | 3–4 units from ≥ 2 interviews |
| weak | < 3 units |
Each finding includes:
- `supporting_category_ids`: Which categories support it (UUID references)
- `supporting_quote_ids`: Key verbatim quotes (UUID references to coding units)
- `participant_count`: How many unique interviews contributed
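The evidence-strength thresholds are deterministic, so they can be applied as a simple classifier. A minimal sketch of one reading of the table (the function name, and the treatment of units spread across too few interviews, are assumptions):

```python
def rate_evidence(unit_count: int, interview_count: int) -> str:
    """Apply the evidence-strength thresholds from the rules above (sketch).
    Assumption: failing an interview-count threshold drops a finding a tier."""
    if unit_count >= 5 and interview_count >= 3:
        return "strong"
    if unit_count >= 3 and interview_count >= 2:
        return "moderate"
    return "weak"
```

So five units drawn from only two interviews rate "moderate", not "strong": breadth across interviews matters as much as raw unit count.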
Analysis Configuration
When creating an analysis run, the admin configures:
```json
{
  "name": "System Autopsy — Technical Assessment",
  "interview_ids": ["uuid1", "uuid2", "uuid3"],
  "config": {
    "mode": "inductive",
    "abstraction_level": "medium",
    "min_frequency": 2,
    "include_sentiment": true,
    "deductive_categories": []
  }
}
```
| Config Field | Purpose |
|---|---|
| mode | inductive, deductive, or mixed |
| abstraction_level | low (concrete), medium (balanced), high (abstract) |
| min_frequency | Minimum coding units per category in mixed mode |
| include_sentiment | Whether to compute sentiment scores |
| deductive_categories | Predefined categories (for deductive/mixed mode) |
Requirement: Minimum 3 completed interviews to run an analysis.
AI vs Deterministic Decisions (Analysis)
| Decision | Made By | Evidence |
|---|---|---|
| Where to segment message into units | AI (Claude) | Character positions verified against source |
| Paraphrase content | AI (Claude) | Generated per coding unit |
| Generalization content | AI (Claude) | Abstraction level configurable |
| Sentiment score per unit | AI (Claude) | -1.0 to +1.0 |
| Category names and definitions | AI (Claude) | Generated from data (inductive) or matched (deductive) |
| Category assignment | AI (Claude indices) → deterministic UUID mapping | LLM returns index, system maps to UUID |
| Finding type classification | AI (Claude) | Theme/contradiction/consensus/outlier/recommendation |
| Evidence strength rating | AI (Claude) with rules | Must meet count/interview thresholds |
| Supporting quotes selection | AI (Claude indices) → deterministic UUID mapping | LLM returns index, system maps to UUID |
| Traceability verification | Deterministic | coded_text substring check against message |
| Category frequency count | Deterministic | Count of assigned coding units |
| Average sentiment per category | Deterministic | Arithmetic mean of member units |
| Status transitions | Deterministic | pending → coding → categorizing → completed |
| Progress reporting | Deterministic | Message count, unit count, category count |
JSON Parsing Robustness
Claude sometimes returns JSON followed by commentary text. The parser handles this:
- Try `json.loads()` (clean JSON)
- Strip markdown fences (```json ... ```)
- Use `json.JSONDecoder.raw_decode()` to find the first JSON object
- Fall back: treat the entire message as one coding unit
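This fallback chain can be sketched with the standard-library `json` module (a minimal sketch; the real parser's function name and exact fence handling are assumptions):

```python
import json

def parse_llm_json(raw: str):
    """Extract the first JSON object from an LLM reply that may carry
    markdown fences or trailing commentary (sketch of the fallback chain)."""
    text = raw.strip()
    # 1. Clean JSON: the happy path.
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    # 2. Strip markdown fences (```json ... ```).
    if text.startswith("```"):
        text = text.strip("`")
        if text.startswith("json"):
            text = text[4:]
        try:
            return json.loads(text)
        except json.JSONDecodeError:
            pass
    # 3. raw_decode from the first "{": parses one object, ignores trailing commentary.
    start = text.find("{")
    if start != -1:
        try:
            obj, _ = json.JSONDecoder().raw_decode(text[start:])
            return obj
        except json.JSONDecodeError:
            pass
    return None  # caller applies the last fallback: one coding unit per message
```

`raw_decode` is the key trick: unlike `json.loads`, it stops at the end of the first complete value instead of rejecting the trailing text.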
9. Quality & Monitoring
Anti-Pattern Detectors (10 total)
The quality engine runs 10 detectors on every agent response:
| # | Detector | What It Catches | Method |
|---|---|---|---|
| 1 | Leading Questions | "Don't you think...", "Isn't it obvious..." | Regex pattern matching |
| 2 | Multiple Questions | More than one question per turn | Count "?" characters |
| 3 | Excessive Mirroring | Paraphrasing too frequently | Track mirroring in last 4 responses |
| 4 | Topic Repetition | Returning to already-covered topic | Check current_topic in covered list |
| 5 | Ignoring Emotion | Factual response to emotional content | Detect absence of empathy patterns when sentiment is negative |
| 6 | Premature Topic Switch | Moving on before adequate probing | followup_count < probing_depth AND count = 0 |
| 7 | Too-Long Response | Agent talking too much | Word count > 30 (text) or > 20 (voice) |
| 8 | Jargon Overload | Excessive business buzzwords | Count jargon words ("synergistic", "leverage", etc.) |
| 9 | Missed Follow-up | Ignoring rich participant response | Participant gave 15+ emotional words but agent didn't follow up |
| 10 | Generic Acknowledgment | Bland non-specific response | Response is 1–8 words matching generic patterns |
IEI (Interview Engagement Index)
A composite 0.0–1.0 score computed from 6 weighted components:
| Component | Weight | Calculation |
|---|---|---|
| Coverage Score | 25% | coverage_pct / 100 |
| Engagement Depth | 20% | min(avg_participant_words / 50, 1.0) |
| Anti-Pattern Penalty | 20% | max(1.0 - anti_pattern_count / 10, 0.0) |
| Mirroring Bonus | 10% | 1.0 (sweet spot: 1–3 mirrors), 0.6 (4–5), 0.3 (0 or 6+) |
| Sentiment Trajectory | 15% | Improving sentiment over time → higher score |
| Turn Adequacy | 10% | 1.0 for 5–15 turns (ideal range), scaled outside |
Example: 8 turns, 60% coverage, avg 35 words, 2 anti-patterns, 2 mirrors, improving sentiment → IEI = 0.72
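The composite can be sketched directly from the weights table. A minimal sketch — the sentiment-trajectory scaling and the out-of-range turn-adequacy value are assumptions (the table only says "scaled"), so it will not exactly reproduce the worked example's 0.72:

```python
def iei(coverage_pct: float, avg_participant_words: float, anti_pattern_count: int,
        mirror_count: int, sentiment_trajectory: float, turn_count: int) -> float:
    """Weighted IEI composite from the six components (sketch)."""
    coverage = coverage_pct / 100
    engagement = min(avg_participant_words / 50, 1.0)
    anti_penalty = max(1.0 - anti_pattern_count / 10, 0.0)
    if 1 <= mirror_count <= 3:
        mirroring = 1.0          # sweet spot
    elif 4 <= mirror_count <= 5:
        mirroring = 0.6
    else:
        mirroring = 0.3          # no mirrors, or 6+
    adequacy = 1.0 if 5 <= turn_count <= 15 else 0.5   # assumed out-of-range value
    return (0.25 * coverage + 0.20 * engagement + 0.20 * anti_penalty
            + 0.10 * mirroring + 0.15 * sentiment_trajectory + 0.10 * adequacy)
```

With every component at its maximum the score reaches exactly 1.0, which confirms the weights sum to 100%.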
Live Session Monitor
Real-time monitoring endpoints for active interviews:
| Endpoint | Returns |
|---|---|
| GET /monitor/{id}/state | Current act (rapport/exploration/close), elapsed ratio, turn count, sentiment, anti-pattern flags |
| GET /monitor/{id}/checkpoints | Turn-by-turn history (act + sentiment per turn) |
| GET /monitor/{id}/metrics | Per-agent performance: turn count, avg response words, topics covered |
| GET /monitor/{id}/messages | Full conversation with agent attribution |
Engagement Scorecard
Aggregates all quality metrics into a single report per interview:
- IEI score (0.0–1.0)
- List of anti-pattern flags detected
- Engagement summary: turn count, coverage %, average participant words, mirroring count
- Sentiment trend: improving / declining / stable
10. Evidence Explorer & Dashboard
Analysis Dashboard
After an analysis completes, the dashboard (/analysis/{id}) shows four panels:
Panel 1: Category Overview
- Horizontal bar chart of all categories, sized by frequency (coding unit count)
- Color-coded by average sentiment (green = positive, red = negative, orange = neutral)
- Deductive categories marked with "D" badge
- Click any category → drill-down showing definition, anchor example, coding rule, and link to evidence
Panel 2: Findings
- Card list of all findings, grouped by type (theme, contradiction, consensus, outlier, recommendation)
- Each card shows: type icon, title, evidence strength badge
- Click to expand: full description, participant count, supporting quote count, link to evidence chain
Panel 3: Sentiment Overview
- Overall average sentiment score with visual gauge (-1.0 to +1.0)
- Sentiment by category: sorted list showing per-category averages
- Color-coded (positive = green, negative = red)
Panel 4: Traceability Health
- Health indicator: checkmark (100% traceable) or warning (gaps detected)
- Statistics: total findings, categories, coding units, interviews
- Evidence strength breakdown: bar chart showing strong / moderate / weak counts
- Link to full Evidence Explorer
Evidence Explorer
The Evidence Explorer (/analysis/{id}/evidence) provides the drill-down interface:
- Finding → Categories: Which categories support this finding
- Category → Coding Units: All verbatim extracts assigned to this category
- Coding Unit → Message: The original participant message with the extract highlighted
- Message → Interview: Which participant, when, in what context
Every level shows the full context needed to evaluate whether the finding is well-supported.
11. Deployment & CI/CD
Production Environment (Azure)
| Component | Resource |
|---|---|
| Backend | Azure Container App: iaas-backend (port 8000) |
| Frontend | Azure Container App: iaas-frontend (port 3000) |
| Database | Azure PostgreSQL Flexible Server: ironcode-flex-server |
| Container Registry | ironcode.azurecr.io |
| Region | North Europe |
CI/CD Pipeline
Triggered on every push to main:
Push to main
│
├──▶ Build backend Docker image → Push to ACR
├──▶ Build frontend Docker image → Push to ACR
│
▼
Deploy backend container
│ (auto-runs: alembic upgrade head)
▼
Seed demo data
│ (POST /admin/seed → truncate + re-seed)
▼
Deploy frontend container
│
▼
Smoke tests
│ (health checks on both services)
▼
Done
Database Migrations
Migrations run automatically when the backend container starts (start.sh):
```bash
#!/bin/bash
alembic upgrade head   # Apply pending migrations
uvicorn app.main:app --host 0.0.0.0 --port 8000 --workers 2
```
Seed Data
The POST /api/v1/admin/seed endpoint (requires admin password):
- Truncates all tables (`CASCADE`)
- Runs the DIAGNOSE seed: 5 personas, 12 agents, 6 session templates, 18 interviews, 302 messages, 1 completed analysis
Appendix: API Response Format
All API responses follow a consistent envelope:
Success:
```json
{
  "data": { "...": "..." },
  "meta": { "request_id": "uuid", "timestamp": "2026-03-03T10:00:00Z" }
}
```
Error:
```json
{
  "error": {
    "code": "ERROR_CODE",
    "message": "Human-readable description",
    "details": { "...": "..." }
  },
  "meta": { "request_id": "uuid", "timestamp": "2026-03-03T10:00:00Z" }
}
```
Document generated March 2026. IaaS PoC v0.2.0 by IRONCODE.