IaaS Platform Guide
Intelligence-as-a-Service by IRONCODE
1. Introduction & Architecture
Purpose
IaaS (Intelligence-as-a-Service) is a platform where AI agents conduct structured qualitative research interviews via chat and voice, then analyze the results using Mayring's Qualitative Content Analysis (QCA). The core differentiator: every finding traces back to its source — zero hallucinated insights, 100% traceability.
Hypotheses Under Test
| ID | Hypothesis | Success Criteria |
|---|---|---|
| H1/H2 | AI agent can conduct structured interviews via chat AND voice | ≥ 80% question coverage |
| H3 | Agents are configurable (prompt, questions, model, temperature) and produce measurably different behavior | Measurable difference across configs |
| H4/H5 | Mayring QCA analysis with 100% traceability | Every finding → coding_unit → message → interview |
Tech Stack
| Layer | Technology |
|---|---|
| Backend | FastAPI (Python 3.12), async |
| Frontend | Next.js 14 + TypeScript |
| Database | PostgreSQL 16, Alembic migrations |
| LLM | Claude API (claude-sonnet-4-6 default, configurable per agent) |
| Speech-to-Text | Deepgram Nova-2 (streaming via WebSocket) |
| Text-to-Speech | ElevenLabs (eleven_turbo_v2_5, streaming) |
| Voice Transport | WebSocket (not WebRTC for PoC) |
| Deploy | Docker Compose (local), Azure Container Apps (production) |
Architecture
Three containers compose the system:
┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐
│ iaas-frontend │ │ iaas-backend │ │ iaas-db │
│ Next.js :3000 │──▶│ FastAPI :8000 │──▶│ PostgreSQL :5432│
└──────────────────┘ └──────────────────┘ └──────────────────┘
│
┌─────────┼─────────┐
▼ ▼ ▼
Claude API Deepgram ElevenLabs
(LLM) (STT) (TTS)
The Traceability Chain (Core Principle)
Every analytical finding links back through an unbroken evidence chain:
AnalysisFinding
└─▶ Category (AI-grouped theme)
└─▶ CodingUnit (verbatim extract + paraphrase + generalization)
└─▶ Message (original participant response)
└─▶ Interview (session with participant)
└─▶ Agent (configured interviewer)
No finding exists without traceable evidence. No quote is fabricated.
Database Tables (14 total)
| Table | Purpose |
|---|---|
| personas | Reusable personality profiles |
| agents | Executable AI interviewers with LLM config |
| question_guides | Ordered interview questions per agent |
| session_templates | Multi-agent session blueprints |
| session_template_agents | Agent–session mapping with roles |
| interviews | Individual interview instances |
| messages | All interview messages (participant + agent) |
| voice_sessions | Voice-specific session metadata |
| analysis_runs | QCA analysis instances with config |
| coding_units | Segmented message extracts |
| categories | Thematic groupings of coding units |
| analysis_findings | Synthesized research findings |
| interview_invitations | Shareable invitation tokens |
| audit_log | State-change audit trail |
2. Persona Library
What Personas Are
Personas are reusable personality profiles that serve as templates for creating agents. Each persona encapsulates a consistent interviewer personality — name, description, system prompt, voice settings, language, and behavioral rules. Instead of configuring each agent from scratch, admins pick a persona and "instantiate" it into a ready-to-configure agent.
Persona Fields
| Field | Type | Purpose |
|---|---|---|
| key | String (unique) | Slug identifier (e.g., "system_autopsy") |
| name | String | Display name (e.g., "Dr. Vera Kessler") |
| personality_description | Text | Rich personality profile |
| system_prompt_template | Text | Base system prompt instructions |
| language_default | String | Default interview language ("en", "de") |
| voice_description | Text | Description of voice tone/style |
| default_voice_id | String | ElevenLabs voice ID |
| best_for | JSON array | Tags like ["architecture", "technical"] |
| engagement_rules | JSON | Custom rules (max_follow_ups, probe_on_vague, etc.) |
Admin Workflow: Persona → Agent
- Browse Personas at `/personas` — grid view with personality previews, best_for tags, and language badges.
- Click "Instantiate" — opens a modal asking for welcome message and optional interview type.
- System creates an Agent pre-configured with the persona's personality, prompt, voice, and language.
- Redirects to agent edit page (`/agents/{id}/edit`) where the admin fine-tunes questions, model parameters, etc.
DIAGNOSE Phase Personas
The platform ships with 5 DIAGNOSE dimension personas:
| Persona | Key | Focus | Style |
|---|---|---|---|
| Dr. Vera Kessler | system_autopsy | Technical landscape, legacy systems, vendor lock-in | Methodical, precise, analytical |
| James Okafor | capacity_audit | Time allocation, automation potential, work-about-work | Data-driven, direct, efficient |
| Dr. Miriam Steinfeld | organizational_autopsy | Decision flows, org structure, culture patterns | Empathetic, insightful, structured |
| Raj Anand | data_intelligence | Data maturity, AI readiness, information architecture | Curious, systematic, technically deep |
| Ingrid Larsen | process_zero_test | Core processes, failure modes, handoff patterns | Pragmatic, thorough, structured |
3. Agent Configuration
What Agents Are
Agents are executable interview entities — they are what actually conducts the conversation with participants. Each agent has independently configurable LLM parameters, interview behavior, voice settings, and a set of ordered questions.
Configurable Parameters
| Parameter | Default | Purpose | Impact on Behavior |
|---|---|---|---|
| model_name | claude-sonnet-4-6 | Which Claude model to use | Sonnet: balanced. Opus: deeper reasoning. Haiku: faster, cheaper. |
| temperature | 0.7 | LLM sampling temperature (0.0–1.0) | Lower = more predictable/focused. Higher = more creative/varied. |
| max_tokens | 1024 | Response length limit | Constrains how long the agent's responses can be |
| system_prompt | (from persona) | Core personality and instructions | Defines the agent's entire conversational identity |
| welcome_message | (required) | First message to participant | Sets the tone for the entire interview |
| closing_message | "Thank you..." | Final message | Wraps up the interview gracefully |
| language | "en" | Interview language | Affects prompt modules (German/English specific rules) |
| voice_id | (optional) | ElevenLabs voice ID | Which voice speaks in voice interviews |
| turn_taking_profile | "balanced" | Voice conversation patience | responsive (fast) → balanced → patient → therapist (very patient) |
| interview_type | (optional) | Classification tag | E.g., "architecture", "culture", "technical" |
| dimension_config | (optional) | Custom tracking dimensions | Enables visual dimension tracking in the UI |
How Agent Settings Affect Behavior (H3)
The same interview questions asked by two differently configured agents produce measurably different results:
- Temperature 0.3 agent: Sticks closely to question guide, minimal creative probing, consistent responses across interviews.
- Temperature 0.9 agent: More varied follow-ups, creative analogies, sometimes surprising questions.
- Sonnet model: Balanced depth and speed, good for most interviews.
- Opus model: Deeper reasoning, better at detecting contradictions and nuance, slower.
- "therapist" turn-taking: Waits up to 8 seconds before speaking in voice, allows long pauses for reflection.
- "responsive" turn-taking: Takes over after 2 seconds of silence, keeps pace high.
Question Guides
Each agent has an ordered list of question guides — the interview curriculum.
| Field | Purpose |
|---|---|
| question_text | The question to ask |
| order_index | Position in sequence (0, 1, 2, ...) |
| probing_depth | How many follow-up rounds (0–5). Default: 2 |
| is_required | Must be asked before interview can complete |
| topic_label | Dimension/topic name for tracking (e.g., "Architecture") |
Probing Depth controls how the interview engine follows up:
| Depth | Behavior |
|---|---|
| 0 | No follow-ups. Move to next question after any response. |
| 1 | One round of clarification. |
| 2 | Two rounds of probing (default). Good for most questions. |
| 3–5 | Deep exploration. Agent probes extensively before advancing. |
Topic Labels connect questions to the agent's dimension_config for visual tracking. When a question has topic_label: "Architecture", the UI shows progress on the "Architecture" dimension as that topic is covered.
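The probing-depth rule reduces to a one-line decision per turn. A minimal sketch (the function name is an assumption; the real engine also weighs coverage and elapsed time):

```python
def next_action(followup_count: int, probing_depth: int) -> str:
    """Probe again until this question's probing depth is exhausted,
    then advance to the next question in the guide (sketch)."""
    return "follow_up" if followup_count < probing_depth else "advance"
```

With the default depth of 2, the agent probes twice before moving on; depth 0 advances immediately after any response.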
Admin Workflow: Configure Agent
- Navigate to `/agents/{id}/edit`
- Review/edit system prompt, welcome/closing messages
- Adjust LLM model, temperature, max tokens
- Set voice ID (for voice interviews)
- Set turn-taking profile
- Add/reorder/edit question guides (with topics and probing depths)
- Save
4. Session Templates & Multi-Agent Orchestration
What Sessions Are
Session Templates are multi-agent interview blueprints that combine two or more agents in a specific configuration. They define:
- Which agents participate and in what roles
- How agents switch (orchestration mode)
- What topics each agent covers
- How interview time is allocated
Orchestration Modes
| Mode | Description |
|---|---|
| tag_team | Agents dynamically switch based on topic relevance, sentiment, and coverage. The system scores all agents each turn and selects the best fit. |
| sequential | Agents take turns in order — Agent 1 covers their questions, then Agent 2, etc. |
Agent Roles
Each agent in a session has a role:
| Role | Purpose |
|---|---|
| lead | Primary interviewer. Starts the conversation. Gets majority of time. |
| support | Specialist. Takes over when their expertise is needed. |
Session Template Fields
| Field | Purpose |
|---|---|
| name | Template name (e.g., "DIAGNOSE: System Autopsy") |
| description | What this session investigates |
| orchestration_mode | tag_team or sequential |
| language | Interview language |
| target_duration_minutes | Expected duration (default: 30) |
| interview_type | Classification tag |
Agent Assignment in Sessions
Each agent entry in a session has:
| Field | Purpose |
|---|---|
| agent_id | Which agent |
| role | lead or support |
| order_index | Speaking order |
| assigned_topics | Topics this agent covers (e.g., ["Architecture", "Infrastructure"]) |
| time_allocation_pct | Percentage of interview time (e.g., 70%) |
Multi-Agent Handoff: How It Works
In tag_team mode, the system evaluates which agent should respond on every turn. The selection uses a 5-dimension weighted scoring algorithm:
| Dimension | Weight | What It Measures |
|---|---|---|
| Topic Match | 35% | Does this agent own the current topic? |
| Sentiment Fit | 25% | Is this agent's style appropriate for the participant's emotional state? |
| Continuation Bonus | 20% | Is the current agent still probing a topic? (Avoid disruptive handoffs mid-thread.) |
| Balance Penalty | 15% | Has this agent had too many turns relative to others? (Prevents monopolization.) |
| Question Readiness | 15% | Does this agent have uncovered topics in their domain? |
Scoring Details
Topic Match (0.0–1.0):
- Exact topic match → 1.0
- Generalist (no assigned topics) → 0.5
- No match → 0.0
Sentiment Fit (0.0–1.0):
- Negative participant sentiment + empathetic agent → up to 1.0
- Negative sentiment + analytical agent → as low as 0.1
- Neutral/positive → 0.5 (any style works)
Continuation Bonus (0.0–1.0):
- Same agent, still probing (follow-ups < probing_depth) → bonus (decays toward 0)
- Different agent or probing exhausted → 0.0
Balance Penalty (0.0 to -1.0):
- Agent has had fair share of turns → 0.0
- Agent dominating (e.g., 30%+ above fair share) → -0.5 to -1.0
Question Readiness (0.0–1.0):
- Agent has uncovered topics in their domain → proportional score
- All their topics asked → 0.0
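Taken together, per-turn selection is a weighted sum over the five dimension scores. A minimal sketch with the weights from the table (the function names and flat input dict are assumptions, not the actual implementation):

```python
WEIGHTS = {
    "topic_match": 0.35,
    "sentiment_fit": 0.25,
    "continuation_bonus": 0.20,
    "balance_penalty": 0.15,   # applied to a 0.0 to -1.0 value, so it subtracts
    "question_readiness": 0.15,
}

def score_agent(agent: dict) -> dict:
    """Combine the five pre-computed dimension scores into one total (sketch)."""
    scores = {dim: agent[dim] for dim in WEIGHTS}
    scores["total"] = sum(WEIGHTS[dim] * scores[dim] for dim in WEIGHTS)
    return scores

def select_agent(candidates: list[dict]) -> tuple[str, dict]:
    """Pick the highest-scoring agent; the caller logs a handoff if it changed."""
    scored = {a["id"]: score_agent(a) for a in candidates}
    best = max(scored, key=lambda aid: scored[aid]["total"])
    return best, scored
```

The per-agent `scored` dict is what would end up in the handoff log, so every switch decision stays auditable.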
Handoff Logging
Every agent switch is logged with full scores:
```json
{
  "from_agent": "uuid-of-previous-agent",
  "to_agent": "uuid-of-new-agent",
  "turn": 5,
  "scores": {
    "agent-1-uuid": {"total": 0.62, "topic_match": 1.0, "sentiment_fit": 0.5, "...": "..."},
    "agent-2-uuid": {"total": 0.58, "topic_match": 0.0, "sentiment_fit": 0.8, "...": "..."}
  }
}
```
This creates a complete audit trail of why each handoff decision was made.
Admin Workflow: Create Session
- Navigate to `/sessions/new`
- Enter session name, description, language, duration, interview type
- Select orchestration mode (tag_team or sequential)
- Add agents: select from available agents, assign roles, set assigned topics, allocate time %
- Save → session template is ready for launching interviews
5. Interview Lifecycle
Full Lifecycle
CREATE ──▶ INVITE ──▶ CONSENT ──▶ START ──▶ MESSAGES ──▶ COMPLETE
Step 1: Create Interview
From the session detail page (/sessions/{id}), click "+ Participant":
- Enter participant name, role, department (optional)
- Select modality: Text (chat) or Voice (real-time audio)
- Optionally override duration
- Click "Create Invitation"
The system creates an Interview record (status: created) and generates an invitation.
Step 2: Invite Participant
After creation, the modal shows:
- Copy Link: A shareable URL like `https://frontend.example.com/i/abc123token`
- Send Email: Enter email address → system sends branded HTML email via Azure Communication Services or SMTP
Invitation tokens expire after 30 days. Each access increments a counter for tracking.
Step 3: Participant Consent
When the participant opens the link:
- Token is validated (not expired, interview not completed)
- If name was provided during invitation, a personalized welcome is shown
- If no name, a name input field appears
- Consent text explains: modality (text/voice), expected duration, recording notice
- Participant clicks "Begin Interview"
Step 4: Start Interview
- Status changes to `in_progress`
- `started_at` timestamp is set
- Agent's welcome message is posted as the first message
- For voice: WebSocket connection established, welcome message synthesized via TTS
Step 5: Message Exchange
Each turn follows the 5-node pipeline (see Section 6):
- Participant sends message
- Pacing node determines interview phase (rapport/exploration/close)
- Route node selects best agent (in multi-agent sessions)
- Agent node generates response via Claude
- QA node validates quality and tracks coverage
- Response delivered to participant
The system tracks:
- Question coverage: which questions from the guide have been addressed
- Topic coverage: progress per dimension/topic
- Sentiment: per-message sentiment scoring
- Turn count: total exchanges
Step 6: Complete Interview
Interview completes when:
- The AI determines all required questions are covered and time is approaching, OR
- The participant has answered enough for meaningful analysis, OR
- The admin manually ends the interview
Status changes to completed, completed_at is set, duration_seconds calculated.
Three-Act Interview Structure
The interview follows a cinematic three-act structure driven by elapsed time and coverage:
| Act | Time Range | Focus |
|---|---|---|
| Rapport | 0–20% | Build trust. Focus on the person, not the topic. Light questions. |
| Exploration | 20–80% | Deep dive. Follow the participant's energy. Use the question guide as a map, not a script. |
| Close | 80–100% | Wrap up. Cover remaining required topics. Thank participant. |
Act transitions can be modified by:
- Negative sentiment early: Extends rapport phase to build trust
- High coverage early: Moves to close earlier
- Low coverage late: Stays in exploration past 80%
6. AI Decision-Making in Interviews
The 5-Node LangGraph Pipeline
Every participant message flows through a 5-node processing pipeline:
PARTICIPANT MESSAGE
│
▼
┌──────────────┐
│ 1. PACING │ Determines: rapport / exploration / close
│ NODE │ Inputs: elapsed time, coverage, sentiment
└──────┬───────┘
│
▼
┌──────────────┐
│ 2. ROUTE │ Selects: which agent responds
│ NODE │ Inputs: topic, sentiment, agent scores
└──────┬───────┘
│
▼
┌──────────────┐
│ 3. AGENT │ Generates: the actual response
│ NODE │ Inputs: system prompt, conversation history
└──────┬───────┘
│
▼
┌──────────────┐
│ 4. QA │ Validates: response quality
│ NODE │ Checks: anti-patterns, length, engagement
└──────┬───────┘
│
▼
┌──────────────┐
│ 5. OUTPUT │ Formats: API response
│ NODE │ Returns: message, coverage, metadata
└──────────────┘
Node 1: Pacing — Act Selection
What the AI decides: When to transition between rapport, exploration, and close.
The pacing node computes elapsed_ratio (0.0–1.0) based on (now - started_at) / target_duration_minutes, then applies rules:
- `elapsed < 0.20` → Rapport
- `elapsed < 0.25` AND negative sentiment → Extended Rapport (build trust first)
- `0.20 ≤ elapsed < 0.80` → Exploration (unless coverage > 90% → early Close)
- `elapsed ≥ 0.80` → Close (unless coverage < 40% → stay in Exploration)
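These rules are simple enough to express directly. A minimal sketch (function and argument names are assumptions):

```python
def select_act(elapsed_ratio: float, coverage_pct: float, sentiment: float) -> str:
    """Map elapsed time, coverage, and sentiment to an interview act (sketch)."""
    if elapsed_ratio < 0.20:
        return "rapport"
    if elapsed_ratio < 0.25 and sentiment < 0:
        return "rapport"                     # extended rapport: build trust first
    if elapsed_ratio < 0.80:
        return "close" if coverage_pct > 90 else "exploration"   # early close
    return "exploration" if coverage_pct < 40 else "close"       # late exploration
```

Note the two override paths: negative sentiment stretches rapport, and coverage can pull the close forward or push it back.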
Node 2: Route — Agent Selection
What the AI decides: Which agent should respond (in multi-agent sessions).
For single-agent interviews, this is a passthrough. For multi-agent sessions, the 5-dimension scoring algorithm described in Section 4 is applied. The agent with the highest weighted score is selected. If a different agent is selected, a handoff is logged.
Node 3: Agent — Response Generation
What the AI decides: What to say to the participant.
The agent node constructs a comprehensive system prompt (V2 Canonical) and calls Claude. The prompt includes:
7 Golden Rules
- This is a conversation: Acknowledge → Bridge → Advance. Reference what they said specifically.
- Be concise: 2–3 sentences max (never > 4). After a long answer: 1 sentence only. Questions ≤ 15 words.
- Be genuinely curious: React to surprises. Pull threads from passing remarks.
- Feel what they feel: Negative sentiment → acknowledge frustration first. Never respond to emotion with a factual question.
- Remember everything: Reference earlier conversation naturally. Spot contradictions.
- Use mirroring: Paraphrase core points. Max 1 mirror per 2–3 turns.
- Respect time and space: Long pauses → wait. Interesting tangent → follow it. Short answer → probe once, then move on.
Anti-Patterns ("What You Never Do")
The system prompt explicitly forbids:
- Reading questions verbatim from the guide
- Saying "That's a great point" or "Great question"
- Hinting the agent is artificial
- Lecturing or teaching
- Reusing the same probe
- Asking two questions per turn
- Using bullet points/lists while speaking
- Saying "Let's move on" or "Next topic"
- Front-loading with excessive context
- Ignoring emotional content for data
- Being sycophantic ("Wow!", "Brilliant!")
Response Format
The LLM returns structured JSON:
```json
{
  "agent_response": "The actual message to the participant",
  "sentiment": { "score": 0.3, "label": "positive" },
  "interview_complete": false
}
```
Node 4: QA — Quality Validation
What's checked (deterministic):
- Word count: > 30 words (text) or > 20 words (voice) → flagged as `too_long`
- Question count: > 1 question mark → flagged as `multiple_questions`
- Sentence count: > 6 (text) or > 4 (voice) → flagged as `too_many_sentences`
- Empty response → flagged as `empty_response`
- Coverage tracking: Updates which topics have been covered vs remaining
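Because these checks are deterministic, they amount to a few counts over the response text. A minimal sketch (function name and flag-list return shape are assumptions):

```python
import re

def qa_flags(response: str, modality: str = "text") -> list[str]:
    """Deterministic QA checks on an agent response (sketch of the rules above)."""
    if not response.strip():
        return ["empty_response"]
    flags = []
    words = response.split()
    sentences = [s for s in re.split(r"[.!?]+", response) if s.strip()]
    max_words = 30 if modality == "text" else 20
    max_sentences = 6 if modality == "text" else 4
    if len(words) > max_words:
        flags.append("too_long")
    if response.count("?") > 1:
        flags.append("multiple_questions")
    if len(sentences) > max_sentences:
        flags.append("too_many_sentences")
    return flags
```

A response can accumulate several flags at once; the QA node passes them downstream rather than rejecting the turn.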
AI vs Deterministic Decisions (Interviews)
| Decision | Made By | Evidence |
|---|---|---|
| Act transition (rapport/exploration/close) | Deterministic rules + AI override | Elapsed time, coverage %, sentiment score |
| Which agent responds | Weighted scoring algorithm | 5-dimension scores logged per handoff |
| Response content | AI (Claude) | System prompt + conversation history |
| When to ask follow-up vs advance | AI (Claude) | Question guide + probing depth config |
| Sentiment scoring | AI (Claude) | Per-message -1.0 to +1.0 score |
| Interview completion | AI (Claude) signals, deterministic check | Coverage + time + AI judgment |
| Word/sentence limit check | Deterministic | Character counting |
| Anti-pattern detection | Deterministic | Regex + counting |
| Coverage calculation | Deterministic | question_guide_id on messages |
| Turn counting | Deterministic | sequence_number |
7. Voice Pipeline
Architecture
Voice interviews use a real-time bidirectional pipeline:
Participant's Microphone
│
▼ (WebSocket: binary audio chunks)
┌──────────────┐
│ Deepgram │ STT: Audio → Text (streaming, ~500ms)
│ Nova-2 │ Interim + final transcripts
└──────┬───────┘
│
▼ (accumulated transcript)
┌──────────────┐
│ Turn-Taking │ 5-signal fusion → "participant done speaking?"
│ Engine │ Threshold-based decision
└──────┬───────┘
│ (when turn taken)
▼
┌──────────────┐
│ Claude LLM │ Same interview engine as text (~1500ms)
│ │ Returns structured JSON response
└──────┬───────┘
│
▼
┌──────────────┐
│ ElevenLabs │ TTS: Text → Audio (streaming MP3, ~1000ms)
│ │ Agent's voice
└──────┬───────┘
│
▼ (WebSocket: binary audio)
Participant's Speakers
Target latency: < 4 seconds round-trip (STT ~500ms + LLM ~1500ms + TTS ~1000ms + network ~500ms).
Turn-Taking Engine
The turn-taking engine decides when the participant has finished speaking. It fuses 5 signals to avoid premature interruption or excessive waiting:
Signal 1: VAD Silence Duration (35% weight)
Measures how long since the participant stopped speaking:
| Silence | Score | Interpretation |
|---|---|---|
| < 200ms | 0.0 | Breathing pause |
| 200–400ms | 0.1 | Normal inter-sentence |
| 400–700ms | 0.3 | Could be done |
| 700–1000ms | 0.5 | Likely done |
| 1000–1500ms | 0.7 | Probably done |
| 1500–2500ms | 0.85 | Almost certainly |
| > 2500ms | 0.95 | Certainly done |
Signal 2: Audio Energy Decay (15% weight)
Analyzes the trajectory of audio energy before silence:
| Pattern | Score | Meaning |
|---|---|---|
| Strong downward slope | 0.85 | Natural trailing off — yielding turn |
| Gentle decline | 0.65 | Winding down |
| Flat energy | 0.40 | Uncertain — mid-thought pause |
| Rising energy | 0.15 | Was building momentum — interrupted |
Signal 3: Transcript Completeness (25% weight)
Rule-based analysis of the transcript text:
- Sentence-ending punctuation (`.`, `!`, `?`): +0.30
- Trailing incomplete markers ("and", "but", "because"): -0.25
- Short known-complete answers ("yes", "no", "I agree"): +0.30
- Listing patterns without closure: -0.15
Signal 4: Semantic Sufficiency (25% weight)
Uses Claude Haiku (fast, cheap) to assess if the response is complete:
- Gated: Only called when silence > 500ms AND word count ≥ 3
- Prompt: "Is this response to the question complete?"
- Returns confidence 0.0–1.0
Signal 5: Context Adjustment (threshold modifier)
Modifies the decision threshold based on question type:
- Closed question (yes/no type): -0.10 threshold (trigger faster)
- Open question ("tell me about..."): +0.12 threshold (wait longer)
- Long response (> 30 seconds): -0.05 (person winding down)
Patience Profiles
| Profile | Base Threshold | Max Silence | Min Silence | Use Case |
|---|---|---|---|---|
| responsive | 0.48 | 2000ms | 200ms | Fast-paced interviews |
| balanced | 0.55 | 3000ms | 300ms | Default — most interviews |
| patient | 0.65 | 5000ms | 500ms | Reflective participants |
| therapist | 0.75 | 8000ms | 700ms | Deep emotional topics |
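The fused decision is a weighted sum of the four scored signals compared against the profile threshold, with signal 5 shifting that threshold. A minimal sketch (the signal weights and profile values come from the tables above; the min/max silence guard behavior and all names are assumptions):

```python
PROFILES = {
    "responsive": {"threshold": 0.48, "max_silence_ms": 2000, "min_silence_ms": 200},
    "balanced":   {"threshold": 0.55, "max_silence_ms": 3000, "min_silence_ms": 300},
    "patient":    {"threshold": 0.65, "max_silence_ms": 5000, "min_silence_ms": 500},
    "therapist":  {"threshold": 0.75, "max_silence_ms": 8000, "min_silence_ms": 700},
}

def should_take_turn(signals: dict, profile: str = "balanced",
                     context_adjust: float = 0.0, silence_ms: int = 0) -> bool:
    """Fuse the four scored signals; signal 5 shifts the threshold (sketch)."""
    p = PROFILES[profile]
    if silence_ms < p["min_silence_ms"]:
        return False                     # never interrupt a breathing pause
    if silence_ms >= p["max_silence_ms"]:
        return True                      # hard ceiling: always take over
    fused = (0.35 * signals["vad_silence"]
             + 0.15 * signals["energy_decay"]
             + 0.25 * signals["transcript_complete"]
             + 0.25 * signals["semantic_sufficiency"])
    return fused >= p["threshold"] + context_adjust
```

The same fused score triggers at different moments per profile: a value of ~0.72 takes the turn under `balanced` but keeps waiting under `therapist`.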
Barge-In Handling
When the participant starts speaking while the agent is talking:
- Client detects sustained speech energy (> 300ms above threshold)
- Sends `{"type": "barge_in"}` via WebSocket
- Server cancels ongoing turn evaluation and TTS playback
- Resets energy buffer
- Ready for new speech input
A 1200ms grace period after TTS starts prevents echo from triggering false barge-in.
8. Mayring QCA Analysis
Overview
The analysis engine implements Mayring's Qualitative Content Analysis — a systematic method for deriving categories and findings from qualitative data. The engine runs as a background task with three sequential phases.
Pipeline
PENDING ──▶ PHASE 1: CODING ──▶ PHASE 2: CATEGORIZING ──▶ PHASE 3: FINDINGS ──▶ COMPLETED
Phase 1: Coding (Segmentation)
Goal: Break every participant message into discrete coding units.
For each participant message, Claude:
- Segments the message into coding units (each = one distinct idea, 1–3 sentences)
- Extracts the verbatim text with character positions (`start_char`, `end_char`)
- Paraphrases the extract (removes filler, restates core meaning)
- Generalizes the meaning (abstracts to configurable level: low/medium/high)
- Scores sentiment (-1.0 to +1.0)
Example:
| Field | Value |
|---|---|
| Original message | "Well, our deployment pipeline is really slow, like 45 minutes per build. And honestly nobody trusts the test suite anymore because it's been broken for months." |
| Coding Unit 1 | |
| coded_text | "our deployment pipeline is really slow, like 45 minutes per build" |
| paraphrase | "Deployment pipeline takes 45 minutes per build" |
| generalization | "CI/CD pipeline performance bottleneck" |
| sentiment_score | -0.4 |
| Coding Unit 2 | |
| coded_text | "nobody trusts the test suite anymore because it's been broken for months" |
| paraphrase | "Test suite is untrusted due to months of failures" |
| generalization | "Test infrastructure credibility loss" |
| sentiment_score | -0.6 |
Progress: Commits after each message, so the polling endpoint shows real-time progress: "Coding messages 5/21 — 32 coding units extracted"
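The character positions make the traceability check deterministic: the claimed span of the source message must reproduce the verbatim extract exactly. A minimal sketch (the function name is an assumption; Section 8's decision table calls this the "coded_text substring check"):

```python
def verify_coding_unit(coded_text: str, start_char: int, end_char: int,
                       message_text: str) -> bool:
    """A coding unit is traceable only if its verbatim extract appears in the
    source message at the claimed character span (sketch)."""
    return message_text[start_char:end_char] == coded_text
```

Any unit failing this check would surface as a gap in the Traceability Health panel rather than silently entering the category system.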
Phase 2: Categorizing
Goal: Build a category system from the coding units.
Three modes are available:
Inductive Mode (bottom-up)
The AI discovers categories naturally from the data:
- First reduction: Groups similar generalizations, removes redundancy
- Second reduction: Builds umbrella categories from groups
- Assignment: Each coding unit assigned to exactly one category
- Target: 5–20 categories
For each category, the AI generates:
- Name: Descriptive category label
- Definition: What belongs in this category
- Coding rule: Include when... / Exclude when...
- Anchor example: The most prototypical quote
Deductive Mode (top-down)
Admin provides predefined categories. The AI assigns each coding unit to the best-fit category:
- Exact match → assigned to predefined category
- No fit → assigned to "Uncategorized"
Mixed Mode (hybrid)
Combines both approaches:
- Try to fit each coding unit to predefined categories
- Units that don't fit → group into emergent categories
- Small emergent groups (below `min_frequency`) → merge into "Other"
- Each category is marked `is_deductive: true/false`
Phase 3: Findings (Synthesis)
Goal: Generate analytical findings from the category system.
The AI identifies patterns across categories and generates findings of five types:
| Finding Type | Icon | Description |
|---|---|---|
| theme | ◆ | A recurring pattern across multiple participants |
| contradiction | ⇄ | Conflicting positions from different participants |
| consensus | ● | Strong broad agreement across participants |
| outlier | ◎ | A unique or surprising finding from one participant |
| recommendation | → | An actionable suggestion grounded in the data |
Evidence Strength Rules
| Strength | Criteria |
|---|---|
| strong | ≥ 5 supporting coding units from ≥ 3 different interviews |
| moderate | 3–4 units from ≥ 2 interviews |
| weak | < 3 units |
Each finding includes:
- `supporting_category_ids`: Which categories support it (UUID references)
- `supporting_quote_ids`: Key verbatim quotes (UUID references to coding units)
- `participant_count`: How many unique interviews contributed
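The evidence-strength thresholds are deterministic, so they can be applied as a simple classifier. A minimal sketch of one reading of the table (the function name, and the treatment of units spread across too few interviews, are assumptions):

```python
def rate_evidence(unit_count: int, interview_count: int) -> str:
    """Apply the evidence-strength thresholds from the rules above (sketch).
    Assumption: failing an interview-count threshold drops a finding a tier."""
    if unit_count >= 5 and interview_count >= 3:
        return "strong"
    if unit_count >= 3 and interview_count >= 2:
        return "moderate"
    return "weak"
```

So five units drawn from only two interviews rate "moderate", not "strong": breadth across interviews matters as much as raw unit count.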
Analysis Configuration
When creating an analysis run, the admin configures:
```json
{
  "name": "System Autopsy — Technical Assessment",
  "interview_ids": ["uuid1", "uuid2", "uuid3"],
  "config": {
    "mode": "inductive",
    "abstraction_level": "medium",
    "min_frequency": 2,
    "include_sentiment": true,
    "deductive_categories": []
  }
}
```
| Config Field | Purpose |
|---|---|
| mode | inductive, deductive, or mixed |
| abstraction_level | low (concrete), medium (balanced), high (abstract) |
| min_frequency | Minimum coding units per category in mixed mode |
| include_sentiment | Whether to compute sentiment scores |
| deductive_categories | Predefined categories (for deductive/mixed mode) |
Requirement: Minimum 3 completed interviews to run an analysis.
AI vs Deterministic Decisions (Analysis)
| Decision | Made By | Evidence |
|---|---|---|
| Where to segment message into units | AI (Claude) | Character positions verified against source |
| Paraphrase content | AI (Claude) | Generated per coding unit |
| Generalization content | AI (Claude) | Abstraction level configurable |
| Sentiment score per unit | AI (Claude) | -1.0 to +1.0 |
| Category names and definitions | AI (Claude) | Generated from data (inductive) or matched (deductive) |
| Category assignment | AI (Claude indices) → deterministic UUID mapping | LLM returns index, system maps to UUID |
| Finding type classification | AI (Claude) | Theme/contradiction/consensus/outlier/recommendation |
| Evidence strength rating | AI (Claude) with rules | Must meet count/interview thresholds |
| Supporting quotes selection | AI (Claude indices) → deterministic UUID mapping | LLM returns index, system maps to UUID |
| Traceability verification | Deterministic | coded_text substring check against message |
| Category frequency count | Deterministic | Count of assigned coding units |
| Average sentiment per category | Deterministic | Arithmetic mean of member units |
| Status transitions | Deterministic | pending → coding → categorizing → completed |
| Progress reporting | Deterministic | Message count, unit count, category count |
JSON Parsing Robustness
Claude sometimes returns JSON followed by commentary text. The parser handles this:
- Try `json.loads()` (clean JSON)
- Strip markdown fences (```json ... ```)
- Use `json.JSONDecoder.raw_decode()` to find the first JSON object
- Fall back: treat the entire message as one coding unit
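This fallback chain can be sketched with the standard-library `json` module (a minimal sketch; the real parser's function name and exact fence handling are assumptions):

```python
import json

def parse_llm_json(raw: str):
    """Extract the first JSON object from an LLM reply that may carry
    markdown fences or trailing commentary (sketch of the fallback chain)."""
    text = raw.strip()
    # 1. Clean JSON: the happy path.
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    # 2. Strip markdown fences (```json ... ```).
    if text.startswith("```"):
        text = text.strip("`")
        if text.startswith("json"):
            text = text[4:]
        try:
            return json.loads(text)
        except json.JSONDecodeError:
            pass
    # 3. raw_decode from the first "{": parses one object, ignores trailing commentary.
    start = text.find("{")
    if start != -1:
        try:
            obj, _ = json.JSONDecoder().raw_decode(text[start:])
            return obj
        except json.JSONDecodeError:
            pass
    return None  # caller applies the last fallback: one coding unit per message
```

`raw_decode` is the key trick: unlike `json.loads`, it stops at the end of the first complete value instead of rejecting the trailing text.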
9. Quality & Monitoring
Anti-Pattern Detectors (10 total)
The quality engine runs 10 detectors on every agent response:
| # | Detector | What It Catches | Method |
|---|---|---|---|
| 1 | Leading Questions | "Don't you think...", "Isn't it obvious..." | Regex pattern matching |
| 2 | Multiple Questions | More than one question per turn | Count "?" characters |
| 3 | Excessive Mirroring | Paraphrasing too frequently | Track mirroring in last 4 responses |
| 4 | Topic Repetition | Returning to already-covered topic | Check current_topic in covered list |
| 5 | Ignoring Emotion | Factual response to emotional content | Detect absence of empathy patterns when sentiment is negative |
| 6 | Premature Topic Switch | Moving on before adequate probing | followup_count < probing_depth AND count = 0 |
| 7 | Too-Long Response | Agent talking too much | Word count > 30 (text) or > 20 (voice) |
| 8 | Jargon Overload | Excessive business buzzwords | Count jargon words ("synergistic", "leverage", etc.) |
| 9 | Missed Follow-up | Ignoring rich participant response | Participant gave 15+ emotional words but agent didn't follow up |
| 10 | Generic Acknowledgment | Bland non-specific response | Response is 1–8 words matching generic patterns |
IEI (Interview Engagement Index)
A composite 0.0–1.0 score computed from 6 weighted components:
| Component | Weight | Calculation |
|---|---|---|
| Coverage Score | 25% | coverage_pct / 100 |
| Engagement Depth | 20% | min(avg_participant_words / 50, 1.0) |
| Anti-Pattern Penalty | 20% | max(1.0 - anti_pattern_count / 10, 0.0) |
| Mirroring Bonus | 10% | 1.0 (sweet spot: 1–3 mirrors), 0.6 (4–5), 0.3 (0 or 6+) |
| Sentiment Trajectory | 15% | Improving sentiment over time → higher score |
| Turn Adequacy | 10% | 1.0 for 5–15 turns (ideal range), scaled outside |
Example: 8 turns, 60% coverage, avg 35 words, 2 anti-patterns, 2 mirrors, improving sentiment → IEI = 0.72
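The composite can be sketched directly from the weights table. A minimal sketch — the sentiment-trajectory scaling and the out-of-range turn-adequacy value are assumptions (the table only says "scaled"), so it will not exactly reproduce the worked example's 0.72:

```python
def iei(coverage_pct: float, avg_participant_words: float, anti_pattern_count: int,
        mirror_count: int, sentiment_trajectory: float, turn_count: int) -> float:
    """Weighted IEI composite from the six components (sketch)."""
    coverage = coverage_pct / 100
    engagement = min(avg_participant_words / 50, 1.0)
    anti_penalty = max(1.0 - anti_pattern_count / 10, 0.0)
    if 1 <= mirror_count <= 3:
        mirroring = 1.0          # sweet spot
    elif 4 <= mirror_count <= 5:
        mirroring = 0.6
    else:
        mirroring = 0.3          # no mirrors, or 6+
    adequacy = 1.0 if 5 <= turn_count <= 15 else 0.5   # assumed out-of-range value
    return (0.25 * coverage + 0.20 * engagement + 0.20 * anti_penalty
            + 0.10 * mirroring + 0.15 * sentiment_trajectory + 0.10 * adequacy)
```

With every component at its maximum the score reaches exactly 1.0, which confirms the weights sum to 100%.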
Live Session Monitor
Real-time monitoring endpoints for active interviews:
| Endpoint | Returns |
|---|---|
| GET /monitor/{id}/state | Current act (rapport/exploration/close), elapsed ratio, turn count, sentiment, anti-pattern flags |
| GET /monitor/{id}/checkpoints | Turn-by-turn history (act + sentiment per turn) |
| GET /monitor/{id}/metrics | Per-agent performance: turn count, avg response words, topics covered |
| GET /monitor/{id}/messages | Full conversation with agent attribution |
Engagement Scorecard
Aggregates all quality metrics into a single report per interview:
- IEI score (0.0–1.0)
- List of anti-pattern flags detected
- Engagement summary: turn count, coverage %, average participant words, mirroring count
- Sentiment trend: improving / declining / stable
10. Evidence Explorer & Dashboard
Analysis Dashboard
After an analysis completes, the dashboard (/analysis/{id}) shows four panels:
Panel 1: Category Overview
- Horizontal bar chart of all categories, sized by frequency (coding unit count)
- Color-coded by average sentiment (green = positive, red = negative, orange = neutral)
- Deductive categories marked with "D" badge
- Click any category → drill-down showing definition, anchor example, coding rule, and link to evidence
Panel 2: Findings
- Card list of all findings, grouped by type (theme, contradiction, consensus, outlier, recommendation)
- Each card shows: type icon, title, evidence strength badge
- Click to expand: full description, participant count, supporting quote count, link to evidence chain
Panel 3: Sentiment Overview
- Overall average sentiment score with visual gauge (-1.0 to +1.0)
- Sentiment by category: sorted list showing per-category averages
- Color-coded (positive = green, negative = red)
Panel 4: Traceability Health
- Health indicator: checkmark (100% traceable) or warning (gaps detected)
- Statistics: total findings, categories, coding units, interviews
- Evidence strength breakdown: bar chart showing strong / moderate / weak counts
- Link to full Evidence Explorer
Evidence Explorer
The Evidence Explorer (/analysis/{id}/evidence) provides the drill-down interface:
- Finding → Categories: Which categories support this finding
- Category → Coding Units: All verbatim extracts assigned to this category
- Coding Unit → Message: The original participant message with the extract highlighted
- Message → Interview: Which participant, when, in what context
Every level shows the full context needed to evaluate whether the finding is well-supported.
11. Deployment & CI/CD
Production Environment (Azure)
| Component | Resource |
|---|---|
| Backend | Azure Container App: iaas-backend (port 8000) |
| Frontend | Azure Container App: iaas-frontend (port 3000) |
| Database | Azure PostgreSQL Flexible Server: ironcode-flex-server |
| Container Registry | ironcode.azurecr.io |
| Region | North Europe |
CI/CD Pipeline
Triggered on every push to main:
Push to main
│
├──▶ Build backend Docker image → Push to ACR
├──▶ Build frontend Docker image → Push to ACR
│
▼
Deploy backend container
│ (auto-runs: alembic upgrade head)
▼
Seed demo data
│ (POST /admin/seed → truncate + re-seed)
▼
Deploy frontend container
│
▼
Smoke tests
│ (health checks on both services)
▼
Done
Database Migrations
Migrations run automatically when the backend container starts (start.sh):
```bash
#!/bin/bash
alembic upgrade head   # Apply pending migrations
uvicorn app.main:app --host 0.0.0.0 --port 8000 --workers 2
```
Seed Data
The POST /api/v1/admin/seed endpoint (requires admin password):
- Truncates all tables (`CASCADE`)
- Runs the DIAGNOSE seed: 5 personas, 12 agents, 6 session templates, 18 interviews, 302 messages, 1 completed analysis
Appendix: API Response Format
All API responses follow a consistent envelope:
Success:
```json
{
  "data": { "...": "..." },
  "meta": { "request_id": "uuid", "timestamp": "2026-03-03T10:00:00Z" }
}
```
Error:
```json
{
  "error": {
    "code": "ERROR_CODE",
    "message": "Human-readable description",
    "details": { "...": "..." }
  },
  "meta": { "request_id": "uuid", "timestamp": "2026-03-03T10:00:00Z" }
}
```
Document generated March 2026. IaaS PoC v0.2.0 by IRONCODE.