IaaS Platform Guide

Intelligence-as-a-Service by IRONCODE

Version 0.2.0 · March 2026 · 11 Sections

1. Introduction & Architecture

Purpose

IaaS (Intelligence-as-a-Service) is a platform where AI agents conduct structured qualitative research interviews via chat and voice, then analyze the results using Mayring's Qualitative Content Analysis (QCA). The core differentiator: every finding traces back to its source — zero hallucinated insights, 100% traceability.

Hypotheses Under Test

| ID | Hypothesis | Success Criteria |
|---|---|---|
| H1/H2 | AI agent can conduct structured interviews via chat AND voice | ≥ 80% question coverage |
| H3 | Agents are configurable (prompt, questions, model, temperature) and produce measurably different behavior | Measurable difference across configs |
| H4/H5 | Mayring QCA analysis with 100% traceability | Every finding → coding_unit → message → interview |

Tech Stack

| Layer | Technology |
|---|---|
| Backend | FastAPI (Python 3.12), async |
| Frontend | Next.js 14 + TypeScript |
| Database | PostgreSQL 16, Alembic migrations |
| LLM | Claude API (claude-sonnet-4-6 default, configurable per agent) |
| Speech-to-Text | Deepgram Nova-2 (streaming via WebSocket) |
| Text-to-Speech | ElevenLabs (eleven_turbo_v2_5, streaming) |
| Voice Transport | WebSocket (not WebRTC for PoC) |
| Deploy | Docker Compose (local), Azure Container Apps (production) |

Architecture

Three containers compose the system:

┌──────────────────┐   ┌──────────────────┐   ┌──────────────────┐
│   iaas-frontend  │   │   iaas-backend   │   │     iaas-db      │
│   Next.js :3000  │──▶│   FastAPI :8000  │──▶│  PostgreSQL :5432│
└──────────────────┘   └──────────────────┘   └──────────────────┘
                              │
                    ┌─────────┼─────────┐
                    ▼         ▼         ▼
               Claude API  Deepgram  ElevenLabs
               (LLM)      (STT)     (TTS)

The Traceability Chain (Core Principle)

Every analytical finding links back through an unbroken evidence chain:

AnalysisFinding
  └─▶ Category (AI-grouped theme)
       └─▶ CodingUnit (verbatim extract + paraphrase + generalization)
            └─▶ Message (original participant response)
                 └─▶ Interview (session with participant)
                      └─▶ Agent (configured interviewer)

No finding exists without traceable evidence. No quote is fabricated.

Database Tables (14 total)

| Table | Purpose |
|---|---|
| personas | Reusable personality profiles |
| agents | Executable AI interviewers with LLM config |
| question_guides | Ordered interview questions per agent |
| session_templates | Multi-agent session blueprints |
| session_template_agents | Agent–session mapping with roles |
| interviews | Individual interview instances |
| messages | All interview messages (participant + agent) |
| voice_sessions | Voice-specific session metadata |
| analysis_runs | QCA analysis instances with config |
| coding_units | Segmented message extracts |
| categories | Thematic groupings of coding units |
| analysis_findings | Synthesized research findings |
| interview_invitations | Shareable invitation tokens |
| audit_log | State-change audit trail |

2. Persona Library

What Personas Are

Personas are reusable personality profiles that serve as templates for creating agents. Each persona encapsulates a consistent interviewer personality — name, description, system prompt, voice settings, language, and behavioral rules. Instead of configuring each agent from scratch, admins pick a persona and "instantiate" it into a ready-to-configure agent.

Persona Fields

| Field | Type | Purpose |
|---|---|---|
| key | String (unique) | Slug identifier (e.g., "system_autopsy") |
| name | String | Display name (e.g., "Dr. Vera Kessler") |
| personality_description | Text | Rich personality profile |
| system_prompt_template | Text | Base system prompt instructions |
| language_default | String | Default interview language ("en", "de") |
| voice_description | Text | Description of voice tone/style |
| default_voice_id | String | ElevenLabs voice ID |
| best_for | JSON array | Tags like ["architecture", "technical"] |
| engagement_rules | JSON | Custom rules (max_follow_ups, probe_on_vague, etc.) |

Admin Workflow: Persona → Agent

  1. Browse Personas at /personas — grid view with personality previews, best_for tags, and language badges.
  2. Click "Instantiate" — opens a modal asking for welcome message and optional interview type.
  3. System creates an Agent pre-configured with the persona's personality, prompt, voice, and language.
  4. Redirects to agent edit page (/agents/{id}/edit) where the admin fine-tunes questions, model parameters, etc.

DIAGNOSE Phase Personas

The platform ships with 5 DIAGNOSE dimension personas:

| Persona | Key | Focus | Style |
|---|---|---|---|
| Dr. Vera Kessler | system_autopsy | Technical landscape, legacy systems, vendor lock-in | Methodical, precise, analytical |
| James Okafor | capacity_audit | Time allocation, automation potential, work-about-work | Data-driven, direct, efficient |
| Dr. Miriam Steinfeld | organizational_autopsy | Decision flows, org structure, culture patterns | Empathetic, insightful, structured |
| Raj Anand | data_intelligence | Data maturity, AI readiness, information architecture | Curious, systematic, technically deep |
| Ingrid Larsen | process_zero_test | Core processes, failure modes, handoff patterns | Pragmatic, thorough, structured |

3. Agent Configuration

What Agents Are

Agents are executable interview entities — they are what actually conducts the conversation with participants. Each agent has independently configurable LLM parameters, interview behavior, voice settings, and a set of ordered questions.

Configurable Parameters

| Parameter | Default | Purpose | Impact on Behavior |
|---|---|---|---|
| model_name | claude-sonnet-4-6 | Which Claude model to use | Sonnet: balanced. Opus: deeper reasoning. Haiku: faster, cheaper. |
| temperature | 0.7 | LLM sampling temperature (0.0–1.0) | Lower = more predictable/focused. Higher = more creative/varied. |
| max_tokens | 1024 | Response length limit | Constrains how long the agent's responses can be |
| system_prompt | (from persona) | Core personality and instructions | Defines the agent's entire conversational identity |
| welcome_message | (required) | First message to participant | Sets the tone for the entire interview |
| closing_message | "Thank you..." | Final message | Wraps up the interview gracefully |
| language | "en" | Interview language | Affects prompt modules (German/English specific rules) |
| voice_id | (optional) | ElevenLabs voice ID | Which voice speaks in voice interviews |
| turn_taking_profile | "balanced" | Voice conversation patience | responsive (fast) → balanced → patient → therapist (very patient) |
| interview_type | (optional) | Classification tag | E.g., "architecture", "culture", "technical" |
| dimension_config | (optional) | Custom tracking dimensions | Enables visual dimension tracking in the UI |

How Agent Settings Affect Behavior (H3)

The same interview questions asked by two differently configured agents produce measurably different results:

  • Temperature 0.3 agent: Sticks closely to question guide, minimal creative probing, consistent responses across interviews.
  • Temperature 0.9 agent: More varied follow-ups, creative analogies, sometimes surprising questions.
  • Sonnet model: Balanced depth and speed, good for most interviews.
  • Opus model: Deeper reasoning, better at detecting contradictions and nuance, slower.
  • "therapist" turn-taking: Waits up to 8 seconds before speaking in voice, allows long pauses for reflection.
  • "responsive" turn-taking: Takes over after 2 seconds of silence, keeps pace high.

Question Guides

Each agent has an ordered list of question guides — the interview curriculum.

| Field | Purpose |
|---|---|
| question_text | The question to ask |
| order_index | Position in sequence (0, 1, 2, ...) |
| probing_depth | How many follow-up rounds (0–5). Default: 2 |
| is_required | Must be asked before interview can complete |
| topic_label | Dimension/topic name for tracking (e.g., "Architecture") |

Probing Depth controls how the interview engine follows up:

| Depth | Behavior |
|---|---|
| 0 | No follow-ups. Move to next question after any response. |
| 1 | One round of clarification. |
| 2 | Two rounds of probing (default). Good for most questions. |
| 3–5 | Deep exploration. Agent probes extensively before advancing. |
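The gate implied by this table can be sketched in a few lines. This is illustrative only (the function name and shape are assumptions); it simply compares follow-ups used so far against the question's configured probing_depth:

```python
def should_follow_up(followup_count: int, probing_depth: int) -> bool:
    """Probe again only while follow-ups used so far remain below
    the question's probing_depth; otherwise advance to the next question."""
    return followup_count < probing_depth


assert should_follow_up(0, 2) is True    # first follow-up allowed (default depth 2)
assert should_follow_up(2, 2) is False   # depth exhausted, advance
assert should_follow_up(0, 0) is False   # depth 0: never probe
```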

Topic Labels connect questions to the agent's dimension_config for visual tracking. When a question has topic_label: "Architecture", the UI shows progress on the "Architecture" dimension as that topic is covered.

Admin Workflow: Configure Agent

  1. Navigate to /agents/{id}/edit
  2. Review/edit system prompt, welcome/closing messages
  3. Adjust LLM model, temperature, max tokens
  4. Set voice ID (for voice interviews)
  5. Set turn-taking profile
  6. Add/reorder/edit question guides (with topics and probing depths)
  7. Save

4. Session Templates & Multi-Agent Orchestration

What Sessions Are

Session Templates are multi-agent interview blueprints that combine two or more agents in a specific configuration. They define:

  • Which agents participate and in what roles
  • How agents switch (orchestration mode)
  • What topics each agent covers
  • How interview time is allocated

Orchestration Modes

| Mode | Description |
|---|---|
| tag_team | Agents dynamically switch based on topic relevance, sentiment, and coverage. The system scores all agents each turn and selects the best fit. |
| sequential | Agents take turns in order — Agent 1 covers their questions, then Agent 2, etc. |

Agent Roles

Each agent in a session has a role:

| Role | Purpose |
|---|---|
| lead | Primary interviewer. Starts the conversation. Gets majority of time. |
| support | Specialist. Takes over when their expertise is needed. |

Session Template Fields

| Field | Purpose |
|---|---|
| name | Template name (e.g., "DIAGNOSE: System Autopsy") |
| description | What this session investigates |
| orchestration_mode | tag_team or sequential |
| language | Interview language |
| target_duration_minutes | Expected duration (default: 30) |
| interview_type | Classification tag |

Agent Assignment in Sessions

Each agent entry in a session has:

| Field | Purpose |
|---|---|
| agent_id | Which agent |
| role | lead or support |
| order_index | Speaking order |
| assigned_topics | Topics this agent covers (e.g., ["Architecture", "Infrastructure"]) |
| time_allocation_pct | Percentage of interview time (e.g., 70%) |

Multi-Agent Handoff: How It Works

In tag_team mode, the system evaluates which agent should respond on every turn. The selection uses a 5-dimension weighted scoring algorithm:

| Dimension | Weight | What It Measures |
|---|---|---|
| Topic Match | 35% | Does this agent own the current topic? |
| Sentiment Fit | 25% | Is this agent's style appropriate for the participant's emotional state? |
| Continuation Bonus | 20% | Is the current agent still probing a topic? (Avoid disruptive handoffs mid-thread.) |
| Balance Penalty | 15% | Has this agent had too many turns relative to others? (Prevents monopolization.) |
| Question Readiness | 15% | Does this agent have uncovered topics in their domain? |

Scoring Details

Topic Match (0.0–1.0):

  • Exact topic match → 1.0
  • Generalist (no assigned topics) → 0.5
  • No match → 0.0

Sentiment Fit (0.0–1.0):

  • Negative participant sentiment + empathetic agent → up to 1.0
  • Negative sentiment + analytical agent → as low as 0.1
  • Neutral/positive → 0.5 (any style works)

Continuation Bonus (0.0–1.0):

  • Same agent, still probing (follow-ups < probing_depth) → bonus (decays toward 0)
  • Different agent or probing exhausted → 0.0

Balance Penalty (0.0 to -1.0):

  • Agent has had fair share of turns → 0.0
  • Agent dominating (e.g., 30%+ above fair share) → -0.5 to -1.0

Question Readiness (0.0–1.0):

  • Agent has uncovered topics in their domain → proportional score
  • All their topics asked → 0.0
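The per-dimension scores above combine into a single weighted total, and the highest-scoring agent takes the turn. A minimal sketch, with the weights copied from the dimension table and every identifier (function names, agent ids, example scores) purely illustrative:

```python
# Weights as published in the dimension table above.
WEIGHTS = {
    "topic_match": 0.35,
    "sentiment_fit": 0.25,
    "continuation_bonus": 0.20,
    "balance_penalty": 0.15,      # the score itself ranges 0.0 .. -1.0
    "question_readiness": 0.15,
}


def total_score(dims: dict[str, float]) -> float:
    """Weighted sum over the 5 scoring dimensions."""
    return sum(WEIGHTS[k] * dims[k] for k in WEIGHTS)


def select_agent(candidates: dict[str, dict[str, float]]) -> str:
    """Pick the agent id with the highest weighted total."""
    return max(candidates, key=lambda aid: total_score(candidates[aid]))


scores = {
    "agent-1": {"topic_match": 1.0, "sentiment_fit": 0.5,
                "continuation_bonus": 0.3, "balance_penalty": -0.2,
                "question_readiness": 0.6},
    "agent-2": {"topic_match": 0.0, "sentiment_fit": 0.8,
                "continuation_bonus": 0.0, "balance_penalty": 0.0,
                "question_readiness": 0.4},
}
# agent-1 totals 0.595, agent-2 totals 0.26: the topic owner wins the turn.
assert select_agent(scores) == "agent-1"
```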

Handoff Logging

Every agent switch is logged with full scores:

{
  "from_agent": "uuid-of-previous-agent",
  "to_agent": "uuid-of-new-agent",
  "turn": 5,
  "scores": {
    "agent-1-uuid": {"total": 0.62, "topic_match": 1.0, "sentiment_fit": 0.5, "...": "..."},
    "agent-2-uuid": {"total": 0.58, "topic_match": 0.0, "sentiment_fit": 0.8, "...": "..."}
  }
}

This creates a complete audit trail of why each handoff decision was made.

Admin Workflow: Create Session

  1. Navigate to /sessions/new
  2. Enter session name, description, language, duration, interview type
  3. Select orchestration mode (tag_team or sequential)
  4. Add agents: select from available agents, assign roles, set assigned topics, allocate time %
  5. Save → session template is ready for launching interviews

5. Interview Lifecycle

Full Lifecycle

CREATE ──▶ INVITE ──▶ CONSENT ──▶ START ──▶ MESSAGES ──▶ COMPLETE

Step 1: Create Interview

From the session detail page (/sessions/{id}), click "+ Participant":

  1. Enter participant name, role, department (optional)
  2. Select modality: Text (chat) or Voice (real-time audio)
  3. Optionally override duration
  4. Click "Create Invitation"

The system creates an Interview record (status: created) and generates an invitation.

Step 2: Invite Participant

After creation, the modal shows:

  • Copy Link: A shareable URL like https://frontend.example.com/i/abc123token
  • Send Email: Enter email address → system sends branded HTML email via Azure Communication Services or SMTP

Invitation tokens expire after 30 days. Each access increments a counter for tracking.

Step 3: Consent

When the participant opens the link:

  1. Token is validated (not expired, interview not completed)
  2. If name was provided during invitation, a personalized welcome is shown
  3. If no name, a name input field appears
  4. Consent text explains: modality (text/voice), expected duration, recording notice
  5. Participant clicks "Begin Interview"

Step 4: Start Interview

  • Status changes to in_progress
  • started_at timestamp is set
  • Agent's welcome message is posted as the first message
  • For voice: WebSocket connection established, welcome message synthesized via TTS

Step 5: Message Exchange

Each turn follows the 5-node pipeline (see Section 6):

  1. Participant sends message
  2. Pacing node determines interview phase (rapport/exploration/close)
  3. Route node selects best agent (in multi-agent sessions)
  4. Agent node generates response via Claude
  5. QA node validates quality and tracks coverage
  6. Response delivered to participant

The system tracks:

  • Question coverage: which questions from the guide have been addressed
  • Topic coverage: progress per dimension/topic
  • Sentiment: per-message sentiment scoring
  • Turn count: total exchanges

Step 6: Complete Interview

Interview completes when:

  • The AI determines all required questions are covered and time is approaching, OR
  • The participant has answered enough for meaningful analysis, OR
  • The admin manually ends the interview

Status changes to completed, completed_at is set, duration_seconds calculated.

Three-Act Interview Structure

The interview follows a cinematic three-act structure driven by elapsed time and coverage:

| Act | Time Range | Focus |
|---|---|---|
| Rapport | 0–20% | Build trust. Focus on the person, not the topic. Light questions. |
| Exploration | 20–80% | Deep dive. Follow the participant's energy. Use the question guide as a map, not a script. |
| Close | 80–100% | Wrap up. Cover remaining required topics. Thank participant. |

Act transitions can be modified by:

  • Negative sentiment early: Extends rapport phase to build trust
  • High coverage early: Moves to close earlier
  • Low coverage late: Stays in exploration past 80%

6. AI Decision-Making in Interviews

The 5-Node LangGraph Pipeline

Every participant message flows through a 5-node processing pipeline:

PARTICIPANT MESSAGE
       │
       ▼
┌──────────────┐
│  1. PACING   │  Determines: rapport / exploration / close
│    NODE      │  Inputs: elapsed time, coverage, sentiment
└──────┬───────┘
       │
       ▼
┌──────────────┐
│  2. ROUTE    │  Selects: which agent responds
│    NODE      │  Inputs: topic, sentiment, agent scores
└──────┬───────┘
       │
       ▼
┌──────────────┐
│  3. AGENT    │  Generates: the actual response
│    NODE      │  Inputs: system prompt, conversation history
└──────┬───────┘
       │
       ▼
┌──────────────┐
│  4. QA       │  Validates: response quality
│    NODE      │  Checks: anti-patterns, length, engagement
└──────┬───────┘
       │
       ▼
┌──────────────┐
│  5. OUTPUT   │  Formats: API response
│    NODE      │  Returns: message, coverage, metadata
└──────────────┘

Node 1: Pacing — Act Selection

What the AI decides: When to transition between rapport, exploration, and close.

The pacing node computes elapsed_ratio (0.0–1.0) based on (now - started_at) / target_duration_minutes, then applies rules:

  • elapsed < 0.20 → Rapport
  • elapsed < 0.25 AND negative sentiment → Extended Rapport (build trust first)
  • 0.20 ≤ elapsed < 0.80 → Exploration (unless coverage > 90% → early Close)
  • elapsed ≥ 0.80 → Close (unless coverage < 40% → stay in Exploration)
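These rules translate almost directly into code. A hedged sketch (the function shape and signature are assumptions; the decision table is the one above):

```python
def select_act(elapsed: float, coverage_pct: float, sentiment: float) -> str:
    """Map elapsed ratio (0.0-1.0), coverage percent, and latest sentiment
    score to an interview act, per the pacing rules."""
    if elapsed < 0.20:
        return "rapport"
    if elapsed < 0.25 and sentiment < 0:
        return "rapport"                  # extended rapport: build trust first
    if elapsed >= 0.80:
        # low coverage late: stay in exploration past 80%
        return "exploration" if coverage_pct < 40 else "close"
    # coverage > 90% triggers an early close
    return "close" if coverage_pct > 90 else "exploration"


assert select_act(0.10, 10, 0.2) == "rapport"
assert select_act(0.22, 15, -0.5) == "rapport"     # extended rapport
assert select_act(0.50, 95, 0.0) == "close"        # early close
assert select_act(0.90, 30, 0.0) == "exploration"  # low coverage late
```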

Node 2: Route — Agent Selection

What the AI decides: Which agent should respond (in multi-agent sessions).

For single-agent interviews, this is a passthrough. For multi-agent sessions, the 5-dimension scoring algorithm described in Section 4 is applied. The agent with the highest weighted score is selected. If a different agent is selected, a handoff is logged.

Node 3: Agent — Response Generation

What the AI decides: What to say to the participant.

The agent node constructs a comprehensive system prompt (V2 Canonical) and calls Claude. The prompt includes:

7 Golden Rules

  1. This is a conversation: Acknowledge → Bridge → Advance. Reference what they said specifically.
  2. Be concise: 2–3 sentences max (never > 4). After a long answer: 1 sentence only. Questions ≤ 15 words.
  3. Be genuinely curious: React to surprises. Pull threads from passing remarks.
  4. Feel what they feel: Negative sentiment → acknowledge frustration first. Never respond to emotion with a factual question.
  5. Remember everything: Reference earlier conversation naturally. Spot contradictions.
  6. Use mirroring: Paraphrase core points. Max 1 mirror per 2–3 turns.
  7. Respect time and space: Long pauses → wait. Interesting tangent → follow it. Short answer → probe once, then move on.

Anti-Patterns ("What You Never Do")

The system prompt explicitly forbids:

  • Reading questions verbatim from the guide
  • Saying "That's a great point" or "Great question"
  • Hinting the agent is artificial
  • Lecturing or teaching
  • Reusing the same probe
  • Asking two questions per turn
  • Using bullet points/lists while speaking
  • Saying "Let's move on" or "Next topic"
  • Front-loading with excessive context
  • Ignoring emotional content for data
  • Being sycophantic ("Wow!", "Brilliant!")

Response Format

The LLM returns structured JSON:

{
  "agent_response": "The actual message to the participant",
  "sentiment": { "score": 0.3, "label": "positive" },
  "interview_complete": false
}

Node 4: QA — Quality Validation

What's checked (deterministic):

  • Word count: > 30 words (text) or > 20 words (voice) → flagged as too_long
  • Question count: > 1 question mark → flagged as multiple_questions
  • Sentence count: > 6 (text) or > 4 (voice) → flagged as too_many_sentences
  • Empty response: → flagged as empty_response
  • Coverage tracking: Updates which topics have been covered vs remaining
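Because these checks are deterministic, they are cheap to sketch. The thresholds below are copied from the list above; the function shape, flag-list return style, and naive sentence splitting are assumptions:

```python
def qa_flags(response: str, modality: str = "text") -> list[str]:
    """Return the QA flags raised by an agent response, per the
    deterministic checks listed above."""
    flags = []
    words = response.split()
    # Naive sentence split: treat '.', '!', '?' as sentence terminators.
    sentences = [s for s in response.replace("!", ".").replace("?", ".")
                 .split(".") if s.strip()]
    max_words = 30 if modality == "text" else 20
    max_sentences = 6 if modality == "text" else 4
    if not response.strip():
        flags.append("empty_response")
    if len(words) > max_words:
        flags.append("too_long")
    if response.count("?") > 1:
        flags.append("multiple_questions")
    if len(sentences) > max_sentences:
        flags.append("too_many_sentences")
    return flags


assert qa_flags("") == ["empty_response"]
assert qa_flags("What do you mean? And why is that?") == ["multiple_questions"]
```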

AI vs Deterministic Decisions (Interviews)

| Decision | Made By | Evidence |
|---|---|---|
| Act transition (rapport/exploration/close) | Deterministic rules + AI override | Elapsed time, coverage %, sentiment score |
| Which agent responds | Weighted scoring algorithm | 5-dimension scores logged per handoff |
| Response content | AI (Claude) | System prompt + conversation history |
| When to ask follow-up vs advance | AI (Claude) | Question guide + probing depth config |
| Sentiment scoring | AI (Claude) | Per-message -1.0 to +1.0 score |
| Interview completion | AI (Claude) signals, deterministic check | Coverage + time + AI judgment |
| Word/sentence limit check | Deterministic | Character counting |
| Anti-pattern detection | Deterministic | Regex + counting |
| Coverage calculation | Deterministic | question_guide_id on messages |
| Turn counting | Deterministic | sequence_number |

7. Voice Pipeline

Architecture

Voice interviews use a real-time bidirectional pipeline:

Participant's Microphone
       │
       ▼ (WebSocket: binary audio chunks)
┌──────────────┐
│  Deepgram    │  STT: Audio → Text (streaming, ~500ms)
│  Nova-2      │  Interim + final transcripts
└──────┬───────┘
       │
       ▼ (accumulated transcript)
┌──────────────┐
│  Turn-Taking │  5-signal fusion → "participant done speaking?"
│  Engine      │  Threshold-based decision
└──────┬───────┘
       │ (when turn taken)
       ▼
┌──────────────┐
│  Claude LLM  │  Same interview engine as text (~1500ms)
│              │  Returns structured JSON response
└──────┬───────┘
       │
       ▼
┌──────────────┐
│  ElevenLabs  │  TTS: Text → Audio (streaming MP3, ~1000ms)
│              │  Agent's voice
└──────┬───────┘
       │
       ▼ (WebSocket: binary audio)
Participant's Speakers

Target latency: < 4 seconds round-trip (STT ~500ms + LLM ~1500ms + TTS ~1000ms + network ~500ms).

Turn-Taking Engine

The turn-taking engine decides when the participant has finished speaking. It fuses 5 signals to avoid premature interruption or excessive waiting:

Signal 1: VAD Silence Duration (35% weight)

Measures how long since the participant stopped speaking:

| Silence | Score | Interpretation |
|---|---|---|
| < 200ms | 0.0 | Breathing pause |
| 200–400ms | 0.1 | Normal inter-sentence |
| 400–700ms | 0.3 | Could be done |
| 700–1000ms | 0.5 | Likely done |
| 1000–1500ms | 0.7 | Probably done |
| 1500–2500ms | 0.85 | Almost certainly |
| > 2500ms | 0.95 | Certainly done |

Signal 2: Audio Energy Decay (15% weight)

Analyzes the trajectory of audio energy before silence:

| Pattern | Score | Meaning |
|---|---|---|
| Strong downward slope | 0.85 | Natural trailing off — yielding turn |
| Gentle decline | 0.65 | Winding down |
| Flat energy | 0.40 | Uncertain — mid-thought pause |
| Rising energy | 0.15 | Was building momentum — interrupted |

Signal 3: Transcript Completeness (25% weight)

Rule-based analysis of the transcript text:

  • Sentence-ending punctuation (., !, ?): +0.30
  • Trailing incomplete markers ("and", "but", "because"): -0.25
  • Short known-complete answers ("yes", "no", "I agree"): +0.30
  • Listing patterns without closure: -0.15
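A rough sketch of this rule scorer. The rule values come from the list above; the neutral 0.5 baseline, the clamping to 0.0–1.0, and the known-complete phrase set are assumptions of this sketch:

```python
import re

TRAILING_INCOMPLETE = ("and", "but", "because", "so", "or")
KNOWN_COMPLETE = {"yes", "no", "i agree"}   # short known-complete answers


def completeness_score(transcript: str) -> float:
    """Rule-based transcript completeness (Signal 3), 0.0-1.0."""
    t = transcript.strip().lower()
    score = 0.5                               # assumed neutral baseline
    if re.search(r"[.!?]$", t):
        score += 0.30                         # sentence-ending punctuation
    words = t.split()
    if words and words[-1].strip(".,") in TRAILING_INCOMPLETE:
        score -= 0.25                         # trailing incomplete marker
    if t.rstrip(".!?") in KNOWN_COMPLETE:
        score += 0.30                         # known-complete short answer
    return max(0.0, min(1.0, score))


# A crisp "Yes." scores far higher than a dangling conjunction.
assert completeness_score("Yes.") > completeness_score("well, because and")
```

The listing-pattern rule (-0.15) is omitted here for brevity; it would follow the same additive pattern.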

Signal 4: Semantic Sufficiency (25% weight)

Uses Claude Haiku (fast, cheap) to assess if the response is complete:

  • Gated: Only called when silence > 500ms AND word count ≥ 3
  • Prompt: "Is this response to the question complete?"
  • Returns confidence 0.0–1.0

Signal 5: Context Adjustment (threshold modifier)

Modifies the decision threshold based on question type:

  • Closed question (yes/no type): -0.10 threshold (trigger faster)
  • Open question ("tell me about..."): +0.12 threshold (wait longer)
  • Long response (> 30 seconds): -0.05 (person winding down)

Patience Profiles

| Profile | Base Threshold | Max Silence | Min Silence | Use Case |
|---|---|---|---|---|
| responsive | 0.48 | 2000ms | 200ms | Fast-paced interviews |
| balanced | 0.55 | 3000ms | 300ms | Default — most interviews |
| patient | 0.65 | 5000ms | 500ms | Reflective participants |
| therapist | 0.75 | 8000ms | 700ms | Deep emotional topics |
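Putting the five signals together: the fused score is compared against the profile's base threshold, shifted by the context adjustment. The weights and thresholds below come from the tables above; the weighted-sum combination rule and all identifiers are assumptions of this sketch:

```python
# Base decision threshold per patience profile (from the table above).
PROFILES = {
    "responsive": 0.48, "balanced": 0.55, "patient": 0.65, "therapist": 0.75,
}


def turn_complete(silence: float, energy: float, transcript: float,
                  semantic: float, threshold_adjust: float,
                  profile: str = "balanced") -> bool:
    """Fuse the four 0.0-1.0 signal scores with their published weights;
    Signal 5 arrives as threshold_adjust (e.g. -0.10 for a closed question,
    +0.12 for an open one)."""
    fused = (0.35 * silence + 0.15 * energy
             + 0.25 * transcript + 0.25 * semantic)
    return fused >= PROFILES[profile] + threshold_adjust


# Clear end of a yes/no answer: long silence, trailing energy, complete text.
assert turn_complete(0.85, 0.85, 0.8, 0.9, -0.10) is True
# Mid-thought pause on an open question: the same engine keeps waiting.
assert turn_complete(0.30, 0.40, 0.25, 0.3, +0.12) is False
```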

Barge-In Handling

When the participant starts speaking while the agent is talking:

  1. Client detects sustained speech energy (> 300ms above threshold)
  2. Sends {"type": "barge_in"} via WebSocket
  3. Server cancels ongoing turn evaluation and TTS playback
  4. Resets energy buffer
  5. Ready for new speech input

A 1200ms grace period after TTS starts prevents echo from triggering false barge-in.


8. Mayring QCA Analysis

Overview

The analysis engine implements Mayring's Qualitative Content Analysis — a systematic method for deriving categories and findings from qualitative data. The engine runs as a background task with three sequential phases.

Pipeline

PENDING ──▶ PHASE 1: CODING ──▶ PHASE 2: CATEGORIZING ──▶ PHASE 3: FINDINGS ──▶ COMPLETED

Phase 1: Coding (Segmentation)

Goal: Break every participant message into discrete coding units.

For each participant message, Claude:

  1. Segments the message into coding units (each = one distinct idea, 1–3 sentences)
  2. Extracts the verbatim text with character positions (start_char, end_char)
  3. Paraphrases the extract (removes filler, restates core meaning)
  4. Generalizes the meaning (abstracts to configurable level: low/medium/high)
  5. Scores sentiment (-1.0 to +1.0)

Example:

| Field | Value |
|---|---|
| Original message | "Well, our deployment pipeline is really slow, like 45 minutes per build. And honestly nobody trusts the test suite anymore because it's been broken for months." |
| Coding Unit 1 | |
| coded_text | "our deployment pipeline is really slow, like 45 minutes per build" |
| paraphrase | "Deployment pipeline takes 45 minutes per build" |
| generalization | "CI/CD pipeline performance bottleneck" |
| sentiment_score | -0.4 |
| Coding Unit 2 | |
| coded_text | "nobody trusts the test suite anymore because it's been broken for months" |
| paraphrase | "Test suite is untrusted due to months of failures" |
| generalization | "Test infrastructure credibility loss" |
| sentiment_score | -0.6 |

Progress: Commits after each message, so the polling endpoint shows real-time progress: "Coding messages 5/21 — 32 coding units extracted"

Phase 2: Categorizing

Goal: Build a category system from the coding units.

Three modes are available:

Inductive Mode (bottom-up)

The AI discovers categories naturally from the data:

  1. First reduction: Groups similar generalizations, removes redundancy
  2. Second reduction: Builds umbrella categories from groups
  3. Assignment: Each coding unit assigned to exactly one category
  4. Target: 5–20 categories

For each category, the AI generates:

  • Name: Descriptive category label
  • Definition: What belongs in this category
  • Coding rule: Include when... / Exclude when...
  • Anchor example: The most prototypical quote

Deductive Mode (top-down)

Admin provides predefined categories. The AI assigns each coding unit to the best-fit category:

  • Exact match → assigned to predefined category
  • No fit → assigned to "Uncategorized"

Mixed Mode (hybrid)

Combines both approaches:

  1. Try to fit each coding unit to predefined categories
  2. Units that don't fit → group into emergent categories
  3. Small emergent groups (below min_frequency) → merge into "Other"
  4. Each category is marked is_deductive: true/false

Phase 3: Findings (Synthesis)

Goal: Generate analytical findings from the category system.

The AI identifies patterns across categories and generates findings of five types:

| Finding Type | Description |
|---|---|
| theme | A recurring pattern across multiple participants |
| contradiction | Conflicting positions from different participants |
| consensus | Strong broad agreement across participants |
| outlier | A unique or surprising finding from one participant |
| recommendation | An actionable suggestion grounded in the data |

Evidence Strength Rules

| Strength | Criteria |
|---|---|
| strong | ≥ 5 supporting coding units from ≥ 3 different interviews |
| moderate | 3–4 units from ≥ 2 interviews |
| weak | < 3 units |
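These rules transcribe directly into a small classifier. The thresholds are taken verbatim from the table; the function shape is illustrative:

```python
def evidence_strength(unit_count: int, interview_count: int) -> str:
    """Classify a finding's evidence strength from its supporting
    coding-unit count and distinct-interview count."""
    if unit_count >= 5 and interview_count >= 3:
        return "strong"
    if 3 <= unit_count <= 4 and interview_count >= 2:
        return "moderate"
    return "weak"


assert evidence_strength(6, 4) == "strong"
assert evidence_strength(3, 2) == "moderate"
assert evidence_strength(2, 5) == "weak"
```

Combinations the table leaves undefined (e.g., 5 units spread over only 2 interviews) default to weak in this sketch; the actual engine may treat them differently.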

Each finding includes:

  • supporting_category_ids: Which categories support it (UUID references)
  • supporting_quote_ids: Key verbatim quotes (UUID references to coding units)
  • participant_count: How many unique interviews contributed

Analysis Configuration

When creating an analysis run, the admin configures:

{
  "name": "System Autopsy — Technical Assessment",
  "interview_ids": ["uuid1", "uuid2", "uuid3"],
  "config": {
    "mode": "inductive",
    "abstraction_level": "medium",
    "min_frequency": 2,
    "include_sentiment": true,
    "deductive_categories": []
  }
}

| Config Field | Purpose |
|---|---|
| mode | inductive, deductive, or mixed |
| abstraction_level | low (concrete), medium (balanced), high (abstract) |
| min_frequency | Minimum coding units per category in mixed mode |
| include_sentiment | Whether to compute sentiment scores |
| deductive_categories | Predefined categories (for deductive/mixed mode) |

Requirement: Minimum 3 completed interviews to run an analysis.

AI vs Deterministic Decisions (Analysis)

| Decision | Made By | Evidence |
|---|---|---|
| Where to segment message into units | AI (Claude) | Character positions verified against source |
| Paraphrase content | AI (Claude) | Generated per coding unit |
| Generalization content | AI (Claude) | Abstraction level configurable |
| Sentiment score per unit | AI (Claude) | -1.0 to +1.0 |
| Category names and definitions | AI (Claude) | Generated from data (inductive) or matched (deductive) |
| Category assignment | AI (Claude indices) → deterministic UUID mapping | LLM returns index, system maps to UUID |
| Finding type classification | AI (Claude) | Theme/contradiction/consensus/outlier/recommendation |
| Evidence strength rating | AI (Claude) with rules | Must meet count/interview thresholds |
| Supporting quotes selection | AI (Claude indices) → deterministic UUID mapping | LLM returns index, system maps to UUID |
| Traceability verification | Deterministic | coded_text substring check against message |
| Category frequency count | Deterministic | Count of assigned coding units |
| Average sentiment per category | Deterministic | Arithmetic mean of member units |
| Status transitions | Deterministic | pending → coding → categorizing → completed |
| Progress reporting | Deterministic | Message count, unit count, category count |

JSON Parsing Robustness

Claude sometimes returns JSON followed by commentary text. The parser handles this:

  1. Try json.loads() (clean JSON)
  2. Strip markdown fences (```json ... ```)
  3. Use json.JSONDecoder.raw_decode() to find first JSON object
  4. Fall back: treat entire message as one coding unit
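The first three steps of that fallback chain can be sketched as follows; step order matches the text, while the regex, function name, and `None` return for the final fallback are assumptions:

```python
import json
import re


def parse_llm_json(raw: str):
    """Extract a JSON object from an LLM reply that may include
    surrounding commentary or markdown fences."""
    try:                                        # 1. clean JSON
        return json.loads(raw)
    except json.JSONDecodeError:
        pass
    fenced = re.search(r"```(?:json)?\s*(.*?)```", raw, re.DOTALL)
    if fenced:                                  # 2. strip markdown fences
        try:
            return json.loads(fenced.group(1))
        except json.JSONDecodeError:
            pass
    start = raw.find("{")                       # 3. first JSON object in text
    if start != -1:
        try:
            obj, _ = json.JSONDecoder().raw_decode(raw[start:])
            return obj
        except json.JSONDecodeError:
            pass
    return None   # caller applies step 4: treat message as one coding unit


assert parse_llm_json('{"a": 1}') == {"a": 1}
assert parse_llm_json('{"a": 1} Hope this helps!') == {"a": 1}
```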

9. Quality & Monitoring

Anti-Pattern Detectors (10 total)

The quality engine runs 10 detectors on every agent response:

| # | Detector | What It Catches | Method |
|---|---|---|---|
| 1 | Leading Questions | "Don't you think...", "Isn't it obvious..." | Regex pattern matching |
| 2 | Multiple Questions | More than one question per turn | Count "?" characters |
| 3 | Excessive Mirroring | Paraphrasing too frequently | Track mirroring in last 4 responses |
| 4 | Topic Repetition | Returning to already-covered topic | Check current_topic in covered list |
| 5 | Ignoring Emotion | Factual response to emotional content | Detect absence of empathy patterns when sentiment is negative |
| 6 | Premature Topic Switch | Moving on before adequate probing | followup_count < probing_depth AND count = 0 |
| 7 | Too-Long Response | Agent talking too much | Word count > 30 (text) or > 20 (voice) |
| 8 | Jargon Overload | Excessive business buzzwords | Count jargon words ("synergistic", "leverage", etc.) |
| 9 | Missed Follow-up | Ignoring rich participant response | Participant gave 15+ emotional words but agent didn't follow up |
| 10 | Generic Acknowledgment | Bland non-specific response | Response is 1–8 words matching generic patterns |

IEI (Interview Engagement Index)

A composite 0.0–1.0 score computed from 6 weighted components:

| Component | Weight | Calculation |
|---|---|---|
| Coverage Score | 25% | coverage_pct / 100 |
| Engagement Depth | 20% | min(avg_participant_words / 50, 1.0) |
| Anti-Pattern Penalty | 20% | max(1.0 - anti_pattern_count / 10, 0.0) |
| Mirroring Bonus | 10% | 1.0 (sweet spot: 1–3 mirrors), 0.6 (4–5), 0.3 (0 or 6+) |
| Sentiment Trajectory | 15% | Improving sentiment over time → higher score |
| Turn Adequacy | 10% | 1.0 for 5–15 turns (ideal range), scaled outside |

Example: 8 turns, 60% coverage, avg 35 words, 2 anti-patterns, 2 mirrors, improving sentiment → IEI = 0.72
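The composite above can be sketched as a weighted sum. Coverage, depth, penalty, and mirroring follow the table's formulas; the sentiment trajectory is passed in as a ready 0.0–1.0 score and the out-of-range turn-adequacy value (0.5) is a simplifying assumption, since the text only says "scaled outside":

```python
def iei(coverage_pct: float, avg_words: float, anti_patterns: int,
        mirrors: int, sentiment_trend: float, turns: int) -> float:
    """Interview Engagement Index: weighted sum of 6 quality components."""
    coverage = coverage_pct / 100
    depth = min(avg_words / 50, 1.0)
    penalty = max(1.0 - anti_patterns / 10, 0.0)
    mirroring = 1.0 if 1 <= mirrors <= 3 else 0.6 if mirrors in (4, 5) else 0.3
    adequacy = 1.0 if 5 <= turns <= 15 else 0.5   # "scaled outside": assumed
    return round(0.25 * coverage + 0.20 * depth + 0.20 * penalty
                 + 0.10 * mirroring + 0.15 * sentiment_trend
                 + 0.10 * adequacy, 2)


# With the section's example inputs and an assumed sentiment score of 0.5,
# the sketch lands near the quoted 0.72.
score = iei(60, 35, 2, 2, 0.5, 8)
assert 0.70 <= score <= 0.75
```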

Live Session Monitor

Real-time monitoring endpoints for active interviews:

| Endpoint | Returns |
|---|---|
| GET /monitor/{id}/state | Current act (rapport/exploration/close), elapsed ratio, turn count, sentiment, anti-pattern flags |
| GET /monitor/{id}/checkpoints | Turn-by-turn history (act + sentiment per turn) |
| GET /monitor/{id}/metrics | Per-agent performance: turn count, avg response words, topics covered |
| GET /monitor/{id}/messages | Full conversation with agent attribution |

Engagement Scorecard

Aggregates all quality metrics into a single report per interview:

  • IEI score (0.0–1.0)
  • List of anti-pattern flags detected
  • Engagement summary: turn count, coverage %, average participant words, mirroring count
  • Sentiment trend: improving / declining / stable

10. Evidence Explorer & Dashboard

Analysis Dashboard

After an analysis completes, the dashboard (/analysis/{id}) shows four panels:

Panel 1: Category Overview

  • Horizontal bar chart of all categories, sized by frequency (coding unit count)
  • Color-coded by average sentiment (green = positive, red = negative, orange = neutral)
  • Deductive categories marked with "D" badge
  • Click any category → drill-down showing definition, anchor example, coding rule, and link to evidence

Panel 2: Findings

  • Card list of all findings, grouped by type (theme, contradiction, consensus, outlier, recommendation)
  • Each card shows: type icon, title, evidence strength badge
  • Click to expand: full description, participant count, supporting quote count, link to evidence chain

Panel 3: Sentiment Overview

  • Overall average sentiment score with visual gauge (-1.0 to +1.0)
  • Sentiment by category: sorted list showing per-category averages
  • Color-coded (positive = green, negative = red)

Panel 4: Traceability Health

  • Health indicator: checkmark (100% traceable) or warning (gaps detected)
  • Statistics: total findings, categories, coding units, interviews
  • Evidence strength breakdown: bar chart showing strong / moderate / weak counts
  • Link to full Evidence Explorer

Evidence Explorer

The Evidence Explorer (/analysis/{id}/evidence) provides the drill-down interface:

  • Finding → Categories: Which categories support this finding
  • Category → Coding Units: All verbatim extracts assigned to this category
  • Coding Unit → Message: The original participant message with the extract highlighted
  • Message → Interview: Which participant, when, in what context

Every level shows the full context needed to evaluate whether the finding is well-supported.


11. Deployment & CI/CD

Production Environment (Azure)

| Component | Resource |
|---|---|
| Backend | Azure Container App: iaas-backend (port 8000) |
| Frontend | Azure Container App: iaas-frontend (port 3000) |
| Database | Azure PostgreSQL Flexible Server: ironcode-flex-server |
| Container Registry | ironcode.azurecr.io |
| Region | North Europe |

CI/CD Pipeline

Triggered on every push to main:

Push to main
    │
    ├──▶ Build backend Docker image → Push to ACR
    ├──▶ Build frontend Docker image → Push to ACR
    │
    ▼
Deploy backend container
    │ (auto-runs: alembic upgrade head)
    ▼
Seed demo data
    │ (POST /admin/seed → truncate + re-seed)
    ▼
Deploy frontend container
    │
    ▼
Smoke tests
    │ (health checks on both services)
    ▼
Done

Database Migrations

Migrations run automatically when the backend container starts (start.sh):

#!/bin/bash
alembic upgrade head   # Apply pending migrations
uvicorn app.main:app --host 0.0.0.0 --port 8000 --workers 2

Seed Data

The POST /api/v1/admin/seed endpoint (requires admin password):

  1. Truncates all tables (CASCADE)
  2. Runs the DIAGNOSE seed: 5 personas, 12 agents, 6 session templates, 18 interviews, 302 messages, 1 completed analysis

Appendix: API Response Format

All API responses follow a consistent envelope:

Success:

{
  "data": { "...": "..." },
  "meta": { "request_id": "uuid", "timestamp": "2026-03-03T10:00:00Z" }
}

Error:

{
  "error": {
    "code": "ERROR_CODE",
    "message": "Human-readable description",
    "details": { "...": "..." }
  },
  "meta": { "request_id": "uuid", "timestamp": "2026-03-03T10:00:00Z" }
}

Document generated March 2026. IaaS PoC v0.2.0 by IRONCODE.