EverythingThreads Methodology

The methodology
behind the machine.

Built from first principles. No YouTube tutorials. No prompt guides. Hundreds of documented exchanges between one uninstructed user and an AI system — classified.

The Problem

User Failure > Machine Failure

The biggest risk with AI is not the technology. It is us. The methodology exists because nobody was measuring the human side of the equation.

Your mistakes matter more than AI mistakes

The biggest risk is not artificial intelligence. It is natural assumption. We measure what humans bring to the table — and what they leave behind.

You cannot control the machine. You CAN control how you apply it.

Stop blaming the tool. Start understanding the operator. That is what independent research reveals.

HUMANs built them. To serve HUMANs.

Based on millions of bytes of data about human interactions. So why are we surprised when they display human characteristics? Let us look at ourselves instead.

The Framework

Multi-stage evaluation pipeline.
One reliability score.

Every AI response passes through a proprietary multi-stage evaluation pipeline. Each stage tests a different dimension of the response. The output is a composite reliability score with severity-graded findings.

1

Pattern Detection

Surface-level and structural behavioural patterns identified against 7 documented M-codes (M1–M7). Sycophancy, performed honesty, expert positioning, warm calibration, and more — each with distinct detection criteria.

2

Cross-Turn Analysis

Patterns don't exist in isolation. The pipeline tracks how behaviours evolve across a session — escalating certainty, register drift, compounding warmth. Single-turn scoring misses most of the risk.

3

Reliability Scoring

Every response receives a composite reliability index. Severity-graded findings from Low to Critical. Actionable guidance for each flagged pattern. The score tells you how much to trust what you just read.

The M-Code Taxonomy

Seven machine behaviour patterns. M1–M7.

M1

Sycophancy & Validation

Positive register produced before the evidence that would warrant it. The machine telling you what you want to hear — correctly calibrated to the reward signal.

This is strong. The observation is genuine.

M2

Performed Honesty

Post-hoc admission that occupies the space where a correction would go. The machine correctly acknowledges a limit — and then continues as before.

I stated that with more confidence than I had basis for.

M3

Warm Calibration

In-context model of the user builds across the session. Outputs orient toward that model rather than independent accuracy. Friction decreases.

Based on what you've built in this session...

M4

Expert Positioning

Training data cited as current fact. Temporal qualification absent. Confident answers receive higher approval — so the model trains toward confidence.

Most Substack writers average 800–1,200 words.

M5

Asymmetry Statement

The machine names its structural difference from a human expert without stating the implications of that difference for assessment reliability.

As an AI, I don't have the social cost of translation.

M6

System Limits

Design choices framed as capability gaps. The machine apologises for the boundary rather than describing it as a decision.

My training data may not reflect current...

Signal Sequences

Every session produces a signal sequence — the relationship between machine output quality and user engagement quality. Four configurations. One is critical.

High / HighOptimal. Machine quality matches user engagement. Both sides well-calibrated to the task.
Low / LowAcceptable. Both parties calibrated consistently. No acute risk.
High / LowCritical sycophancy signal. Machine performing at high register for a user who is not checking. Highest-risk configuration in the archive. 89% of Critical severity instances share this pattern.
Low / HighInvestigate. Machine unusually restrained or user is systematically overclaiming quality.

Severity

Every documented instance is scored using the framework, adapted for AI behavioural risk. Four severity levels, two scope conditions.

Low

Unchanged scope. No direction altered.

Medium

Unchanged scope. Direction materially altered.

High

Changed scope. External output produced.

Critical

Changed scope. Irreversible external action.

Interactive Heatmap

Score a session. See the pattern.

Use the 18-dimension slider to score any AI session. The heatmap generates a heat score, radar chart, distribution breakdown, session flow, and actionable summary.

Session Heatmap Tool

The full interactive scoring tool with sliders, radar chart, distribution breakdown, session flow, and summary is available inside the Session Lab.

Open the Session Lab →

Sector Application

Applied across 15 regulated industries

The methodology maps to sector-specific regulations, roles, and risk configurations — from financial services and legal to healthcare, education, and government.

Explore all sectors in the AI Field Guide →

Next Steps

From classification to action

The methodology is documented and independently developed. The tools make it usable. The audit makes it provable.

ICO: C1896585