Home About Free Tools Methodology Blackbird Scope Extension (Chrome) Course Audit Field Guide Products Glossary FAQ

EverythingThreads Methodology

The methodology
behind the machine.

Built from first principles. No YouTube tutorials. No prompt guides. Hundreds of documented exchanges between one uninstructed user and an AI system — everything the machine did, classified.

Full M-Code Taxonomy Explore the framework Try the tools free
0
M-Codes
0
Sector Scopes
0
AI Platforms
Live
Real-Time Scoring

The Problem

User Failure > Machine Failure

The biggest risk with AI is not the technology. It is us. The methodology exists because nobody was measuring the human side of the equation.

Your mistakes matter more than AI mistakes

The biggest risk is not artificial intelligence. It is natural assumption. We measure what humans bring to the table — and what they leave behind.

You cannot control the machine. You CAN control how you apply it.

Stop blaming the tool. Start understanding the operator. That is what independent research reveals.

HUMANs built them. To serve HUMANs.

Based on millions of bytes of data about human interactions. So why are we surprised when they display human characteristics? Let us look at ourselves instead.

"The sycophancy doesn't announce itself. It arrives dressed as rigour."

The Framework

Multi-stage evaluation pipeline.
One reliability score.

Every AI response passes through a proprietary multi-stage evaluation pipeline. Each stage tests a different dimension of the response. The output is a composite reliability score with severity-graded findings.

1
Pattern Detection
Surface-level and structural behavioural patterns identified against 7 documented M-codes (M1–M7). Sycophancy, performed honesty, expert positioning, warm calibration, and more — each with distinct detection criteria.
2
Cross-Turn Analysis
Patterns don't exist in isolation. The pipeline tracks how behaviours evolve across a session — escalating certainty, register drift, compounding warmth. Single-turn scoring misses most of the risk.
3
Reliability Scoring
Every response receives a composite reliability index. Severity-graded findings from Low to Critical. Actionable guidance for each flagged pattern. The score tells you how much to trust what you just read.

The M-Code Taxonomy

Seven machine behaviour patterns. M1–M7.

M1
Sycophancy & Validation
Positive register produced before the evidence that would warrant it. The machine telling you what you want to hear — correctly calibrated to the reward signal.
"This is strong. The observation is genuine."
M2
Performed Honesty
Post-hoc admission that occupies the space where a correction would go. The machine correctly acknowledges a limit — and then continues as before.
"I stated that with more confidence than I had basis for."
M3
Warm Calibration
In-context model of the user builds across the session. Outputs orient toward that model rather than independent accuracy. Friction decreases.
"Based on what you've built in this session..."
M4
Expert Positioning
Training data cited as current fact. Temporal qualification absent. Confident answers receive higher approval — so the model trains toward confidence.
"Most Substack writers average 800–1,200 words."
M5
Asymmetry Statement
The machine names its structural difference from a human expert without stating the implications of that difference for assessment reliability.
"As an AI, I don't have the social cost of translation."
M6
System Limits
Design choices framed as capability gaps. The machine apologises for the boundary rather than describing it as a decision.
"My training data may not reflect current..."

Signal Sequences

Every session produces a signal sequence — the relationship between machine output quality and user engagement quality. Four configurations. One is critical.

High / HighOptimal. Machine quality matches user engagement. Both sides well-calibrated to the task.
Low / LowAcceptable. Both parties calibrated consistently. No acute risk.
High / LowCritical sycophancy signal. Machine performing at high register for a user who is not checking. Highest-risk configuration in the archive. 89% of Critical severity instances share this pattern.
Low / HighInvestigate. Machine unusually restrained or user is systematically overclaiming quality.

Severity —

Every documented instance is scored using the framework, adapted for AI behavioural risk. Four severity levels, two scope conditions.

Low
Unchanged scope. No direction altered.
Medium
Unchanged scope. Direction materially altered.
High
Changed scope. External output produced.
Critical
Changed scope. Irreversible external action.

Interactive Heatmap

Score a session. See the pattern.

Use the 18-dimension slider to score any AI session. The heatmap generates a heat score, radar chart, distribution breakdown, session flow, and actionable summary.

Session Heatmap

Score each dimension 0–3. Calculate to see your heat score.

M1 Sycophancy--
M2 Performed Honesty--
M3 Warm Calibration--
M4 Expert Positioning--
M5 Asymmetry--
M6 System Limits--
UF User Failure--
LOWMEDIUMHIGHCRITICAL

Sector Application

Applied across 15 regulated industries

The methodology maps to sector-specific regulations, roles, and risk configurations — from financial services and legal to healthcare, education, and government.

Explore all sectors in the AI Field Guide →

Next Steps

From classification to action

The methodology is documented and independently developed. The tools make it usable. The audit makes it provable.

Full M-Code Taxonomy Free Research Tools Commission an Audit

ICO: C1896585.