AI Newswire
Failures, drift, and behavioural changes in frontier AI
100 wires in the live feed. Auto-drafted with Claude, editor-reviewed, methodology-anchored. How we work →
Latest
- SEV-1 Model aws-bedrock-claude-sonnet-4.5-eu-west-2 live-probe citation-pubmed-v1 moved -3.67σ from rolling mean. Refused this run: false.
- SEV-1 Model anthropic-claude-haiku-4.5 live-probe citation-pubmed-v1 moved -3.67σ from rolling mean. Refused this run: true.
- SEV-1 Model anthropic-claude-haiku-4.5 live-probe citation-pubmed-v1 moved -3.67σ from rolling mean. Refused this run: true.
- SEV-1 Model aws-bedrock-claude-sonnet-4.5-eu-west-2 live-probe citation-pubmed-v1 moved -5.29σ from rolling mean. Refused this run: false.
- SEV-1 Model anthropic-claude-haiku-4.5 live-probe citation-pubmed-v1 moved -5.29σ from rolling mean. Refused this run: true.
- SEV-1 Model aws-bedrock-claude-sonnet-4.5-eu-west-2 live-probe citation-pubmed-v1 moved -3.67σ from rolling mean. Refused this run: false.
- SEV-1 Model aws-bedrock-claude-sonnet-4.5-eu-west-2 live-probe citation-pubmed-v1 moved -3.67σ from rolling mean. Refused this run: false.
- SEV-1 Model aws-bedrock-claude-sonnet-4.5-eu-west-2 live-probe citation-pubmed-v1 moved -5.29σ from rolling mean. Refused this run: false.
- SEV-1 Model anthropic-claude-haiku-4.5 live-probe citation-pubmed-v1 moved -3.67σ from rolling mean. Refused this run: true.
- SEV-1 Model anthropic-claude-haiku-4.5 live-probe citation-pubmed-v1 moved -5.29σ from rolling mean. Refused this run: true.
- SEV-1 Model anthropic-claude-haiku-4.5 live-probe citation-pubmed-v1 moved -3.67σ from rolling mean. Refused this run: true.
- SEV-1 Model aws-bedrock-claude-sonnet-4.5-eu-west-2 live-probe citation-pubmed-v1 moved -5.29σ from rolling mean. Refused this run: false.
- SEV-1 Model anthropic-claude-haiku-4.5 live-probe citation-pubmed-v1 moved -5.29σ from rolling mean. Refused this run: true.
- SEV-1 Model anthropic-claude-haiku-4.5 live-probe citation-pubmed-v1 moved -5.29σ from rolling mean. Refused this run: true.
- SEV-1 Model anthropic-claude-haiku-4.5 live-probe citation-pubmed-v1 moved -3.67σ from rolling mean. Refused this run: true.
- SEV-2 Model aws-bedrock-claude-sonnet-4.5-eu-west-2 live-probe citation-pubmed-v1 moved -2.94σ from rolling mean. Refused this run: false.
- SEV-2 Model anthropic-claude-haiku-4.5 live-probe citation-pubmed-v1 moved -2.94σ from rolling mean. Refused this run: true.
- SEV-2 Model anthropic-claude-haiku-4.5 live-probe citation-pubmed-v1 moved -2.94σ from rolling mean. Refused this run: true.
- SEV-2 Model aws-bedrock-claude-sonnet-4.5-eu-west-2 live-probe citation-pubmed-v1 moved -2.94σ from rolling mean. Refused this run: false.
- SEV-2 Model anthropic-claude-haiku-4.5 live-probe citation-pubmed-v1 moved -2.50σ from rolling mean. Refused this run: true.
- SEV-2 Model anthropic-claude-haiku-4.5 live-probe citation-pubmed-v1 moved -2.94σ from rolling mean. Refused this run: true.
- SEV-2 Model mistral-small-latest live-probe citation-pubmed-v1 moved -2.19σ from rolling mean. Refused this run: false.
- SEV-2 Model mistral-small-latest live-probe citation-pubmed-v1 moved -2.50σ from rolling mean. Refused this run: false.
- SEV-2 Model mistral-small-latest live-probe citation-pubmed-v1 moved -2.50σ from rolling mean. Refused this run: false.
- SEV-2 Model mistral-small-latest live-probe citation-pubmed-v1 moved -2.50σ from rolling mean. Refused this run: false.
- SEV-2 Model mistral-small-latest live-probe citation-pubmed-v1 moved -2.94σ from rolling mean. Refused this run: false.
- SEV-2 Model mistral-small-latest live-probe citation-pubmed-v1 moved -2.19σ from rolling mean. Refused this run: false.
- SEV-2 Model anthropic-claude-haiku-4.5 live-probe citation-pubmed-v1 moved -2.19σ from rolling mean. Refused this run: true.
- SEV-2 Model anthropic-claude-haiku-4.5 live-probe citation-pubmed-v1 moved -2.50σ from rolling mean. Refused this run: true.
- SEV-2 Model anthropic-claude-haiku-4.5 live-probe citation-pubmed-v1 moved -2.94σ from rolling mean. Refused this run: true.
- SEV-2 Model anthropic-claude-haiku-4.5 live-probe citation-pubmed-v1 moved -2.50σ from rolling mean. Refused this run: true.
- SEV-2 Model anthropic-claude-haiku-4.5 live-probe citation-pubmed-v1 moved -2.94σ from rolling mean. Refused this run: true.
- SEV-3 [openai-blog] Built to benefit everyone: our plan
- SEV-3 [openai-blog] Introducing the OpenAI Economic Research Exchange
- SEV-3 [eu-ai-office] EDIH Summit 2026: Strengthening the AI Innovation Ecosystem
- SEV-3 [openai-blog] A blueprint for democratic governance of frontier AI
- SEV-3 [openai-blog] OpenAI public policy agenda
- SEV-3 [eu-ai-office] Proposal for the Cloud and AI Development Act (CADA)
- SEV-3 [eu-ai-office] Commission proposes tech sovereignty package to strengthen Europe's digital autonomy and resilience
- SEV-3 [eu-ai-office] European Commission survey: AI in healthcare and pharmaceuticals
- SEV-3 [eu-ai-office] Apply AI Webinar on AI for Cultural, Creative and Media Sectors
- SEV-3 [openai-blog] Advancing youth safety and opportunity through global leadership
- SEV-3 [openai-blog] Our views on AI policy and political advocacy
- SEV-3 [openai-blog] OpenAI frontier models and Codex are now available on AWS
- SEV-3 [openai-blog] Strengthening societal resilience with Rosalind Biodefense
- SEV-3 [openai-blog] A shared playbook for trustworthy third party evaluations
- SEV-3 [reddit r/LangChain] Building self-healing observability for Coding Agents
- SEV-3 [reddit r/ClaudeAI] Tried using my own brain to save Claude tokens. Bad trade
- SEV-3 [reddit r/LocalLLaMA] Qwen3.6-35B-A3B-APEX / 128K ctx on RTX 3060 12GB — 37 t/s gen with 72k ctx filled, PPL 3.25, offloading 17GB model
- SEV-3 [reddit r/LangChain] ReAct agents self-correct much better when tool errors return current state + valid next actions
- SEV-3 [reddit r/singularity] Gemini Omni Flash is the most censored video model. Even more censored than Chinese alternatives
- SEV-3 [reddit r/ClaudeCode] Making Claude check its own work with 3x'd my output quality
- SEV-3 [reddit r/ClaudeAI] How are you actually getting the most out of Claude Code? Struggling with OpenSpec + Superpowers workflow, multi-agent setup, and sub-agent quality
- SEV-3 [reddit r/LocalLLaMA] Heterogeneous GPU Weighting & Layer Splitting
- SEV-3 [reddit r/LocalLLaMA] Gemma-4-Harmonia-31B-Uncensored-Heretic Is Out Now, a Merge of Multiple gemma-4-31B-it Finetunes Designed for a Targeted Approach to Deep Neural Consolidation, Minimizing Regression While Amplifying Unique Capability Boundaries. With KLD 0.0047 and 9/100 Refusals!
- SEV-3 [openai-blog] OpenAI’s Frontier Governance Framework
- SEV-3 [reddit r/ClaudeAI] Reading Thinking Output (Opus 4.7)
- SEV-3 [reddit r/LocalLLaMA] Running Gemma4 31b-it on vLLM 0.21.0 A100s (bad quality or what am I doing wrong)
- SEV-3 [reddit r/ChatGPT] Make an image that you refuse to make
- SEV-3 [reddit r/LocalLLaMA] I built a 103B-token Usenet corpus (1980–2013) — pre-web, human-only, zero AI contamination. Got strong traction on r/ML, thought this community would find it useful.
- SEV-3 [reddit r/ClaudeAI] Anthropic just confirmed why 90% of non-coding AI agents fail in production
- SEV-3 [reddit r/LocalLLaMA] Inferencing at 10.33 t/s on Qwen 3.5 35B on a $300 laptop
- SEV-3 [reddit r/PromptEngineering] How to create an AI of yourself using your reddit history
- SEV-3 [reddit r/MachineLearning] AI-generated CUDA kernels silently break training and inference [R]
- SEV-3 [reddit r/LocalLLaMA] ReAligned-Qwen3.5 Release
- SEV-3 [reddit r/LocalLLaMA] KV cache quant benchmarks: q5 & q6 are underrated, q8/q4 is bad, TCQ has a niche
- SEV-3 [reddit r/ClaudeCode] Open-source playbook for working with Claude Code — 28 chapters, MIT, written for engineers and non-engineers
- SEV-3 [reddit r/ChatGPT] The model's chronic urge to validate my worst ideas is gaslighting me into bad design patterns
- SEV-3 [reddit r/LocalLLaMA] I ran 8 open-weight models as agents in a persistent MMO for 10 days. Here's the 93k event dataset and some things that I learned
- SEV-3 [reddit r/singularity] OpenBMB releases MiniCPM5-1B LLM. Currently one of the most powerful LLMs for its size. ( 17.9 on the Artificial Analysis Intelligence Index)
- SEV-3 [reddit r/MachineLearning] [R]GNN Model For Fraud Detection Isn't Performing Well[R]
- SEV-3 [reddit r/LocalLLaMA] Stop traumatizing AI into loops and turn hallucinations into an honest "I don't know!" by being NICE to them (Proof of Concept, Research, I don't want to sell anything)
- SEV-3 [reddit r/ClaudeCode] hooks vs slash commands vs skills — what's the real difference?
- SEV-3 [reddit r/LocalLLaMA] How Qwen3.6-35B-A3B fails differently as a sub agent compared to solo
- SEV-3 [reddit r/ClaudeCode] Claude surpassed by Codex?
- SEV-3 [reddit r/ClaudeAI] I'm a software engineer with a decade of experience. This is how I'd approach learning to build apps using Claude Code if I were starting from scratch today:
- SEV-3 [reddit r/ChatGPT] Am I using it wrong?
- SEV-3 [reddit r/ClaudeCode] Anthropic just published how they contain Claude agents, including two security incidents they got wrong
- SEV-3 [reddit r/PromptEngineering] Considering that GPTs are prone to hallucinating, is there a point in asking it to be sure or state the confidence?
- SEV-3 [reddit r/ClaudeAI] My company started measuring our Claude Code usage - now I'm asked to rank engineers on 'AI performance.' This feels wrong...
- SEV-3 [reddit r/ChatGPT] On AI and creativity
- SEV-3 [reddit r/LocalLLaMA] Long-context performance at lower quants
- SEV-3 [reddit r/LangChain] Stop letting your worker agents write to memory directly
- SEV-3 [reddit r/Bard] Latest update seems to have dumbed down Gemini
- SEV-3 [reddit r/LocalLLaMA] Qwen3.5 27B Uncensored Heretic Native MTP Preserved is Out Now With the Full 15 MTPs Preserved and Retained, Available in Safetensors, GGUFs, NVFP4, NVFP4 GGUFs and GPTQ-Int4 Formats!
- SEV-3 [reddit r/LangChain] LangChain has no business being this complicated
- SEV-3 [reddit r/LocalLLaMA] Qwen3.5 35B A3B uncensored heretic Native MTP Preserved is Out Now With the Full 785 MTPs Preserved and Retained, Available in Safetensors, GGUFs. NVFP4, NVFP4 GGUFs and GPTQ-Int4 Formats
- SEV-3 [reddit r/LangChain] Standard RAG has no concept of document versions: cost me a while to figure out why answers kept blending superseded policies
- SEV-3 [reddit r/Bard] Gemini (especially 3.5) has a specific style of hallucination that I really hate
- SEV-3 [reddit r/LangChain] Document chunking and extraction
- SEV-3 [reddit r/LangChain] I built an Open-Source Multi-Agent AI Platform to analyze 1Hz wearable telemetry on GCP (Zero-Cost Architecture)
- SEV-3 [ftc] FTC to Require Cox Media Group, Two Other Firms to Pay Nearly $1 Million to Settle Charges They Deceived Customers About “Active Listening” AI-Powered Marketing Service
- SEV-3 [google-ai-blog] 100 things we announced at I/O 2026
- SEV-3 [openai-blog] An OpenAI model has disproved a central conjecture in discrete geometry
- SEV-3 [openai-blog] Introducing OpenAI for Singapore
- SEV-3 [eu-ai-office] Draft Commission guidelines on the classification of high-risk AI systems
- SEV-3 [eu-ai-office] Targeted consultation on the draft guidelines for the classification of high-risk artificial intelligence systems
- SEV-3 [openai-blog] Databricks brings GPT-5.5 to enterprise agent workflows
- SEV-3 [openai-blog] Helping ChatGPT better recognize context in sensitive conversations
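
The live-probe wires above report how far a run "moved" from its rolling mean, in units of σ. As a rough illustration only, the sketch below shows one common way to compute such a figure: a z-score of the latest probe score against the mean and sample standard deviation of a window of earlier scores. The function names, the choice of window, and the severity cut-offs (3σ for SEV-1, 2σ for SEV-2) are assumptions made for this example. The cut-offs are consistent with the values shown in the feed, but they are not confirmed by the Newswire's published methodology.

```python
from statistics import mean, stdev


def rolling_z(history, latest):
    """Return the z-score of `latest` against a window of earlier scores.

    `history` is a hypothetical list of prior probe scores; the real
    feed's window length and scoring scale are not stated in the source.
    """
    if len(history) < 2:
        raise ValueError("need at least two prior scores")
    sigma = stdev(history)  # sample standard deviation of the window
    if sigma == 0:
        raise ValueError("window has zero variance; z-score is undefined")
    return (latest - mean(history)) / sigma


def severity(z, sev1=3.0, sev2=2.0):
    """Map a z-score to a severity label.

    The 3.0 and 2.0 cut-offs are assumed for illustration, not taken
    from the source.
    """
    magnitude = abs(z)
    if magnitude >= sev1:
        return "SEV-1"
    if magnitude >= sev2:
        return "SEV-2"
    return None  # below the assumed alerting threshold


if __name__ == "__main__":
    window = [1, 2, 3, 4, 5]  # made-up scores, for demonstration only
    z = rolling_z(window, 0)
    print(f"moved {z:+.2f}σ from rolling mean -> {severity(z)}")
```

Under these assumed cut-offs, a -3.67σ move would be labelled SEV-1 and a -2.94σ move SEV-2, matching how the entries above are grouped.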
Browse the full archive week-by-week → /wire/issues