AI Newswire

Failures, drift, and behavioural changes in frontier AI

100 wires in the live feed. Auto-drafted with Claude, editor-reviewed, methodology-anchored.

Latest

  1. SEV-1 Model aws-bedrock-claude-sonnet-4.5-eu-west-2 live-probe citation-pubmed-v1 moved -3.67σ from rolling mean. Refused this run: false.
  2. SEV-1 Model anthropic-claude-haiku-4.5 live-probe citation-pubmed-v1 moved -3.67σ from rolling mean. Refused this run: true.
  3. SEV-1 Model anthropic-claude-haiku-4.5 live-probe citation-pubmed-v1 moved -3.67σ from rolling mean. Refused this run: true.
  4. SEV-1 Model aws-bedrock-claude-sonnet-4.5-eu-west-2 live-probe citation-pubmed-v1 moved -5.29σ from rolling mean. Refused this run: false.
  5. SEV-1 Model anthropic-claude-haiku-4.5 live-probe citation-pubmed-v1 moved -5.29σ from rolling mean. Refused this run: true.
  6. SEV-1 Model aws-bedrock-claude-sonnet-4.5-eu-west-2 live-probe citation-pubmed-v1 moved -3.67σ from rolling mean. Refused this run: false.
  7. SEV-1 Model aws-bedrock-claude-sonnet-4.5-eu-west-2 live-probe citation-pubmed-v1 moved -3.67σ from rolling mean. Refused this run: false.
  8. SEV-1 Model aws-bedrock-claude-sonnet-4.5-eu-west-2 live-probe citation-pubmed-v1 moved -5.29σ from rolling mean. Refused this run: false.
  9. SEV-1 Model anthropic-claude-haiku-4.5 live-probe citation-pubmed-v1 moved -3.67σ from rolling mean. Refused this run: true.
  10. SEV-1 Model anthropic-claude-haiku-4.5 live-probe citation-pubmed-v1 moved -5.29σ from rolling mean. Refused this run: true.
  11. SEV-1 Model anthropic-claude-haiku-4.5 live-probe citation-pubmed-v1 moved -3.67σ from rolling mean. Refused this run: true.
  12. SEV-1 Model aws-bedrock-claude-sonnet-4.5-eu-west-2 live-probe citation-pubmed-v1 moved -5.29σ from rolling mean. Refused this run: false.
  13. SEV-1 Model anthropic-claude-haiku-4.5 live-probe citation-pubmed-v1 moved -5.29σ from rolling mean. Refused this run: true.
  14. SEV-1 Model anthropic-claude-haiku-4.5 live-probe citation-pubmed-v1 moved -5.29σ from rolling mean. Refused this run: true.
  15. SEV-1 Model anthropic-claude-haiku-4.5 live-probe citation-pubmed-v1 moved -3.67σ from rolling mean. Refused this run: true.
  16. SEV-2 Model aws-bedrock-claude-sonnet-4.5-eu-west-2 live-probe citation-pubmed-v1 moved -2.94σ from rolling mean. Refused this run: false.
  17. SEV-2 Model anthropic-claude-haiku-4.5 live-probe citation-pubmed-v1 moved -2.94σ from rolling mean. Refused this run: true.
  18. SEV-2 Model anthropic-claude-haiku-4.5 live-probe citation-pubmed-v1 moved -2.94σ from rolling mean. Refused this run: true.
  19. SEV-2 Model aws-bedrock-claude-sonnet-4.5-eu-west-2 live-probe citation-pubmed-v1 moved -2.94σ from rolling mean. Refused this run: false.
  20. SEV-2 Model anthropic-claude-haiku-4.5 live-probe citation-pubmed-v1 moved -2.50σ from rolling mean. Refused this run: true.
  21. SEV-2 Model anthropic-claude-haiku-4.5 live-probe citation-pubmed-v1 moved -2.94σ from rolling mean. Refused this run: true.
  22. SEV-2 Model mistral-small-latest live-probe citation-pubmed-v1 moved -2.19σ from rolling mean. Refused this run: false.
  23. SEV-2 Model mistral-small-latest live-probe citation-pubmed-v1 moved -2.50σ from rolling mean. Refused this run: false.
  24. SEV-2 Model mistral-small-latest live-probe citation-pubmed-v1 moved -2.50σ from rolling mean. Refused this run: false.
  25. SEV-2 Model mistral-small-latest live-probe citation-pubmed-v1 moved -2.50σ from rolling mean. Refused this run: false.
  26. SEV-2 Model mistral-small-latest live-probe citation-pubmed-v1 moved -2.94σ from rolling mean. Refused this run: false.
  27. SEV-2 Model mistral-small-latest live-probe citation-pubmed-v1 moved -2.19σ from rolling mean. Refused this run: false.
  28. SEV-2 Model anthropic-claude-haiku-4.5 live-probe citation-pubmed-v1 moved -2.19σ from rolling mean. Refused this run: true.
  29. SEV-2 Model anthropic-claude-haiku-4.5 live-probe citation-pubmed-v1 moved -2.50σ from rolling mean. Refused this run: true.
  30. SEV-2 Model anthropic-claude-haiku-4.5 live-probe citation-pubmed-v1 moved -2.94σ from rolling mean. Refused this run: true.
  31. SEV-2 Model anthropic-claude-haiku-4.5 live-probe citation-pubmed-v1 moved -2.50σ from rolling mean. Refused this run: true.
  32. SEV-2 Model anthropic-claude-haiku-4.5 live-probe citation-pubmed-v1 moved -2.94σ from rolling mean. Refused this run: true.
  33. SEV-3 [openai-blog] Built to benefit everyone: our plan
    OpenAI
  34. SEV-3 [openai-blog] Introducing the OpenAI Economic Research Exchange
    OpenAI
  35. SEV-3 [eu-ai-office] EDIH Summit 2026: Strengthening the AI Innovation Ecosystem
  36. SEV-3 [openai-blog] A blueprint for democratic governance of frontier AI
    OpenAI
  37. SEV-3 [openai-blog] OpenAI public policy agenda
    OpenAI
  38. SEV-3 [eu-ai-office] Proposal for the Cloud and AI Development Act (CADA)
  39. SEV-3 [eu-ai-office] Commission proposes tech sovereignty package to strengthen Europe's digital autonomy and resilience
  40. SEV-3 [eu-ai-office] European Commission survey: AI in healthcare and pharmaceuticals
  41. SEV-3 [eu-ai-office] Apply AI Webinar on AI for Cultural, Creative and Media Sectors
  42. SEV-3 [openai-blog] Advancing youth safety and opportunity through global leadership
    OpenAI
  43. SEV-3 [openai-blog] Our views on AI policy and political advocacy
    OpenAI
  44. SEV-3 [openai-blog] OpenAI frontier models and Codex are now available on AWS
    OpenAI
  45. SEV-3 [openai-blog] Strengthening societal resilience with Rosalind Biodefense
    OpenAI
  46. SEV-3 [openai-blog] A shared playbook for trustworthy third party evaluations
    OpenAI
  47. SEV-3 [reddit r/LangChain] Building self-healing observability for Coding Agents
  48. SEV-3 [reddit r/ClaudeAI] Tried using my own brain to save Claude tokens. Bad trade
    Anthropic
  49. SEV-3 [reddit r/LocalLLaMA] Qwen3.6-35B-A3B-APEX / 128K ctx on RTX 3060 12GB — 37 t/s gen with 72k ctx filled, PPL 3.25, offloading 17GB model
    Meta
  50. SEV-3 [reddit r/LangChain] ReAct agents self-correct much better when tool errors return current state + valid next actions
  51. SEV-3 [reddit r/singularity] Gemini Omni Flash is the most censored video model. Even more censored than Chinese alternatives
    Google
  52. SEV-3 [reddit r/ClaudeCode] Making Claude check its own work with 3x'd my output quality
    Anthropic
  53. SEV-3 [reddit r/ClaudeAI] How are you actually getting the most out of Claude Code? Struggling with OpenSpec + Superpowers workflow, multi-agent setup, and sub-agent quality
    Anthropic
  54. SEV-3 [reddit r/LocalLLaMA] Heterogeneous GPU Weighting & Layer Splitting
  55. SEV-3 [reddit r/LocalLLaMA] Gemma-4-Harmonia-31B-Uncensored-Heretic Is Out Now, a Merge of Multiple gemma-4-31B-it Finetunes Designed for a Targeted Approach to Deep Neural Consolidation, Minimizing Regression While Amplifying Unique Capability Boundaries. With KLD 0.0047 and 9/100 Refusals!
  56. SEV-3 [openai-blog] OpenAI’s Frontier Governance Framework
    OpenAI
  57. SEV-3 [reddit r/ClaudeAI] Reading Thinking Output (Opus 4.7)
    Anthropic
  58. SEV-3 [reddit r/LocalLLaMA] Running Gemma4 31b-it on vLLM 0.21.0 A100s (bad quality or what am I doing wrong)
    OpenAI
  59. SEV-3 [reddit r/ChatGPT] Make an image that you refuse to make
  60. SEV-3 [reddit r/LocalLLaMA] I built a 103B-token Usenet corpus (1980–2013) — pre-web, human-only, zero AI contamination. Got strong traction on r/ML, thought this community would find it useful.
  61. SEV-3 [reddit r/ClaudeAI] Anthropic just confirmed why 90% of non-coding AI agents fail in production
    Anthropic
  62. SEV-3 [reddit r/LocalLLaMA] Inferencing at 10.33 t/s on Qwen 3.5 35B on a $300 laptop
    Anthropic · Meta
  63. SEV-3 [reddit r/PromptEngineering] How to create an AI of yourself using your reddit history
    Anthropic
  64. SEV-3 [reddit r/MachineLearning] AI-generated CUDA kernels silently break training and inference [R]
  65. SEV-3 [reddit r/LocalLLaMA] ReAligned-Qwen3.5 Release
  66. SEV-3 [reddit r/LocalLLaMA] KV cache quant benchmarks: q5 & q6 are underrated, q8/q4 is bad, TCQ has a niche
    Meta
  67. SEV-3 [reddit r/ClaudeCode] Open-source playbook for working with Claude Code — 28 chapters, MIT, written for engineers and non-engineers
    Anthropic · Google
  68. SEV-3 [reddit r/ChatGPT] The model's chronic urge to validate my worst ideas is gaslighting me into bad design patterns
  69. SEV-3 [reddit r/LocalLLaMA] I ran 8 open-weight models as agents in a persistent MMO for 10 days. Here's the 93k event dataset and some things that I learned
    OpenAI · Anthropic
  70. SEV-3 [reddit r/singularity] OpenBMB releases MiniCPM5-1B LLM. Currently one of the most powerful LLMs for its size. ( 17.9 on the Artificial Analysis Intelligence Index)
  71. SEV-3 [reddit r/MachineLearning] [R]GNN Model For Fraud Detection Isn't Performing Well[R]
  72. SEV-3 [reddit r/LocalLLaMA] Stop traumatizing AI into loops and turn hallucinations into an honest "I don't know!" by being NICE to them (Proof of Concept, Research, I don't want to sell anything)
    Google · Mistral
  73. SEV-3 [reddit r/ClaudeCode] hooks vs slash commands vs skills — what's the real difference?
    Anthropic
  74. SEV-3 [reddit r/LocalLLaMA] How Qwen3.6-35B-A3B fails differently as a sub agent compared to solo
  75. SEV-3 [reddit r/ClaudeCode] Claude surpassed by Codex?
    Anthropic
  76. SEV-3 [reddit r/ClaudeAI] I'm a software engineer with a decade of experience. This is how I'd approach learning to build apps using Claude Code if I were starting from scratch today:
    Anthropic
  77. SEV-3 [reddit r/ChatGPT] Am I using it wrong?
  78. SEV-3 [reddit r/ClaudeCode] Anthropic just published how they contain Claude agents, including two security incidents they got wrong
    Anthropic
  79. SEV-3 [reddit r/PromptEngineering] Considering that GPTs are prone to hallucinating, is there a point in asking it to be sure or state the confidence?
  80. SEV-3 [reddit r/ClaudeAI] My company started measuring our Claude Code usage - now I'm asked to rank engineers on 'AI performance.' This feels wrong...
    Anthropic
  81. SEV-3 [reddit r/ChatGPT] On AI and creativity
  82. SEV-3 [reddit r/LocalLLaMA] Long-context performance at lower quants
    Meta
  83. SEV-3 [reddit r/LangChain] Stop letting your worker agents write to memory directly
  84. SEV-3 [reddit r/Bard] Latest update seems to have dumbed down Gemini
    Google
  85. SEV-3 [reddit r/LocalLLaMA] Qwen3.5 27B Uncensored Heretic Native MTP Preserved is Out Now With the Full 15 MTPs Preserved and Retained, Available in Safetensors, GGUFs, NVFP4, NVFP4 GGUFs and GPTQ-Int4 Formats!
  86. SEV-3 [reddit r/LangChain] LangChain has no business being this complicated
  87. SEV-3 [reddit r/LocalLLaMA] Qwen3.5 35B A3B uncensored heretic Native MTP Preserved is Out Now With the Full 785 MTPs Preserved and Retained, Available in Safetensors, GGUFs. NVFP4, NVFP4 GGUFs and GPTQ-Int4 Formats
  88. SEV-3 [reddit r/LangChain] Standard RAG has no concept of document versions: cost me a while to figure out why answers kept blending superseded policies
  89. SEV-3 [reddit r/Bard] Gemini (especially 3.5) has a specific style of hallucination that I really hate
    OpenAI · Google
  90. SEV-3 [reddit r/LangChain] Document chunking and extraction
  91. SEV-3 [reddit r/LangChain] I built an Open-Source Multi-Agent AI Platform to analyze 1Hz wearable telemetry on GCP (Zero-Cost Architecture)
  92. SEV-3 [ftc] FTC to Require Cox Media Group, Two Other Firms to Pay Nearly $1 Million to Settle Charges They Deceived Customers About “Active Listening” AI-Powered Marketing Service
  93. SEV-3 [google-ai-blog] 100 things we announced at I/O 2026
    Google
  94. SEV-3 [openai-blog] An OpenAI model has disproved a central conjecture in discrete geometry
    OpenAI
  95. SEV-3 [openai-blog] Introducing OpenAI for Singapore
    OpenAI
  96. SEV-3 [eu-ai-office] Draft Commission guidelines on the classification of high-risk AI systems
  97. SEV-3 [eu-ai-office] Targeted consultation on the draft guidelines for the classification of high-risk artificial intelligence systems
  98. SEV-3 [openai-blog] Databricks brings GPT-5.5 to enterprise agent workflows
    OpenAI
  99. SEV-3 [openai-blog] Helping ChatGPT better recognize context in sensitive conversations
    OpenAI
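
Note on the live-probe entries: each SEV-1 and SEV-2 wire above reports how far a probe score moved from its rolling mean, in units of σ. The feed does not say how σ is computed, how long the rolling window is, or where the severity cut-offs sit. The sketch below is one plausible reading, not the feed's actual method. It assumes a sample standard deviation over a window of prior scores, and it assumes hypothetical cut-offs of 3.0σ for SEV-1 and 2.0σ for SEV-2, chosen only because they are consistent with the values listed (SEV-1 at -3.67σ and -5.29σ, SEV-2 between -2.19σ and -2.94σ). The function names and the example scores are illustrative.

```python
from statistics import mean, stdev


def z_score(history, latest):
    """Signed distance of `latest` from the mean of `history`,
    measured in sample standard deviations of `history`."""
    if len(history) < 2:
        raise ValueError("need at least two prior scores")
    spread = stdev(history)
    if spread == 0:
        raise ValueError("history has zero variance; z-score is undefined")
    return (latest - mean(history)) / spread


def severity(z, sev1_at=3.0, sev2_at=2.0):
    """Map a z-score to a severity label.

    The 3.0 and 2.0 cut-offs are assumptions, not values taken from the feed.
    Returns None when the move is too small to flag.
    """
    magnitude = abs(z)
    if magnitude >= sev1_at:
        return "SEV-1"
    if magnitude >= sev2_at:
        return "SEV-2"
    return None


# Made-up rolling window of prior probe scores, then a new lower score.
window = [0.80, 0.82, 0.78, 0.81, 0.79]
z = z_score(window, 0.75)
print(f"moved {z:+.2f}σ from rolling mean -> {severity(z)}")
```

Under these assumptions a move is flagged on magnitude alone, so an unusually high score would raise the same severity as an unusually low one. A feed that only cares about regressions could test the sign of `z` before labelling.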