From quick single-response scans to full session audits. Research-grade analysis of AI output reliability.