Lesson Learned: FACTS Benchmark & 70% Factuality Ceiling

ID: LL-011
Impact: Identified through automated analysis
Date: December 11, 2025
Category: LLM Safety, Verification
Severity: Medium
Source: Google DeepMind FACTS Benchmark (Dec 2025)

Summary

Google DeepMind’s FACTS Benchmark revealed that NO top LLM achieves >70% factuality:

  • Gemini 3 Pro leads at 68.8%
  • Claude models ~66-67%
  • GPT-4o ~65.8%

In practice, this means roughly 30% of an LLM’s claims may be hallucinated or otherwise inaccurate.

Impact on Trading System

For a trading system relying on LLM Council decisions:

  • A ~30% error rate on financial claims is unacceptable
  • Could lead to wrong buy/sell signals
  • Could misreport portfolio values, P/L, positions

Root Cause

LLMs have fundamental factuality limitations along both dimensions the benchmark measures:

  • “Contextual factuality” - grounding answers in data provided in the prompt
  • “World knowledge factuality” - recalling facts from memory or the web
  • Top models sit below a ~70% accuracy ceiling on both

Prevention Implemented

  1. FACTS Benchmark Weighting: Weight each LLM’s council vote by its benchmark score (see the weighting sketch after this list)
  2. Ground Truth Validation: Cross-check LLM signals against technical indicators (MACD, RSI, Volume; see the indicator cross-check sketch below)
  3. API Verification: Always verify claims against the Alpaca API before acting (see the sketch under Key Takeaway)
  4. Hallucination Logging: Track all discrepancies in RAG for pattern learning
  5. Factuality Ceiling: Cap each model’s confidence score at its FACTS score (combined with item 1 in the sketch below)
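
A minimal sketch of how items 1 and 5 might work together: each council member’s vote is weighted by its published FACTS score, and its self-reported confidence is capped at that score. The scores are the benchmark numbers quoted above; the names (FACTS_SCORES, weighted_vote) are illustrative and are not the actual interface of src/core/llm_council_integration.py.

```python
# Illustrative sketch: FACTS-weighted council voting with a per-model confidence cap.
# Scores are the benchmark numbers quoted above; all names here are hypothetical.
from collections import defaultdict

FACTS_SCORES = {
    "gemini-3-pro": 0.688,
    "claude": 0.665,   # midpoint of the ~66-67% range quoted above
    "gpt-4o": 0.658,
}

def weighted_vote(votes):
    """votes: iterable of (model, signal, confidence), signal in {'buy', 'sell', 'hold'}.
    Returns the signal with the highest FACTS-weighted, confidence-capped tally."""
    tally = defaultdict(float)
    for model, signal, confidence in votes:
        facts = FACTS_SCORES.get(model, 0.5)   # unknown models get a conservative weight
        capped = min(confidence, facts)        # item 5: confidence never exceeds the FACTS score
        tally[signal] += facts * capped        # item 1: the vote itself is weighted by the score
    return max(tally, key=tally.get)

# Example: two models say buy, one says sell -> the weighted tally favours 'buy'.
print(weighted_vote([
    ("gemini-3-pro", "buy", 0.90),
    ("claude", "buy", 0.80),
    ("gpt-4o", "sell", 0.95),
]))
```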

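A sketch of the ground-truth cross-check from item 2 combined with the discrepancy logging from item 4: an LLM “buy” or “sell” signal is only passed downstream when locally computed indicators do not contradict it, and every disagreement is appended to a log for pattern analysis. The RSI/MACD thresholds are standard conventions, and the helper names and log format are assumptions rather than the real factuality_monitor.py interface.

```python
# Illustrative sketch: reject LLM trade signals that contradict locally computed
# indicators (item 2) and append every discrepancy to a log for later analysis (item 4).
import json
import time

def indicators_contradict(signal, rsi, macd_hist):
    """Conventional checks: don't buy into overbought / falling-momentum conditions,
    don't sell into oversold / rising-momentum conditions."""
    if signal == "buy":
        return rsi > 70 or macd_hist < 0
    if signal == "sell":
        return rsi < 30 or macd_hist > 0
    return False

def validate_signal(model, symbol, signal, rsi, macd_hist,
                    log_path="hallucination_log.jsonl"):
    """Return True if the signal may pass downstream; otherwise log it and reject."""
    if not indicators_contradict(signal, rsi, macd_hist):
        return True
    with open(log_path, "a") as fh:   # append-only discrepancy log, one JSON object per line
        fh.write(json.dumps({
            "ts": time.time(), "model": model, "symbol": symbol,
            "signal": signal, "rsi": rsi, "macd_hist": macd_hist,
        }) + "\n")
    return False

# Example: the council says buy, but RSI is 78 (overbought) -> rejected and logged.
print(validate_signal("gemini-3-pro", "AAPL", "buy", rsi=78.0, macd_hist=-0.12))
```
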
Files Created/Modified

  • src/verification/factuality_monitor.py - New factuality monitoring system
  • src/core/llm_council_integration.py - Integrated FACTS weighting
  • tests/test_factuality_monitor.py - Unit tests

Key Takeaway

LLMs advise, APIs decide. Never trust LLM claims without ground truth verification.
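
A sketch of that rule applied to item 3: any portfolio figure quoted by an LLM is treated as a hypothesis and re-fetched from the broker before it can influence an order. The fetch_position_value callable stands in for a live Alpaca API call, and the 1% tolerance is an assumed threshold, not a value taken from this system.

```python
# Illustrative sketch of "LLMs advise, APIs decide": an LLM-quoted number is never
# acted on until it has been re-read from the broker API.

def verify_claim(llm_value, fetch_position_value, symbol, tolerance=0.01):
    """fetch_position_value(symbol) should return the live value from the broker
    (e.g. the Alpaca API). Returns (matches, api_value); act only on api_value."""
    api_value = fetch_position_value(symbol)
    if api_value == 0:
        return llm_value == 0, api_value
    matches = abs(llm_value - api_value) / abs(api_value) <= tolerance
    return matches, api_value

# Example with a stubbed broker: the LLM claims $10,500 but the API reports $9,800.
ok, truth = verify_claim(10_500.0, lambda sym: 9_800.0, "AAPL")
print(ok, truth)   # False 9800.0 -> log the discrepancy and trade on the API value
```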

References