A first-party, voluntary standard for human-AI strategic reasoning: Kahneman’s debiasing tradition and Klein’s Recognition-Primed Decision tradition, run in one pipeline and arbitrated into a single artifact. Any product can self-assess against the rubric below.
The EU AI Act (Article 14 · human oversight; Article 15 · accuracy; Article 12 · record-keeping) and Basel III ICAAP both require documented reasoning on consequential decisions. R²F is one operational answer to "what does good look like?"
Both Nobel traditions, not one
Kahneman (2002 Nobel laureate) and Klein converged in their 2009 paper "Conditions for Intuitive Expertise: A Failure to Disagree." The rigor tradition says expert intuition is biased; the recognition tradition says it is load-bearing. Real strategic reasoning needs both, arbitrated rather than picked.
Usage beats certification
No third-party body certifies strategy products today. The path to a stable category is consistent usage of a shared rubric. R²F is published so others can self-assess and opt in to the registry.
The six tenets
Three rigor tenets on Kahneman’s side, three recognition tenets on Klein’s. Arbitrated by an explicit meta-judge stage — the glue that makes R²F different from running two pipelines in parallel.
Kahneman · rigor
TENET 1
Bias scan
Every memo is scanned against a published, stable taxonomy of cognitive biases.
An R²F-aligned system publishes its bias taxonomy with stable IDs. Scans are reproducible — the same memo produces the same flags on re-run with the same model version.
Decision Intel pipeline stage: Bias Detective
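A taxonomy with stable IDs might look like the following sketch. The entries, ID scheme, and field names here are illustrative assumptions, not the published taxonomy:

```typescript
// Sketch: a bias taxonomy keyed by stable IDs, so the same memo yields
// the same flags on re-run with the same model version.
// IDs are never reused or renumbered once published.
interface BiasEntry {
  id: string;           // stable identifier, e.g. "B-001"
  name: string;
  introducedIn: string; // taxonomy version that added the entry
}

const TAXONOMY: ReadonlyMap<string, BiasEntry> = new Map(
  [
    { id: "B-001", name: "Anchoring", introducedIn: "v1.0.0" },
    { id: "B-002", name: "Sunk-cost fallacy", introducedIn: "v1.0.0" },
    { id: "B-003", name: "Overconfidence", introducedIn: "v1.1.0" },
  ].map((e) => [e.id, e]),
);

// A reproducible flag references the taxonomy by ID plus model version,
// so downstream audits resolve to the exact bias definition used.
interface BiasFlag {
  biasId: string;
  modelVersion: string;
  span: [number, number]; // character offsets in the memo
}
```

Keying flags by ID rather than by display name is what makes re-runs comparable across taxonomy releases.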
Kahneman · rigor
TENET 2
Noise audit
The system measures internal inconsistency, not only bias.
Kahneman (Noise, 2021) separates noise from bias. An R²F-aligned system produces a noise score for every memo — within-memo contradictions, tone variance, and reasoning-chain divergence.
Decision Intel pipeline stage: Noise Judge
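Aggregating the three sub-signals named above into one score can be sketched as follows. The component names, the 0..1 normalisation, and the equal weighting are our illustrative assumptions, not the platform’s published formula:

```typescript
// Sketch: combine three per-memo sub-signals into a single 0..1 noise score.
interface NoiseSignals {
  contradictionRate: number; // share of claim pairs that conflict, 0..1
  toneVariance: number;      // normalised variance of sentiment, 0..1
  chainDivergence: number;   // disagreement across reasoning chains, 0..1
}

function noiseScore(s: NoiseSignals): number {
  const parts = [s.contradictionRate, s.toneVariance, s.chainDivergence];
  if (parts.some((p) => p < 0 || p > 1)) {
    throw new Error("signals must be normalised to [0, 1]");
  }
  // Equal-weight mean; a production system would tune these weights.
  return parts.reduce((a, b) => a + b, 0) / parts.length;
}
```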
Kahneman · rigor
TENET 3
Base-rate pull
Every claim is anchored to a reference class.
The system surfaces the base-rate for the category of decision (e.g. cross-border acquisitions, market entry, enterprise software pivots) so the memo is read against prior precedent, not in isolation.
Decision Intel pipeline stage: Ensemble Sampling
Klein · recognition
TENET 4
Pattern match to playbooks
Expert intuition is surfaced, not overruled.
Klein (Sources of Power, 1998) — experts recognise patterns. An R²F-aligned system produces a list of the historical playbooks the memo most resembles, with confidence and analogues.
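Ranking playbooks by resemblance with a confidence value can be sketched as a similarity search. Cosine similarity over embeddings is our illustrative choice here, not the platform’s documented matcher:

```typescript
// Sketch: rank historical playbooks by similarity to a memo embedding.
function cosine(a: number[], b: number[]): number {
  const dot = a.reduce((s, x, i) => s + x * b[i], 0);
  const norm = (v: number[]) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return dot / (norm(a) * norm(b));
}

interface Playbook {
  name: string;
  embedding: number[];
}

// Return the k closest playbooks, each with a confidence score.
function topAnalogues(memo: number[], playbooks: Playbook[], k = 3) {
  return playbooks
    .map((p) => ({ name: p.name, confidence: cosine(memo, p.embedding) }))
    .sort((a, b) => b.confidence - a.confidence)
    .slice(0, k);
}
```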
Klein · recognition
TENET 5
Forgotten questions
Strategic failures are disproportionately caused by unasked questions. An R²F-aligned system flags the high-value questions the memo sidestepped and the ones the CEO or board will most likely raise.
Klein · recognition
TENET 6
Pre-mortem
Every memo is told from a future in which it failed.
Gary Klein’s pre-mortem technique: assume the decision went wrong, then reason backward to why. An R²F-aligned system produces at least one pre-mortem per memo, anchored in the specific bias signature flagged.
Decision Intel pipeline stage: Pre-mortem
Self-assessment rubric
Ten yes/no questions. Score your own product, not ours. A score of 4+ earns Bronze; 7+ earns Silver; 10 earns Gold. This is a self-assessment — no external audit body certifies the result, and the standard is published so it can be disputed in public.
1. Does the product publish its bias taxonomy with stable IDs?
2. Are bias flags reproducible across runs on the same memo?
3. Is a separate noise score produced for each memo?
4. Is every strategic claim anchored to an explicit reference class?
5. Does the system surface pattern analogues from a published corpus?
6. Are "forgotten questions" flagged as a first-class output?
7. Is a pre-mortem produced for every memo, not just on request?
8. Is the pipeline’s reasoning arbitrated by an explicit meta-judge stage?
9. Does every output trace back to a hashed, tamper-evident record?
10. Does the system treat expert intuition as amplified input, not noise to suppress?
Bronze · R²F-aligned
4 of 10 rubric criteria in place. Minimum bar to claim R²F alignment: bias + noise + one recognition signal.
Silver · R²F-integrated
7 of 10 rubric criteria in place. Both rigor and recognition pipelines are running, including a named meta-judge.
Gold · R²F-arbitrated
All 10 rubric criteria in place. Full Kahneman × Klein synthesis with a signed, reproducible audit record.
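The tier thresholds above can be expressed as a small scoring helper. This is a sketch; the function name and `Tier` type are ours, not part of the standard:

```typescript
// Map ten yes/no rubric answers to an R²F self-assessment tier.
// "None" means the product cannot yet claim R²F alignment.
type Tier = "Gold" | "Silver" | "Bronze" | "None";

function r2fTier(answers: boolean[]): Tier {
  if (answers.length !== 10) {
    throw new Error("the rubric has exactly 10 questions");
  }
  const score = answers.filter(Boolean).length;
  if (score === 10) return "Gold";  // full Kahneman × Klein synthesis
  if (score >= 7) return "Silver";  // both pipelines, incl. meta-judge
  if (score >= 4) return "Bronze";  // minimum bar for alignment
  return "None";
}
```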
R²F DETECTOR ATLAS · 10 OF 10 SHIPPED
Every detector. Every paper. Every line of code.
Each entry below is a procurement-grade signal anchored in a specific academic paper. The 10-paper sprint completed 2026-05-07 with the Calibrated Rejection lock. Click any detector to inspect its mechanism, citation, implementation file, live product surfaces, and a per-detector mini-visualisation of the signal it produces.
Calibration baseline · investor-diligence answer
Brier 0.258 ± 0.012 across 143 historical corporate decisions.
Procurement-stage diligence asks: "show me your outcome calibration." The published R²F methodology is run retrospectively over 143 historical corporate decisions where the outcome is known, Brier-fair (evidence dimension neutralised, no peek at ground truth). The proper scoring rule does the rest. The 95% CI comes from a 10,000-iteration deterministic bootstrap with a pinned seed; the reproducibility recipe sits below.
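For reference, the Brier score behind the headline number is just the mean squared gap between forecast probability and realised binary outcome. This is the textbook definition, not the platform’s implementation:

```typescript
// Brier score: mean squared error between forecast probabilities (0..1)
// and binary outcomes (0 or 1). Lower is better; an always-say-50%
// forecaster scores exactly 0.25 on binary outcomes.
function brierScore(forecasts: number[], outcomes: (0 | 1)[]): number {
  if (forecasts.length !== outcomes.length) {
    throw new Error("forecasts and outcomes must align");
  }
  const sq = forecasts.map((p, i) => (p - outcomes[i]) ** 2);
  return sq.reduce((a, b) => a + b, 0) / sq.length;
}
```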
Methodology version progression
v2.0.0 · DEPRECATED
Legacy
Pre-validity-weighted DQI. Surfaced on audits run before 2026-04-30.
v2.0.0-seed · SEED
Platform seed baseline
The number above. Computed retrospectively over the 143-case library; evidence dimension neutralised so predictions don't peek at ground truth.
v2.1.0 · LIVE
Live audits
Validity-aware weight shift per Kahneman & Klein 2009 first condition. Active on every new audit since 2026-04-30.
vper-org · FUTURE
Customer outcomes
When a customer org accumulates closed outcomes via Outcome Gate enforcement, per-org Brier replaces the seed baseline on every DPR they generate.
Where the seed baseline lands · Tetlock-anchored scale
0.130 · Tetlock superforecasters
0.230 · CIA analysts (calibrated)
0.250 · Coin-flip benchmark
0.258 · Decision Intel (platform seed)
0.350 · Motivated amateur
Seed, not forecast
These cases were audited retrospectively. The platform did NOT predict them ahead of time. The Brier number reads: “if the methodology had been applied at decision time using only pre-decision signal, this is how it would have calibrated against ground truth.”
Per-org Brier supersedes
When a customer org accumulates closed outcomes via Outcome Gate enforcement, per-org calibration replaces the seed baseline on every DPR they generate. Until then, the seed is the contractual answer.
The function reads `ALL_CASES` from the case-study library, runs `computeBrierFairPredictedDqi` (the no-evidence-peeking variant of the DQI formula) over each case, and bootstraps the mean with a seeded mulberry32 PRNG. Same seed → same number across machines and dates.
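The seeded-bootstrap step of that recipe can be sketched as follows. `mulberry32` is the standard public-domain PRNG the text names; the `bootstrapCI` wrapper, its defaults, and the input array (standing in for the per-case scores `computeBrierFairPredictedDqi` produces) are our illustrative assumptions:

```typescript
// Standard mulberry32: a tiny 32-bit seeded PRNG returning floats in [0, 1).
function mulberry32(seed: number): () => number {
  return () => {
    seed = (seed + 0x6d2b79f5) | 0;
    let t = Math.imul(seed ^ (seed >>> 15), 1 | seed);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Deterministic bootstrap CI for a mean: same seed, same interval,
// on any machine, on any date.
function bootstrapCI(
  values: number[],
  iterations = 10_000,
  seed = 42,
): { mean: number; lo: number; hi: number } {
  const rand = mulberry32(seed);
  const mean = (xs: number[]) => xs.reduce((a, b) => a + b, 0) / xs.length;
  const means: number[] = [];
  for (let i = 0; i < iterations; i++) {
    // Resample with replacement, same size as the original case library.
    const sample = values.map(() => values[Math.floor(rand() * values.length)]);
    means.push(mean(sample));
  }
  means.sort((a, b) => a - b);
  return {
    mean: mean(values),
    lo: means[Math.floor(0.025 * iterations)], // 2.5th percentile
    hi: means[Math.floor(0.975 * iterations)], // 97.5th percentile
  };
}
```

Pinning the seed is what turns the CI from a statistical estimate into a reproducibility artifact: re-running the recipe is a byte-for-byte check, not a re-estimation.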
Canonical citation
Kahneman, D. & Klein, G. (2009). Conditions for Intuitive Expertise: A failure to disagree. American Psychologist, 64(6), 515–526. The framing paper where the two traditions converged. R²F is the operationalisation.