Methodology

Status: LOCKED. Date locked: 2026-04-24.

This document defines how every intervention page is produced. It is written before any specific intervention is assessed so that future-me cannot move goalposts to fit a preferred conclusion. Any change is logged as a commit to this repository with explicit "methodology revision" reasoning and triggers a re-review of every existing page.

1. Scope

This project synthesizes the evidence for healthspan and lifespan interventions in:

Invertebrate models (C. elegans, D. melanogaster, S. cerevisiae)
Mammalian models (mouse, rat, dog, non-human primate [NHP])
Humans (where any data exists)

Endpoints we evaluate: median lifespan, maximum lifespan, healthspan markers (frailty, grip strength, gait, cognition), validated biomarkers of aging (epigenetic clocks, GlycanAge, etc.), and all-cause mortality where available.

Endpoints we do not evaluate: subjective wellness, single-domain outcomes (e.g., "improves sleep") unless tied to an aging mechanism, marketing claims.

We do not issue supplement or treatment recommendations. The output is a calibration aid for readers who already plan to make their own decisions.

2. Evidence tiers

Every claim is anchored to one of these tiers. When evidence from different tiers conflicts, the higher tier dominates.

Tier	Description
T1	Human RCT, pre-registered, n ≥ 100, hard endpoint (mortality, validated aging biomarker), independently replicated
T2	Human RCT, pre-registered, single trial OR surrogate endpoint OR n < 100; OR large prospective cohort with strong confounder control
T3	ITP-replicated (NIA Interventions Testing Program) mouse lifespan result (any sex), or RP2-replicated, or ≥ 2 independent labs with consistent direction in mice
T4	Single-lab mouse lifespan extension, or NHP biomarker improvement, or healthspan-only mouse data
T5	Invertebrate lifespan extension (C. elegans, fly, yeast); mechanistic plausibility only
T0	Anecdote, n-of-1, uncontrolled human observation, biohacker self-report, in vitro cell line only

Cross-species translation discount: results in tier T5 do not transfer to mammals without independent confirmation. When T4 evidence is applied to a human verdict, it is discounted by one tier.
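The tier ordering and the cross-species rule can be sketched in Python. This is a minimal illustration, not tooling this repository ships: the `Tier` enum, its numeric ranks, and the function names are hypothetical. It assumes T0 ranks below T5 and, because only the T4 discount is stated above, passes T1 to T3 and T0 through unchanged.

```python
from enum import IntEnum
from typing import Optional


class Tier(IntEnum):
    """Evidence tiers. A smaller value means stronger evidence.

    T0 (anecdote, n-of-1, in vitro only) ranks below T5, so it gets
    the largest value.
    """
    T1 = 1
    T2 = 2
    T3 = 3
    T4 = 4
    T5 = 5
    T0 = 6


def dominant(a: Tier, b: Tier) -> Tier:
    """In a conflict, the higher (stronger) tier wins."""
    return min(a, b)


def tier_for_human_verdict(tier: Tier) -> Optional[Tier]:
    """Tier a piece of evidence carries when applied to a human verdict.

    T5 (invertebrate) evidence does not transfer to mammals, so it
    contributes nothing here; an independent mammalian confirmation is
    rated on its own tier. T4 is discounted by one tier (T4 -> T5).
    Assumption: no discount is stated for other tiers, so they pass through.
    """
    if tier is Tier.T5:
        return None
    if tier is Tier.T4:
        return Tier.T5
    return tier


print(dominant(Tier.T4, Tier.T2).name)          # T2
print(tier_for_human_verdict(Tier.T4).name)     # T5
```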

3. Verdict bands

Every intervention receives one of:

Strong — T1 or multiple T2 evidence in humans, mechanism understood, effect direction agreed across labs. Examples expected: exercise.
Probable — T3 evidence solid AND at least suggestive human data (T2 surrogate endpoint or large cohort). Examples expected: rapamycin (mice strong, human limited).
Suggestive — T3 or T4 evidence with replication; human data absent or null. Examples expected: spermidine, taurine (pending replication).
Mixed — Tier-appropriate evidence exists but replication has failed or sex/strain dependence is severe. Examples expected: metformin in non-diabetics.
Mostly hype — Popular intervention with only T0/T5 evidence, or T4 evidence that has failed replication at higher tiers. Examples expected: resveratrol.
Insufficient evidence — Too little data of any tier to form a verdict.

A verdict band must be defensible against the calibration anchors. If a new intervention I'm rating "Probable" has weaker evidence than rapamycin (the anchor for Probable), I have miscalibrated.
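One way to make the anchor comparison mechanical is a dominance check: a rating is flagged when the anchor's evidence is at least as strong in every stream and strictly stronger in at least one. The sketch reduces a page to its best mouse tier and best human tier; that two-stream profile, the rank encoding, and the placeholder anchor values (the minimum the Probable band describes, not an assessment of rapamycin) are all assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Tier ranks: 1 = T1 (strongest) ... 5 = T5, 6 = T0 (weakest).
NO_DATA_RANK = 7  # assumption: a missing stream ranks below T0


@dataclass(frozen=True)
class EvidenceProfile:
    """Hypothetical summary: best tier reached in mice and in humans."""
    mouse: Optional[int]
    human: Optional[int]

    def ranks(self) -> Tuple[int, int]:
        return (
            self.mouse if self.mouse is not None else NO_DATA_RANK,
            self.human if self.human is not None else NO_DATA_RANK,
        )


def is_weaker_than(candidate: EvidenceProfile, anchor: EvidenceProfile) -> bool:
    """True when the anchor is at least as strong on every stream and
    strictly stronger on at least one (the anchor dominates)."""
    c, a = candidate.ranks(), anchor.ranks()
    return all(ai <= ci for ai, ci in zip(a, c)) and a != c


def check_calibration(
    band: str,
    candidate: EvidenceProfile,
    anchors: Dict[str, EvidenceProfile],
) -> Optional[str]:
    """Return a warning if the candidate is weaker than its band's anchor."""
    anchor = anchors.get(band)
    if anchor is not None and is_weaker_than(candidate, anchor):
        return f"miscalibrated: evidence is weaker than the {band} anchor"
    return None


# Placeholder anchor: T3 mouse + T2 human, the minimum the Probable band
# text describes. Not an assessment of rapamycin's actual evidence.
anchors = {"Probable": EvidenceProfile(mouse=3, human=2)}
print(check_calibration("Probable", EvidenceProfile(mouse=4, human=2), anchors))
# miscalibrated: evidence is weaker than the Probable anchor
```

A candidate that is stronger in one stream and weaker in the other is not flagged; that case still needs a written comparison to the anchor.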

4. Required fields per intervention page

Every page must populate ALL of the following. Missing data is recorded as "no data found" — never silently omitted.

TL;DR verdict (one sentence + verdict band)
What it is (chemical class, dose ranges, route)
Proposed mechanism with confidence level (established / plausible / hypothetical)
Evidence ladder — separate subsections for invertebrate, mouse, NHP, human; each with study count, effect size range, replication status
Sex, strain, dose dependence — required for mouse data; if not specified in the literature, that is itself a flag
Confounds — control diet adequacy, baseline mortality of the strain (short-lived strains inflate apparent gains), publication bias signal
Conflict of interest scan — industry funding, author equity, supplement industry ties
Human translation — what RCTs exist, what they actually measured, what they actually showed
Calibrated verdict — band + 2-3 sentence rationale + comparison to nearest calibration anchor
Confidence interval on verdict — "could plausibly move to X if Y replicates"
Open questions — things that would change the verdict if resolved
Last reviewed — date stamp
Sources — every citation with link if available
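A completeness check for these fields could run before a page is committed. This sketch assumes the page has already been parsed into a mapping from field name to text; the parsing step and the exact key strings are hypothetical. It treats an absent or blank field as a failure and accepts the literal "no data found".

```python
from typing import Dict, List

# Hypothetical keys mirroring the required-field list.
REQUIRED_FIELDS = (
    "TL;DR verdict",
    "What it is",
    "Proposed mechanism",
    "Evidence ladder",
    "Sex, strain, dose dependence",
    "Confounds",
    "Conflict of interest scan",
    "Human translation",
    "Calibrated verdict",
    "Confidence interval on verdict",
    "Open questions",
    "Last reviewed",
    "Sources",
)

NO_DATA = "no data found"  # the accepted placeholder for missing data


def missing_fields(page: Dict[str, str]) -> List[str]:
    """Required fields that are absent or blank, in list order.

    A field whose value is NO_DATA counts as populated: missing data must
    be recorded explicitly, never silently omitted.
    """
    return [f for f in REQUIRED_FIELDS if not page.get(f, "").strip()]


page = {field: NO_DATA for field in REQUIRED_FIELDS}
page["Sources"] = ""
print(missing_fields(page))  # ['Sources']
```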

5. Conflict-of-interest discounts

Industry-funded study without pre-registration → discount 1 evidence tier
Author holds equity in the intervention's commercializer → discount 1 tier
Pre-registered + industry-funded → no discount (pre-registration neutralizes)
Studies funded by NIA / Wellcome / equivalent independent → no discount
ITP results never discounted (its design is the gold standard for this purpose)
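The discount rules can be applied per study as below. Three points the text leaves open are settled here by assumption and flagged in comments: discounts stack, pre-registration neutralizes only the funding discount (not the equity discount), and a discounted tier cannot fall below T0. The `Study` record and its fields are hypothetical; independently funded work (NIA, Wellcome, equivalent) is simply `industry_funded=False`.

```python
from dataclasses import dataclass

T0_RANK = 6  # ranks: 1 = T1 (strongest) ... 5 = T5, 6 = T0 (weakest)


@dataclass(frozen=True)
class Study:
    tier: int                     # rank before any discount
    industry_funded: bool = False
    preregistered: bool = False
    author_equity: bool = False   # author holds equity in the commercializer
    itp: bool = False             # Interventions Testing Program result


def discounted_tier(study: Study) -> int:
    """Tier rank after conflict-of-interest discounts (higher = weaker)."""
    if study.itp:
        return study.tier  # ITP results are never discounted
    penalty = 0
    if study.industry_funded and not study.preregistered:
        penalty += 1  # pre-registration neutralizes the funding discount
    if study.author_equity:
        penalty += 1  # assumption: not neutralized by pre-registration
    # Assumption: discounts stack, and nothing falls below T0.
    return min(study.tier + penalty, T0_RANK)


print(discounted_tier(Study(tier=2, industry_funded=True)))  # 3
```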

6. Replication standards

In mice: "replicated" means ≥ 2 independent labs OR ITP positive cohort. A single-lab result, no matter how striking, is T4 not T3.
In humans: "replicated" means ≥ 2 independent RCTs with consistent direction on the same primary endpoint. Meta-analyses of underpowered trials do not count as replication.
Negative replications: a single high-quality negative replication does not erase a positive result, but two negative replications at the next-highest tier downgrade the verdict by one band.
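The two "replicated" definitions can be written as predicates. In this sketch, independence is approximated by distinct lab or team identifiers (a simplification; real independence needs judgment), direction is encoded as +1, -1 or 0, and meta-analyses are excluded simply by passing in only individual trials. The record types and field names are hypothetical, and the negative-replication rule is left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MouseResult:
    lab: str                # hypothetical lab identifier
    direction: int          # +1 lifespan extension, -1 shortening, 0 null
    itp: bool = False       # result comes from an ITP cohort


@dataclass(frozen=True)
class HumanRCT:
    team: str               # hypothetical identifier of the independent team
    primary_endpoint: str
    direction: int          # +1 benefit, -1 harm, 0 null


def mouse_replicated(results: List[MouseResult]) -> bool:
    """An ITP positive cohort, or >= 2 independent labs reporting extension."""
    if any(r.itp and r.direction > 0 for r in results):
        return True
    labs = {r.lab for r in results if r.direction > 0}
    return len(labs) >= 2


def human_replicated(trials: List[HumanRCT], endpoint: str,
                     direction: int = 1) -> bool:
    """>= 2 independent RCTs in the same direction on the same primary
    endpoint. Only individual trials are passed in, so a meta-analysis
    cannot count toward the total."""
    teams = {
        t.team
        for t in trials
        if t.primary_endpoint == endpoint and t.direction == direction
    }
    return len(teams) >= 2


print(mouse_replicated([MouseResult("lab-a", 1), MouseResult("lab-b", 1)]))  # True
```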

7. Verdict change protocol

Verdicts can move. The thresholds:

Upgrade (one band up): requires new evidence at a higher tier than the current anchor evidence, OR positive resolution of a previously flagged open question.
Downgrade (one band down): requires (a) two negative replications at the next-highest evidence tier, OR (b) discovery of a methodological flaw invalidating the anchor evidence, OR (c) failure of a pre-registered confirmatory trial.
Any verdict change is logged in the commit history with the triggering evidence and pre-existing threshold quoted in the commit message.

This is the anti-drift mechanism. If I find myself wanting to upgrade NMN because a new podcast made a compelling case, the methodology says no without new evidence at the right tier.
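The thresholds above can be sketched as a function that proposes at most a one-band move, plus a commit message quoting the trigger. The band order (taken from the listing in section 3, with "Insufficient evidence" outside the ladder), the `Trigger` fields, and the message format are hypothetical. Where an upgrade and a downgrade trigger arrive together, the methodology does not say which wins, so the sketch raises an error instead of inventing a rule.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: weakest -> strongest, as listed in section 3.
BANDS = ["Mostly hype", "Mixed", "Suggestive", "Probable", "Strong"]


@dataclass(frozen=True)
class Trigger:
    """Hypothetical record of what new information has arrived."""
    new_evidence_rank: Optional[int] = None    # 1 = T1 ... 5 = T5, 6 = T0
    open_question_resolved_positively: bool = False
    two_negative_replications: bool = False    # at the next-highest tier
    anchor_evidence_invalidated: bool = False  # methodological flaw found
    confirmatory_trial_failed: bool = False    # pre-registered trial


def propose_change(band: str, anchor_rank: int, t: Trigger) -> str:
    """Apply the one-band upgrade/downgrade thresholds; otherwise no change."""
    up = (
        t.new_evidence_rank is not None and t.new_evidence_rank < anchor_rank
    ) or t.open_question_resolved_positively
    down = (
        t.two_negative_replications
        or t.anchor_evidence_invalidated
        or t.confirmatory_trial_failed
    )
    if up and down:
        # The methodology does not say which wins; refuse to guess.
        raise ValueError("conflicting triggers: resolve manually")
    i = BANDS.index(band)
    if up:
        return BANDS[min(i + 1, len(BANDS) - 1)]
    if down:
        return BANDS[max(i - 1, 0)]
    return band


def commit_message(intervention: str, old: str, new: str,
                   evidence: str, threshold: str) -> str:
    """Commit message quoting the triggering evidence and the threshold."""
    return (
        f"verdict change: {intervention}: {old} -> {new}\n\n"
        f"Triggering evidence: {evidence}\n"
        f'Threshold: "{threshold}"'
    )


print(propose_change("Suggestive", 4, Trigger(new_evidence_rank=3)))  # Probable
```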

8. When the methodology is silent

Some questions cannot be resolved from these rules alone (e.g., "is pre-print evidence admissible?", "how do we handle a retracted paper that's been re-published?"). Decision protocol:

First pass: triangulate from the published positions of the calibration authorities — Matt Kaeberlein, the ITP team, Cochrane reviewers if relevant. What would they do?
Second pass: open a GitHub issue on this repository for broader review.
Third pass: leave the question open; flag it explicitly on the affected intervention page.

Never silently invent a rule. If a new rule is needed, it goes through the methodology-revision process described at the top of this document and is recorded in the revision log (section 10).

9. What this methodology deliberately excludes

No precision-medicine claims. We rate interventions at the population level. Individual variation is real but out of scope.
No combination therapies in solo pages. Combinations get their own pages in interactions/.
No legal/regulatory analysis. "Available OTC in the US" is not evidence relevant to a verdict.
No reasoning from mechanism alone. "It hits mTOR so it should work" never elevates a verdict above Suggestive.
No pricing or accessibility commentary. The verdict is about the evidence; what people do with it is their choice.
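The mechanism-only exclusion amounts to a ceiling on the band. A minimal sketch, assuming the same band order as section 3 and a hypothetical `empirical_support` flag that a reviewer sets when outcome evidence, not a pathway argument, carries the band:

```python
# Assumption: weakest -> strongest, as listed in section 3.
BANDS = ["Mostly hype", "Mixed", "Suggestive", "Probable", "Strong"]
MECHANISM_ONLY_CEILING = "Suggestive"


def apply_mechanism_ceiling(band: str, empirical_support: bool) -> str:
    """Cap a band at Suggestive when it rests on mechanism alone.

    `empirical_support` (hypothetical) means the band is backed by outcome
    evidence at the tier it requires, not just "it hits mTOR".
    """
    if empirical_support or band not in BANDS:
        return band  # e.g. "Insufficient evidence" is left untouched
    ceiling = BANDS.index(MECHANISM_ONLY_CEILING)
    return BANDS[min(BANDS.index(band), ceiling)]


print(apply_mechanism_ceiling("Probable", empirical_support=False))  # Suggestive
```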

10. Methodology revision log

(none yet — methodology locked 2026-04-24)