Karen Serfaty, independent researcher

Snapshot date: 2026-04-25 · Genre: narrative synthesis (perspective) reported per SANRA standards [1]

Open database & methodology: see Data Availability statement (§7).
Background. The aging-intervention literature is over-published and under-synthesized. Public belief about longevity interventions diverges from the highest-tier evidence in identifiable, repeated ways. Existing synthesis attempts are dispersed, slow, or commercially conflicted.
Methods. A pre-committed methodology — evidence tiers (T0–T5), six verdict bands (Strong, Probable, Suggestive, Mixed, Mostly hype, Insufficient evidence), explicit replication standards, conflict-of-interest discounts, and five locked calibration anchors (exercise, caloric restriction, rapamycin, NMN, resveratrol) — was written and frozen on 2026-04-24, before any specific intervention was assessed. Thirty-eight interventions and four pairwise interactions were then synthesized against the locked rules, with primary literature retrieved via PubMed and the NIA Interventions Testing Program (ITP) database. The synthesis was AI-assisted (Anthropic Claude); the human author verified citations and audited representative pages. Reporting follows SANRA [1]; protocol and locked methodology are publicly archived.
Findings. Across 38 intervention pages, 5 verdicts landed in Strong, 11 in Probable, 14 in Suggestive, 5 in Mixed, 10 in Mostly hype, and 2 in Insufficient evidence (Figure 1). Eight specific divergences from popular framing were identified (Figure 2): metformin's longevity case is weaker than its reputation (ITP-null when tested alone; MET-PREVENT 2025 shows exercise antagonism); resveratrol's SIRT1 mechanism does not survive 2025 meta-analysis; taurine's foundational age-decline premise was contested in 2025; spermidine's largest RCT (SmartAge, n=100, 12 months) was null on primary endpoint; calorie-matched time-restricted eating produces equivalent outcomes to caloric restriction (Liu 2022 NEJM); the Zone-2-is-optimal claim is weaker than the broader VO2max–mortality claim; single-group dominance is a recurring failure mode across hyperbaric oxygen and GlyNAC. Conversely, acarbose, sauna bathing, and resistance training are underweighted relative to their evidence.
Conclusions. A pre-committed methodology applied to public sources reveals a longevity ranking that diverges from popular discourse in ways that are large and consistent. The methodology, calibration anchors, and full database are openly published; the autonomous-update infrastructure that maintains the database continues after this snapshot.
Keywords: aging, longevity, evidence synthesis, narrative review, calibrated assessment, rapamycin, metformin, exercise, caloric restriction, AI-assisted synthesis.
The longevity field has a peculiar pathology. PubMed indexes tens of thousands of papers tagged with "aging" or "lifespan." The Interventions Testing Program — the gold standard for mouse-lifespan replication [2] — has tested dozens of compounds. Human RCTs on cardiovascular endpoints, glycemic control, frailty, and cognition number in the thousands. The volume of evidence is enormous.
The volume of synthesis is small, fragmented, and dominated by voices with commercial conflicts. The most-followed sources of "what does the evidence say about X" are podcasts running 90–180 minutes per intervention, books written by founders of supplement companies, and influencers whose business model depends on the salience of specific compounds. Independent academic synthesis exists — Matt Kaeberlein's commentaries, the ITP's own publications, occasional Cochrane reviews — but is dispersed, slow, and inaccessible to the people making purchasing and lifestyle decisions today.
The result is a calibration gap. The general public, and a meaningful fraction of practicing clinicians, hold beliefs about aging interventions that disagree with the evidence base in identifiable ways. Metformin "extends lifespan" — except the ITP found it does not extend mouse lifespan when tested alone [3]. Resveratrol activates SIRT1 — except a 2025 GRADE-assessed meta-analysis found it does not significantly affect human SIRT1 levels [4]. Taurine declines with age and supplementation reverses the decline — except a 2025 Science follow-up found taurine does not reliably decline with age in the first place [5,6]. Time-restricted eating works through circadian mechanisms independent of calories — except Liu et al. 2022 showed it produces equivalent outcomes to matched caloric restriction [7].
These are not subtle errors. They are pillars of popular belief that do not survive contact with the highest-tier evidence available.
The gap is not a research problem. It is a synthesis problem. The evidence already exists; what is missing is calibrated, opinionated, accessible aggregation.
This paper is one attempt at that synthesis. It is opinionated in two senses. First, the methodology is opinionated: pre-committed evidence tiers, locked verdict bands, explicit replication standards, explicit conflict-of-interest discounts. Second, the conclusions are opinionated: 38 interventions classified, and the classification disagrees with popular discourse on enough interventions to be uncomfortable.
The thesis is short: the popular longevity ranking is wrong, the evidence to correct it already exists, and the correction is straightforward to describe if one commits to a methodology before reading the literature on any specific compound.
This is a narrative synthesis — not a systematic review and not a meta-analysis. It does not follow PRISMA [8]. The literature search was structured but not exhaustive in the PRISMA sense, and no quantitative pooling of effect sizes is performed. Reporting follows the Scale for the Assessment of Narrative Review Articles (SANRA) [1], which specifies six items: justification of importance, statement of aims, description of literature search, referencing standard, presentation of evidence levels, and presentation of relevant endpoint data. The corresponding sections of this paper are listed against each SANRA item in Appendix D.
Every methodological choice — evidence tiers, verdict thresholds,
replication standards, conflict-of-interest discounts, and the five
calibration-anchor verdicts — was written down and locked in a file (methodology.md)
on 2026-04-24, before any specific intervention was assessed. A second
locked file (CALIBRATION_ANCHORS.md)
committed to verdicts for the five anchor interventions on the same
date. Both files have a single creation date and no edits in the audit
log. This is the central anti-bias mechanism of the synthesis: a reader
cannot accuse the author of moving goalposts to fit a preferred
conclusion, because the goalposts are physical files with creation
metadata. The locked files are publicly archived (§7).
Every claim anchors to one of seven tiers:
Higher tiers dominate lower tiers in conflict. T5 results do not transfer to mammals without independent confirmation. T4 results are discounted by one tier when extrapolated to humans.
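The tier rules above can be sketched in code. This is a minimal illustration, not the locked methodology itself: it assumes tiers are the integers 0 to 5, that a smaller number means stronger evidence (T0 strongest, T5 weakest), and that "discounted by one tier" means moving one step toward T5. All function names are hypothetical.

```python
# Assumption: tiers are integers 0..5 and a smaller number means stronger
# evidence (T0 strongest, T5 weakest). This ordering is an illustration only.
STRONGEST, WEAKEST = 0, 5


def discounted_tier(tier, extrapolated_to_humans=False):
    """Apply the one-tier discount for T4 results extrapolated to humans."""
    if not STRONGEST <= tier <= WEAKEST:
        raise ValueError(f"unknown tier: T{tier}")
    if tier == 4 and extrapolated_to_humans:
        return min(tier + 1, WEAKEST)
    return tier


def transfers_to_mammals(tier, independently_confirmed):
    """T5 results need independent confirmation before transferring to mammals."""
    return tier != 5 or independently_confirmed


def dominant_findings(findings):
    """Keep only the findings at the strongest tier present.

    `findings` is a list of (tier, result) pairs; in a conflict, the
    strongest tier wins and weaker tiers are dropped.
    """
    best = min(tier for tier, _ in findings)
    return [f for f in findings if f[0] == best]
```

For example, `dominant_findings([(3, "positive"), (1, "null")])` keeps only the T1 finding, which is the sense in which a higher tier dominates a lower one in conflict.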
Six bands, with thresholds defined before any intervention was rated:
Five interventions were locked into specific bands before any other rating, against published positions of independent authorities (Kaeberlein commentary, ITP publications, Cochrane reviews):
Every other intervention's verdict was triangulated against these anchors and the comparison documented on each intervention's page in the open database.
Per ICMJE 2024 guidance [18], COI is treated separately from funding. In this synthesis:
Verdicts can move; the thresholds are pre-specified. Upgrades require new evidence at a higher tier or resolution of a previously-flagged open question. Downgrades require two negative replications at the next-highest tier, discovery of a methodological flaw invalidating anchor evidence, or failure of a pre-registered confirmatory trial. Every change is logged with the triggering evidence and methodology section invoked.
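The change rules above amount to two gate conditions plus a log entry. A minimal sketch follows, assuming each piece of triggering evidence can be reduced to a few flags; the `Trigger` fields, function names, and log format are hypothetical and are not taken from the methodology file.

```python
from dataclasses import dataclass


@dataclass
class Trigger:
    """Hypothetical summary of the evidence that prompts a re-rating."""
    new_higher_tier_evidence: bool = False
    open_question_resolved: bool = False
    negative_replications_next_tier: int = 0
    anchor_flaw_found: bool = False
    confirmatory_trial_failed: bool = False


def upgrade_allowed(t: Trigger) -> bool:
    # Upgrade: new evidence at a higher tier, or a flagged open question resolved.
    return t.new_higher_tier_evidence or t.open_question_resolved


def downgrade_allowed(t: Trigger) -> bool:
    # Downgrade: two negative replications at the next-highest tier, a flaw
    # invalidating anchor evidence, or a failed pre-registered confirmatory trial.
    return (
        t.negative_replications_next_tier >= 2
        or t.anchor_flaw_found
        or t.confirmatory_trial_failed
    )


def log_change(intervention, old_band, new_band, evidence, section):
    """Record a verdict change with its triggering evidence and methodology section."""
    return {
        "intervention": intervention,
        "from": old_band,
        "to": new_band,
        "triggering_evidence": evidence,
        "methodology_section": section,
    }
```

A single negative replication does not clear the downgrade gate in this sketch; that asymmetry is what keeps a verdict from moving on one contrary result.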
The synthesis is not a PRISMA-style systematic search. The search strategy was:
meta/sources.md
of the open database — Kaeberlein, the ITP team, Cochrane reviewers —
wherever the methodology was silent on a judgment call.

The full per-intervention citation lists are in the open database (§7) at interventions/<name>.md.
Per ICMJE 2024 [20] and the AMEE Guide on AI disclosure [21], this synthesis was produced using a Claude model (Anthropic) in a structured workflow over 2026-04-24 and 2026-04-25. The model wrote initial drafts of each intervention page under the locked methodology, performed structured literature searches, and applied the verdict-band rules. The human author (Karen Serfaty) directed the methodology design, verified citations, audited representative pages against the underlying literature, and approved the final synthesis. The author takes full responsibility for the contents of this paper. No AI system is listed as an author per ICMJE: AI cannot accept responsibility for accuracy, integrity, and originality, which is the bar for authorship [20].
The reason to disclose this prominently is that the methodology's value rests on its pre-commitment, and a reader needs to know the rules were genuinely locked before any intervention was assessed. The methodology file has a single creation date and no edits in the repository's commit history; that history is the audit trail.
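A reader with a clone of the repository could check the "no edits" claim roughly as follows. This is a sketch under assumptions: it supposes the repository is a Git repository and that `methodology.md` sits at the path passed in; the function names are hypothetical. It relies only on the standard `git log --follow --format=%H -- <path>` invocation, which prints one commit hash per line for commits touching the file.

```python
import subprocess


def commit_hashes(log_output):
    """Split `git log --format=%H` output into a list of commit hashes."""
    return [line.strip() for line in log_output.splitlines() if line.strip()]


def is_unedited(log_output):
    """True when exactly one commit (the creation commit) touches the file."""
    return len(commit_hashes(log_output)) == 1


def git_log_for(path, repo_dir="."):
    """Return the raw hash list for `path`; requires git and a local clone."""
    result = subprocess.run(
        ["git", "log", "--follow", "--format=%H", "--", path],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Usage (assumed path): is_unedited(git_log_for("methodology.md"))
```

Commit history can be rewritten by whoever controls the repository, so a check like this shows consistency with the claim rather than proof of it.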
Across 38 intervention pages and 4 interaction pages, primary verdicts (Figure 1) were distributed as follows: Strong (5), Probable (11), Suggestive (14), Mixed (5), Mostly hype (10), Insufficient evidence (2). Stratified verdicts (e.g., "Probable in obese / Suggestive in lean") were classified by the strongest claim's band for the figure; full stratification is preserved in Appendix A.
Figure 1 — Distribution of verdict bands across the
synthesis. Primary verdict assigned to each of 38
intervention pages, classified by the strongest claim where the verdict
is population-stratified.
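The strongest-claim rule used for Figure 1 can be written as a small function. This sketch assumes the six bands are ranked in the order they are listed in the Methods (Strong first, Insufficient evidence last); that ranking, and the name `primary_band`, are assumptions made for illustration.

```python
# Assumed ranking: the order in which the six bands are listed in the Methods.
BANDS = [
    "Strong",
    "Probable",
    "Suggestive",
    "Mixed",
    "Mostly hype",
    "Insufficient evidence",
]


def primary_band(stratified):
    """Pick the strongest band among a stratified verdict's claims.

    `stratified` maps a population or claim label to its band, for example
    {"obese": "Probable", "lean": "Suggestive"}.
    """
    # BANDS.index raises ValueError for a band name outside the six.
    return min(stratified.values(), key=BANDS.index)
```

With the stratified example from the text, `primary_band({"obese": "Probable", "lean": "Suggestive"})` selects "Probable", which is the band that would be plotted.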
The shape is informative. Mostly hype (10) and Suggestive (14) together account for 63% of interventions — a reasonable map of a field where in-vitro biochemistry generates supplement marketing far ahead of replication. Strong (5) is concentrated in non-pharmacological interventions plus secondary-prevention statins. Probable (11) is dominated by drug classes with hard-endpoint trials in defined populations.
Tables 1–6 group all 38 interventions by primary verdict band, with a single-sentence rationale per intervention. Stratified verdicts (population- or species-specific) are noted. Full per-intervention citations are in the open database; the most decisive citation per intervention appears in §4.
Table 1 — Strong (in humans).
| Intervention | Rationale |
|---|---|
| Aerobic exercise | 20–40% all-cause mortality reduction at moderate doses; mechanism breadth across nearly every aging hallmark [9,10,11]. |
| Resistance training | 10–20% all-cause mortality + unique sarcopenia / falls protection [22,23]. |
| Sleep (~7 h) | U-shaped mortality curve at population scale; observational evidence at near-RCT magnitude [24,25]. |
| Statins (secondary prevention) | The best-evidenced cardiovascular drug class; T1 hard-endpoint evidence [26]. |
| Caloric restriction (in mice) | Decades of replication; the ceiling for mouse-only claims [12]. (Suggestive in humans on biomarkers only [13].) |
Table 2 — Probable.
| Intervention | Rationale |
|---|---|
| Rapamycin | ITP-positive across cohorts; PEARL/Mannick human surrogate evidence [14,15,16]. |
| Acarbose | ITP +22% males / +5% females; combination with rapamycin extends further [27,28]. |
| 17α-estradiol | ITP +12–19% in male mice only; human translation essentially zero [27,17]. |
| Canagliflozin | ITP-positive male mice; SGLT2-class T1 evidence in CV/CKD/HF populations [29,30]. |
| GLP-1 agonists (obese / CV-risk) | SELECT 20% MACE reduction + all-cause mortality reduction in obese non-diabetic adults with CVD [31]. |
| Statins (primary, 40–75) | USPSTF B-grade in elevated CV risk [32]. |
| Sauna / heat exposure | Kuopio cohort dose-response; ~50% CV mortality reduction at 4–7 sessions/week [33]. |
| HRT (women, ≤ age 60 / within 10 y of menopause) | WHI 20-year follow-up timing-stratified favorable [34]. |
| TRT (documented hypogonadism) | TRAVERSE cleared CV-safety [35]. |
| Creatine + RT (older adults) | Sarcopenia and cognitive benefit in older adults [36]. |
| Zone 2 / VO2max framework | Probable for the broader CRF–mortality claim [37]; Mixed for the Zone-2-is-optimal-protocol claim [38]. |
Table 3 — Suggestive.
| Intervention | Rationale |
|---|---|
| NMN / NR | NR failed at ITP [17]; NMN itself untested at ITP-grade; human RCTs measure NAD⁺ levels not outcomes [39]. |
| Senolytics (D+Q, fisetin) | Strong mouse healthspan; early human trials with mixed clock signals [40,41,42,43]. |
| Spermidine | SmartAge primary endpoint null (n=100, 12 months) [44]; observational signal positive but confounded. |
| GlyNAC | Single-group dominance (Sekhar lab); no independent replication [45,46]. |
| Sulforaphane | Best-evidenced natural Nrf2 activator [47]. |
| Plasma exchange / TPE | Conboy lab pilots; commercial layering high. |
| Taurine (leaning ↓) | Yadav 2023 striking but single-lab [5]; 2025 Science follow-up undermined age-decline premise [6]. |
| Omega-3 (general) | VITAL primary null [48]; REDUCE-IT positive at 4 g/day in elevated TG / on-statin CVD [49]. |
| Lithium (low-dose, observational + MCI) | Drinking-water observational signals on suicide/dementia [50]. |
| CoQ10 (HF / statin myalgia) | Q-SYMBIO supports HF use [51]. |
| EGCG / green tea (consumption) | Observational mortality benefit in East Asian cohorts [52]. |
| Berberine (cardiometabolic) | Modest glycemic/lipid effects; longevity claim weaker. |
| Cold exposure (narrow indications) | Mood, BAT, recovery; longevity claim weak. |
Table 4 — Mixed.
| Intervention | Rationale |
|---|---|
| Metformin | ITP-null when tested alone [3]; observational human data confounded; MET-PREVENT 2025 shows exercise blunting [53]. |
| Time-restricted eating | Liu 2022 NEJM null vs matched-calorie comparator [7]; benefit largely calorie-mediated. |
| Vitamin D | VITAL primary null [54]; ~13% cancer-mortality signal real [55]; hip-fracture signal in women [56]. |
| Hyperbaric oxygen | Single-group dominance + COI [57]; replication missing. |
| Statins (>75 healthy primary) | USPSTF: insufficient evidence; STAREE pending [58]. |
Table 5 — Mostly hype.
| Intervention | Rationale |
|---|---|
| Resveratrol | ITP-failed [3]; SIRT1 mechanism does not survive 2025 meta-analysis [4]; Sirtris/GSK program failed. |
| Curcumin | Bioavailability ceiling; PAINS critique of in vitro promiscuity [59]; modest evidence only for OA pain. |
| Quercetin (standalone) | Senolytic case lives in D+Q, not standalone. |
| Methylene blue | TauRx Alzheimer's program failed; biohacker framing unsupported. |
| Pterostilbene / PQQ / Astaxanthin | Same template as resveratrol. |
| EGCG (high-dose supplements) | Hepatotoxicity signal [60]. |
| Lithium (microdosing broader claims) | Supplement-form longevity claim unsupported beyond observational data. |
| CoQ10 (general longevity) | Heart failure data does not generalize to aging in healthy adults. |
| Berberine (longevity framing) | "Natural metformin" framing unsupported. |
| Cold exposure (longevity framing) | Wellness-industry framing outruns evidence. |
Table 6 — Insufficient evidence (preclinical-dominant).
| Intervention | Rationale |
|---|---|
| Yamanaka factors / partial reprogramming | Most preclinically exciting category [61,62]; first FDA-cleared cellular rejuvenation trial April 2026. |
| Klotho upregulation | Strong mouse evidence [63,64]; no human supplementation pathway established. |
Table 7 — Interactions.
| Combination | Verdict |
|---|---|
| Rapamycin + Acarbose | Probable in mice — among largest ITP combined effects (+34% males, +28% females) [28]. |
| Rapamycin + Metformin | Probable but driven by rapamycin; metformin contributes little additive benefit [65]. |
| Metformin × Exercise | Probable antagonism — metformin blunts mitochondrial and hypertrophy adaptations [53,66]. |
| GLP-1 + Resistance Training | Probable mitigation of GLP-1-induced muscle loss. |
Figure 2 visualizes the eight specific divergences between popular framing and the methodology's verdict. This is the section where the methodology earns its keep.
Figure 2 — Where popular discourse diverges from the
methodology's verdict. For each intervention, the popular
framing band (gray dot) and the methodology's verdict band (colored
square) are plotted. Red = methodology rates lower than popular framing;
green = methodology rates higher.
The popular framing: metformin is a generic, well-tolerated diabetes drug that also extends lifespan; the TAME trial will confirm what observational data already suggests.
The actual evidence: the Interventions Testing Program tested metformin alone in genetically heterogeneous mice (UM-HET3) and found no lifespan extension [3]. The Strong et al. 2016 ITP cohort that combined metformin with rapamycin showed lifespan benefit, but the benefit was driven by rapamycin, not metformin [65]. Other mouse studies (Martin-Montalvo 2013) reported positive lifespan effects in single inbred strains [67]; these did not survive the multi-site replication that the ITP requires.
The dominant human evidence cited for metformin's longevity case is Bannister et al. 2014 [68], an observational claim that metformin-treated diabetics outlive non-diabetic controls. This finding is heavily confounded by indication bias, healthy-adherer effects, and selection of stable monotherapy survivors. The methodology's COI/replication rules make it T2 evidence, heavily discounted.
The MET-PREVENT 2025 RCT introduced an additional concern: metformin blunts mitochondrial and hypertrophy adaptations to combined aerobic + resistance training in older adults [53], replicating Konopka 2019's earlier finding [66]. For someone who exercises seriously (and exercise is a Strong-band intervention in this synthesis), metformin is plausibly net-negative on the longevity goal. An intervention with weaker evidence is undermining one of the interventions with the strongest evidence.
The TAME trial may yet read positive, but as of 2026 it remains incompletely funded with results expected late this decade. The methodology says: the verdict reflects current evidence, not anticipated evidence. Current evidence puts metformin-for-longevity at Mixed.
If metformin is over-attended-to relative to its evidence, acarbose is the inverse case. ITP cohorts show median lifespan extension of ~22% in males and ~5% in females [27]. The rapamycin + acarbose combination (Strong et al. 2022) produced one of the largest combined effects in ITP history: +34% males, +28% females [28] — larger than rapamycin alone.
Acarbose is FDA-approved for type 2 diabetes, generic, low-cost, and broadly tolerated. Its mechanism (blunting post-prandial glucose excursions via α-glucosidase inhibition) is distinct from metformin's, and it may not share metformin's exercise-blunting interaction. Yet acarbose is mentioned in popular longevity discourse far less often than metformin. This is the clearest case in the synthesis where the popular ranking diverges from the methodology's ranking based purely on the evidence available to both.
The popular framing: 16:8 time-restricted eating provides metabolic and aging benefits via circadian alignment, independent of caloric reduction.
The actual evidence: Liu et al. 2022 [7] ran a 12-month RCT in obese adults comparing 8-hour TRE plus caloric restriction versus caloric restriction alone. The trial found no significant difference between groups on weight loss or metabolic markers. The TREAT trial (Lowe 2020) had earlier found modest weight loss in 16:8 TRE versus 3-meals-per-day, but no significant cardiometabolic advantage and a concerning lean-mass loss signal [69].
When trials match calories, the eating-window manipulation produces small or null marginal benefits. There may be a real glycemic-control benefit specifically (some 2025 meta-analyses support this), but the framing of TRE as a distinct aging intervention beyond calorie reduction is not supported.
For older adults considering TRE for longevity, the sarcopenia-via-undereating risk in the lean-mass-loss signal is real and rarely mentioned in popular framing. Verdict: Mixed.
Resveratrol's longevity story rested on three pillars: (1) lifespan extension in yeast and short-lived mice; (2) SIRT1 activation as the mechanism; (3) "CR mimetic" framing. By 2026, all three are weak.
The yeast and high-fat-diet mouse findings [19,70] were partially contested by replication failures in flies and worms [71] and by null lifespan effects on standard diets. The ITP tested resveratrol and failed to extend lifespan in genetically heterogeneous mice [3].
The SIRT1 mechanism itself has been substantially revised. The original Sirtris fluorophore-coupled assays were shown to be susceptible to false positives. A 2025 GRADE-assessed meta-analysis of randomized trials concluded that resveratrol supplementation does not significantly influence human SIRT1 levels [4]. The clinical Sirtris program acquired by GSK for $720M produced no successful product and was discontinued.
Resveratrol is the canonical Mostly hype anchor in this synthesis precisely because it represents a complete failure trajectory: striking initial findings, failed at the highest replication tier, mechanism contested, commercial program collapsed.
Yadav et al. 2023 [5] generated enormous attention: ~10–12% mouse lifespan extension, plus rhesus monkey biomarker data and observational human associations. Supplement industry adoption was rapid.
In June 2025, a follow-up Science paper [6] found that taurine does not reliably decline with age in healthy individuals across multiple cohorts. The age-related-decline premise — the entire rationale for supplementation — was undermined within two years of the original paper.
The original mouse lifespan finding has not been retracted, and the 10–12% effect is real within the limits of the original study. But the case for human supplementation rests on a premise that the field's own follow-up did not confirm. Without independent ITP-grade replication of the mouse finding (none published as of this writing) and with the human age-decline premise contested, taurine sits in Suggestive band with significant downgrade risk.
This trajectory has played out for resveratrol, NR, taurine, and (to a degree) NMN. The methodology's threshold for elevating verdicts above Suggestive on the basis of single-lab work exists precisely to insulate the synthesis from this pattern.
Spermidine has stronger preclinical evidence than most polyphenols and a coherent mechanism (autophagy induction). Observational dietary-intake data is consistently positive on aging biomarkers and cognition.
The largest randomized human trial — SmartAge, n=100, 12-month, double-blind, placebo-controlled, in older adults with subjective cognitive decline — was null on its primary cognitive endpoint (mnemonic discrimination performance) [44]. Some secondary endpoints showed signals; the field's response has emphasized those rather than the primary.
The methodology says: a null primary endpoint in the largest trial of an intervention is the dominant signal. The verdict for spermidine is Suggestive rather than Probable specifically because of SmartAge.
The popular framework rests on two conjoined claims: (1) VO2max / cardiorespiratory fitness is among the strongest predictors of all-cause mortality; (2) Zone 2 training is the optimal protocol for raising VO2max and mitochondrial capacity.
Claim (1) is well-established and survives the methodology — VO2max-mortality cohort meta-analyses pool ~21 million person-observations [37]. Claim (2) does not survive. When training volume is matched, higher-intensity protocols produce equal or greater mitochondrial adaptations than Zone 2 [38]. Zone 2's advantages are practical: lower injury risk, lower recovery cost, sustainable at high weekly volumes. These are real, but they are not the same as "Zone 2 is mechanistically optimal."
Verdict: Probable for the broader CRF-mortality framework; Mixed for the specific Zone-2-is-optimal claim.
Several interventions in the Suggestive and Mixed bands share a methodological pattern that the popular discourse routinely under-weights: the literature on the intervention is dominated by one research group, often with commercial entanglement, producing consistent positive findings without independent replication. Examples:
Single-group dominance is not by itself disqualifying. It is, however, a flag for caution. The longevity field has a documented history of single-group striking results that fail to replicate (Sirtris's resveratrol program is the textbook case). When the popular framing of an intervention rests on the work of one research group, the appropriate calibration is one band lower than the within-that-group evidence would suggest.
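The one-band-lower calibration can be sketched as follows. The sketch assumes the bands are ranked in the order they are listed in the Methods, and it leaves "Mostly hype" and "Insufficient evidence" unchanged because there is no clearly lower band to move to; both choices, and the function name, are assumptions for illustration.

```python
# Assumed ranking: the order in which the six bands are listed in the Methods.
BANDS = (
    "Strong",
    "Probable",
    "Suggestive",
    "Mixed",
    "Mostly hype",
    "Insufficient evidence",
)

# Assumption: these two bands have no meaningful "one band lower".
FLOOR_BANDS = {"Mostly hype", "Insufficient evidence"}


def single_group_discount(within_group_band, single_group_dominated):
    """Return the band after the single-group caution is applied."""
    index = BANDS.index(within_group_band)  # ValueError for unknown bands
    if not single_group_dominated or within_group_band in FLOOR_BANDS:
        return within_group_band
    return BANDS[index + 1]
```

Under these assumptions, evidence that would read as Probable within one group's own literature is rated Suggestive until an independent group replicates it.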
The mirror image of "what does the discourse get wrong" is "what is underweighted relative to its evidence."
Sauna / heat exposure. The Kuopio Ischemic Heart Disease Risk Factor Study — a 2,300-person, 20-year prospective cohort — shows a clean dose-response between sauna frequency and cardiovascular / all-cause mortality, with men using a sauna 4–7 times per week showing ~50% lower fatal CV event rate than once-weekly users [33]. Sauna is a Probable-band intervention with stronger observational evidence than most pharmacological interventions in the same band, yet receives a tiny fraction of discourse attention.
Acarbose (covered in §4.2). The single most underappreciated drug in the longevity discourse relative to its ITP evidence.
Resistance training as longevity intervention. Universally framed as "fitness" or "strength" — categories adjacent to but not the same as longevity. The mortality cohort meta-analyses are clear: resistance training reduces all-cause mortality 10–20% independent of aerobic exercise, and provides unique falls/sarcopenia protection [22,23]. For older adults, it is among the highest-leverage interventions available, and it is also one of the cheapest.
GLP-1 agonists + resistance training. The discourse around GLP-1s currently treats muscle loss as either a negligible side effect or a deal-breaker. The mitigation — adding structured resistance training during weight loss — is well-supported by the broader caloric-deficit literature and is the obvious adjunct intervention.
Sleep treated as a peer of exercise. Sleep duration shows a U-shaped mortality curve at population scale, with an effect size rivaling that of exercise [24,25]. Yet sleep is routinely framed as a "lifestyle factor" rather than as a Strong-band intervention on par with exercise. The evidence supports treating it as a peer.
This is a single-author synthesis. The literature search is English-language. Citation-following is dense for high-priority interventions and less exhaustive for niche ones (P3 backlog items). The verdict bands are coarse; an honest synthesis cannot reduce the heterogeneity of "is this drug worth taking" to six categories without losing information. The author's calibration anchors are themselves contestable; a different reader might choose different anchor verdicts and produce a different ranking.
The most important author-side limitation: the anchor verdicts themselves embed prior beliefs. The choice to lock rapamycin at Probable (rather than Suggestive or Strong) constrains every downstream verdict. Different anchor choices would shift the global ranking. The methodology partially mitigates this by triangulating each anchor against published positions of named external authorities (Kaeberlein commentary, ITP results, Cochrane reviews), but this is not the same as independence. A reader who disagrees with the anchors should expect the entire ranking to shift; the methodology forces transparency about which anchor disagreement drives any specific re-rating.
The synthesis was AI-assisted, as disclosed in §2.10. A non-trivial fraction of the day-of literature retrieval was performed by an LLM with web-search tool use; the human author verified citations on a sampled audit basis, not on every claim. A skeptical reader should treat this synthesis as a hypothesis-generating exercise that can be falsified by checking citations against the underlying literature. The full database is open precisely to enable this audit. The most likely failure mode of LLM-assisted synthesis (fabricated citations, or subtly wrong dates and effect sizes) is detectable, but only by direct citation check; adversarial readers are encouraged to do exactly that and to flag errors via the project's open issue tracker.
Three further limitations worth naming explicitly:
Future work. The immediate next steps are: (1) deposit the locked methodology and anchor verdicts on the Open Science Framework with a DOI to make pre-commitment externally verifiable; (2) solicit pre-publication review from independent longevity researchers (Kaeberlein team, Stanfield, Cochrane reviewers in adjacent fields) as a manual replication of the verdict assignments; (3) extend the synthesis to interactions and combination protocols, where the existing four-page treatment is preliminary; (4) treat this paper as a calibration baseline and re-run the methodology in 12 months as a within-method test of stability versus drift.
All data underlying this synthesis are openly available.
- methodology.md: frozen 2026-04-24, no edits since; available in the open database repository.
- CALIBRATION_ANCHORS.md: frozen 2026-04-24, no edits since.
- Intervention pages (interventions/*.md, n=38) and interaction pages (interactions/*.md, n=4): each with full citation lists, source URLs, and triangulation rationale.
- paper/verdict-table.csv: all 38 interventions plus 4 interactions, with verdict, anchor, and one-line rationale.
- site/: HTML version of the entire database.

The synthesis was produced with assistance from a Claude model (Anthropic), as disclosed in §2.10. The author thanks the NIA Interventions Testing Program for maintaining the open public record of mouse-lifespan testing that is foundational to this synthesis [2], the EQUATOR Network for the SANRA reporting framework [1], and the ICMJE for the AI-disclosure recommendations [20] applied throughout. No human reviewers contributed to this version of the manuscript; readers wishing to flag errors or propose verdict changes are directed to the open issue tracker in the project repository.
None. This synthesis was conducted at the author's own expense, without grants from any funding body.
The author declares no financial or non-financial competing interests with respect to any of the interventions discussed. The author owns no equity in any company commercializing any intervention covered. The author has not received consulting income, honoraria, or speaking fees from any longevity-related commercial entity. The author has not used any of the off-label-prescribed interventions discussed (rapamycin, acarbose, GLP-1 agonists, etc.) and is not a candidate population for any of them at the snapshot date.
If a reader takes nothing else from this paper, take this hierarchy:
The corollary: misdirection of attention has a real opportunity cost. Time spent reading podcasts about resveratrol is time not spent training, sleeping, or evaluating an actually-evidence-supported drug class with a clinician.
The methodology that produced these verdicts generalizes beyond aging. The pattern — pre-commit the rules, lock calibration anchors, apply mechanically to a backlog, audit drift — is applicable to any evidence-rich field with a commercial or ideological overlay distorting the public discourse: dietary interventions, wellness technology, education research, criminal justice reform, productivity advice. Each of these has the same shape: lots of papers, weak synthesis, loud commercial voices, public belief that diverges from highest-tier evidence in identifiable ways. The methodology presented here is general; this paper happens to apply it to aging because aging is the field where the misdirection costs the most.
The aging-intervention literature is not the bottleneck. Synthesis is.
See paper/verdict-table.csv for the machine-readable
form. The CSV contains all 38 interventions plus 4 interactions, with
verdict, anchor triangulation, and one-sentence rationale per row.
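A reader who wants to re-derive the band counts from the CSV could start from a sketch like the one below. The column names `intervention` and `verdict` are assumptions (the actual header of paper/verdict-table.csv is not reproduced here), and the inline sample contains only five rows taken from Tables 1 to 5, so it does not reproduce the full distribution.

```python
import csv
import io
from collections import Counter

# Hypothetical header and a five-row sample; the real file may differ.
SAMPLE = """intervention,verdict
Aerobic exercise,Strong
Rapamycin,Probable
Acarbose,Probable
Metformin,Mixed
Resveratrol,Mostly hype
"""


def tally_verdicts(csv_text):
    """Count rows per verdict band in CSV text with a `verdict` column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["verdict"].strip() for row in reader)


def band_share(counts, bands):
    """Fraction of all rows whose verdict falls in any of `bands`."""
    total = sum(counts.values())
    return sum(counts[b] for b in bands) / total if total else 0.0


counts = tally_verdicts(SAMPLE)
```

To run it against the real file, read the file's text and pass it to `tally_verdicts`, adjusting the column name if the header differs.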
- methodology.md: locked methodology, frozen 2026-04-24
- CALIBRATION_ANCHORS.md: five anchor interventions with locked verdicts
- interventions/*.md: 38 intervention pages, one file each
- interactions/*.md: 4 interaction / antagonism pages

The site/ directory contains a static-HTML rendering of all of the above.
| SANRA item | Where addressed |
|---|---|
| 1. Justification of importance | §1 (Introduction: the synthesis crisis) |
| 2. Statement of aims | §1, last paragraph (the thesis); abstract Methods |
| 3. Description of literature search | §2.9 |
| 4. Referencing | §2 throughout, with citation numbers; full References list |
| 5. Scientific reasoning / evidence levels | §2.3 (tiers); §2.4 (bands); §2.5 (anchors); §3 throughout |
| 6. Presentation of relevant endpoint data | §3 (verdict tables with effect sizes); §4 (subsection-level detail); Figures 1–2 |