Epigenetic Clock Calculator
Computes DNA-methylation aging clocks from a beta-value file. 25 clocks are available; GrimAge is the headline one.
Self-contained. Needs only pandas + numpy: no biolearn, torch, seaborn,
scipy, or network access. Clock coefficients and references are vendored under data/
(extracted from the open-source biolearn library, trimmed to the CpGs the clocks
use, ~1 MB total). scripts/compute_clocks.py faithfully reimplements biolearn's
GrimageModel, LinearMethylationModel, and the DunedinPACE quantile-
normalization (with a numpy-only rankdata), reproducing biolearn's outputs for
all 25 clocks (verified to <0.005, i.e. rounding only).
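As an illustration of what a numpy-only rankdata can look like, here is a minimal sketch that assigns 1-based ranks and gives tied values the average of their ranks (the default behaviour of scipy.stats.rankdata). The function name rankdata_average is hypothetical, and this is not the code in scripts/compute_clocks.py; whether the script uses the same tie rule is an assumption.

```python
import numpy as np

def rankdata_average(values):
    """Return 1-based ranks; tied values share the average of their ranks.

    NaN handling is deliberately left out of this sketch.
    """
    values = np.asarray(values, dtype=float)
    if values.size == 0:
        return np.empty(0, dtype=float)
    order = np.argsort(values, kind="mergesort")    # stable sort
    sorted_vals = values[order]
    # Mark the first element of each run of equal values
    is_start = np.r_[True, sorted_vals[1:] != sorted_vals[:-1]]
    group_id = np.cumsum(is_start) - 1              # tie-group index per sorted element
    starts = np.flatnonzero(is_start)               # first sorted position of each group
    ends = np.r_[starts[1:], values.size]           # one past the last position
    avg_rank = (starts + ends + 1) / 2.0            # mean of 1-based ranks in the group
    ranks = np.empty(values.size, dtype=float)
    ranks[order] = avg_rank[group_id]
    return ranks

print(rankdata_average([0.2, 0.5, 0.2, 0.9]))  # the two 0.2 values share rank 1.5
```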
The clocks (run --list-clocks for the live list)
| family | clocks | unit |
|---|---|---|
| GrimAge (2nd-gen, mortality) | grimagev1, grimagev2 | years (needs age+sex) |
| 1st-gen chronological | horvath, horvath2, hannum, lin, vidalbralo, weidner, garagnani, bocklandt | years |
| 2nd-gen biological age | phenoage, hrsinchphenoage | years |
| Ying 2022 (causality) | yingcausage, yingdamage, yingadaptage | years |
| stochastic | stoch, stocp, stocz | years |
| tissue-specific | pedbe (pediatric buccal), cortical (brain) | years |
| 3rd-gen pace of aging | dunedinpace, dunedinpoam | years/year |
| other markers | dnamtl (telomere kb), zhang (mortality), epitoc1 (mitotic) | varies |
Group aliases for --clocks: all, grimage, core (default), firstgen,
secondgen, thirdgen.
dunedinpace needs ~20k background CpGs for its quantile normalization (not
just its 173 model CpGs). On a sparse input it self-imputes the missing background
from the gold-standard reference; if its reported coverage is well below ~90% the
result is dominated by the reference and unreliable — say so. A full EPIC/450K
export covers it fine.
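The sketch below shows why low background coverage makes the result reference-dominated: missing background betas are filled from the reference, and the filled vector is then quantile-normalized back onto that same reference. It assumes the sample and reference are already aligned to the same CpGs in the same order; the function name, the 0.90 default (mirroring the ~90% guideline above), and the position-based tie-breaking are illustrative choices, not the script's actual implementation.

```python
import numpy as np

def normalize_background(sample, reference, min_coverage=0.90):
    """Fill missing background betas from the reference, then quantile-normalize to it.

    sample:    1-D array of background betas, NaN where the CpG is absent
    reference: 1-D array of reference betas for the same CpGs, same order
    Returns (normalized values, coverage fraction, reliable flag).
    """
    sample = np.asarray(sample, dtype=float)
    reference = np.asarray(reference, dtype=float)
    observed = ~np.isnan(sample)
    coverage = float(observed.mean())
    filled = np.where(observed, sample, reference)   # self-impute from the reference

    # Rank the filled values (ties broken by position), then map rank -> reference quantile
    order = np.argsort(filled, kind="stable")
    ranks = np.empty(filled.size, dtype=int)
    ranks[order] = np.arange(filled.size)
    normalized = np.sort(reference)[ranks]
    return normalized, coverage, coverage >= min_coverage

values, coverage, reliable = normalize_background(
    [0.30, np.nan, 0.10, 0.80], [0.2, 0.5, 0.1, 0.9])
print(values, coverage, reliable)
```

The more positions are filled from the reference, the more the rank order (and therefore the normalized output) is inherited from the reference rather than from the sample.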
Key facts about GrimAge (so you interpret it correctly)
GrimAge is second-generation, mortality-trained: it estimates DNAm surrogates
of 7 plasma proteins + smoking pack-years (V2 adds DNAm A1C & CRP), then combines
them with chronological age and sex in a survival model. Age and sex feed the
formula directly, so both are required for any grimage* clock. The other
clocks don't need them, but passing --age lets the tool report acceleration.
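To make the reported acceleration concrete: for a single sample it is the clock value minus chronological age. The second function sketches the cohort-based residual definition for contrast, which needs many samples. Both function names are hypothetical and the numbers are made up for illustration.

```python
import numpy as np

def simple_acceleration(clock_years, chronological_age):
    """Single-sample acceleration: predicted age minus chronological age."""
    return clock_years - chronological_age

def residual_age_accel(clock_years, ages):
    """Cohort AgeAccel: residuals of clock value regressed on chronological age."""
    clock_years = np.asarray(clock_years, dtype=float)
    ages = np.asarray(ages, dtype=float)
    slope, intercept = np.polyfit(ages, clock_years, deg=1)
    return clock_years - (slope * ages + intercept)

print(simple_acceleration(58.0, 50.0))                          # 8.0
print(residual_age_accel([45, 52, 66, 73], [40, 50, 60, 70]))   # residuals sum to ~0
```

The residual version is centred on the cohort's own trend line, which is why it cannot be computed from one person's data.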
Workflow
1. Inspect the input file
Auto-detected layouts: Long (two columns: CpG id + beta) or Matrix (first column CpG id, remaining columns = samples). Betas are floats in [0,1]. Note the row count; coverage of each clock's CpGs is reported per clock.
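A quick way to inspect a file by hand is sketched below. It assumes a comma-separated file with a header row and the CpG id in the first column; load_betas is a hypothetical helper, not the script's own detection logic.

```python
import io
import pandas as pd

def load_betas(source):
    """Read a beta file; return (DataFrame indexed by CpG id, detected layout)."""
    df = pd.read_csv(source)
    df = df.set_index(df.columns[0])
    layout = "long" if df.shape[1] == 1 else "matrix"
    df = df.apply(pd.to_numeric, errors="coerce")   # non-numeric cells become NaN
    out_of_range = int(((df < 0) | (df > 1)).sum().sum())
    if out_of_range:
        raise ValueError(f"{out_of_range} beta values fall outside [0, 1]")
    return df, layout

# Tiny in-memory examples standing in for real files
long_csv = io.StringIO("cpg,beta\ncg00000029,0.52\ncg00000108,0.91\n")
matrix_csv = io.StringIO("cpg,s1,s2\ncg00000029,0.52,0.40\ncg00000108,0.91,0.88\n")
for src in (long_csv, matrix_csv):
    df, layout = load_betas(src)
    print(layout, "rows:", len(df), "samples:", df.shape[1])
```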
2. Get age and sex
Required for GrimAge. If only an age range is known, use the midpoint and run
--sensitivity so the user sees how much the answer depends on the exact age.
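One way to turn an age range into the midpoint and a set of --sensitivity ages is sketched here. The helper name, the choice of four probe ages, and the rounding are illustrative assumptions; it only assembles an argument list using the flags documented in this file, and whether the script accepts a fractional --age is not confirmed here.

```python
def build_command(script, betas, age_low, age_high, sex, clocks="grimage"):
    """Assemble a compute_clocks.py argument list for an age known only as a range."""
    midpoint = (age_low + age_high) / 2
    span = age_high - age_low
    # Probe both ends plus two interior points (an arbitrary illustrative spacing)
    probes = [age_low, round(age_low + span / 4), round(age_high - span / 4), age_high]
    return [
        "python3", script,
        "--input", betas,
        "--age", f"{midpoint:g}",
        "--sex", sex,
        "--clocks", clocks,
        "--sensitivity", *[str(a) for a in probes],
    ]

# The script path is a placeholder for <skill-dir>/scripts/compute_clocks.py
print(" ".join(build_command("scripts/compute_clocks.py", "betas.csv", 40, 49, "f")))
```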
3. Check dependencies (only pandas + numpy)
python3 -c "import pandas, numpy" 2>/dev/null && echo OK || pip install pandas numpy
No venv, biolearn, torch, or network needed.
4. Run
python3 <skill-dir>/scripts/compute_clocks.py \
--input "<betas.csv>" --age <years> --sex <m|f> \
--clocks all # or: core (default), grimage, firstgen, secondgen, thirdgen, or specific keys
# --sensitivity 40 42 47 49 # optional, when exact age is uncertain
python3 <skill-dir>/scripts/compute_clocks.py --list-clocks # see all keys
The script imputes any missing clock CpGs from the bundled (trimmed)
data/sesame_450k_median.csv population reference, runs each model, and prints a
table of value, acceleration (for year-unit clocks), and coverage, plus a JSON
line for downstream use.
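The impute-then-score step can be pictured with the sketch below: a generic linear clock with median fill, using toy weights. The function name, the n_imputed key, and all numbers are hypothetical; real clocks may also apply transforms (for example to the final score) that are omitted here, and this is not the script's actual output schema.

```python
import json
import pandas as pd

def run_linear_clock(betas, coefs, intercept, reference_median):
    """Score one sample with a linear clock, filling absent CpGs from a reference median.

    betas:            Series of beta values indexed by CpG id (one sample)
    coefs:            Series of clock weights indexed by CpG id
    reference_median: Series of population-median betas indexed by CpG id
    """
    aligned = betas.reindex(coefs.index)             # NaN where the sample lacks a CpG
    missing = aligned.isna()
    coverage = 1.0 - float(missing.mean())
    filled = aligned.fillna(reference_median.reindex(coefs.index))
    value = float(intercept + (filled * coefs).sum())
    return {"value": round(value, 2), "coverage": round(coverage, 3),
            "n_imputed": int(missing.sum())}

# Toy clock: three CpGs, one of them absent from the sample
coefs = pd.Series({"cgA": 10.0, "cgB": -5.0, "cgC": 20.0})
sample = pd.Series({"cgA": 0.5, "cgC": 0.25})
median = pd.Series({"cgA": 0.6, "cgB": 0.4, "cgC": 0.3})
result = run_linear_clock(sample, coefs, intercept=30.0, reference_median=median)
print(json.dumps(result))
```

Note that a CpG missing from both the sample and the reference would silently drop out of the sum in this sketch; a real implementation should treat that case explicitly.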
5. Report and interpret
Lead with GrimAge, then the comparison. Always convey these caveats:
- Open-source reimplementation, not the official calculator. Values track Horvath's official server / Clock Foundation closely but aren't a certified number. For a citable value, point to the Horvath DNAm Age calculator or a commercial provider.
- "Acceleration" here = clock − chronological age, a simple difference. The academic AgeAccel (residual vs. a same-age cohort) needs a population sample and can't be computed for one person — so +8 means "epigenetic-predicted age is 8 years above chronological age," not "8 years older than peers."
- Generations differ. 1st-gen (Horvath/Hannum/Lin/…) target chronological age and tend to land near the true age. 2nd-gen (GrimAge/PhenoAge) target health outcomes and predict mortality/aging better — they can diverge from true age by design. Don't read GrimAge as "looks N years old"; it's a risk score in years.
- Non-year clocks (dunedinpoam pace, dnamtl telomere kb, zhang/stocz mortality risk, epitoc1 mitotic) are not ages; no acceleration is shown.
- Coverage / imputation: a few missing clock CpGs filled from a population median is normal; flag clocks whose coverage drops below ~90%. The output's n_lowconf column counts imputed CpGs whose blood-reference SD > 0.08 (fills to distrust), a per-CpG signal sharper than coverage alone. A clock with many n_lowconf fills is unreliable on this sample even if it "ran".
- Blood-only design. This skill assumes human WHOLE BLOOD. Clocks are applied on blood; the imputation reference is blood. Do not use it on other tissues.
- Imputation of missing CpGs: methyLImp by default. Missing clock CpGs (e.g. EPIC-trained CpGs absent on an MSA/WeGene export) are imputed with methyLImp: reduced-rank (PCA) regression that predicts each missing CpG from the sample's observed CpGs using the inter-CpG correlation structure of a whole-blood panel. This is more accurate than a flat median (~30% lower RMSE when tissue matches). CpGs the panel lacks fall back to the blood median; n_lowconf (blood SD > 0.08) flags unreliable fills. The active mode is printed at run start.
  - The panel lives in data/blood_panel.npz and is bundled (~5 MB; prebuilt from GSE40279 = 656 whole-blood 450K samples, reduced to the clock CpGs + 50 blood PCs + per-CpG median/SD). So methyLImp is active out of the box; it cuts imputation RMSE ~10% vs a flat median on a held-out-CpG benchmark. To rebuild/customize, run scripts/build_blood_panel.py. If the file is ever missing the tool degrades gracefully to global-median and says so.
  - Reality check: methyLImp mainly improves the heavily-imputed clocks. High-coverage clocks (GrimAge/Horvath/PhenoAge) impute few CpGs, so their values move <0.2 yr regardless; the confidence flag is the bigger practical win.
- Not medical advice — research/educational use only.
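The idea behind panel-based imputation in the caveats above can be sketched as a PCA projection: summarize the reference panel by its mean and top principal axes, estimate the sample's component scores from its observed CpGs by least squares, and read the missing CpGs off the reconstruction. This is a simplified stand-in written for illustration, not methyLImp itself and not the tool's code; the function names are hypothetical, and the median fallback for CpGs the panel lacks is not shown.

```python
import numpy as np

def fit_panel(panel, n_components):
    """Summarize a samples x CpGs panel as a mean vector plus top principal axes."""
    mean = panel.mean(axis=0)
    _, _, vt = np.linalg.svd(panel - mean, full_matrices=False)
    return mean, vt[:n_components]               # axes has shape (k, n_cpgs)

def impute_from_panel(sample, mean, axes):
    """Fill NaNs in one sample by projecting its observed CpGs onto the panel axes."""
    sample = np.asarray(sample, dtype=float)
    observed = ~np.isnan(sample)
    # Least-squares component scores using only the observed CpGs
    scores, *_ = np.linalg.lstsq(axes[:, observed].T,
                                 sample[observed] - mean[observed], rcond=None)
    reconstructed = mean + scores @ axes
    filled = np.where(observed, sample, reconstructed)
    return np.clip(filled, 0.0, 1.0)             # betas are bounded to [0, 1]

# Synthetic panel with two latent factors (made-up data, for illustration only)
rng = np.random.default_rng(0)
loadings = rng.normal(size=(2, 6)) * 0.05
panel = 0.5 + rng.normal(size=(50, 2)) @ loadings
mean, axes = fit_panel(panel, n_components=2)

truth = 0.5 + np.array([0.3, -0.2]) @ loadings
sample = truth.copy()
sample[[1, 4]] = np.nan                          # pretend two CpGs are missing
print(np.abs(impute_from_panel(sample, mean, axes) - truth).max())
```

On real data the recovery is only approximate, which is why a per-CpG confidence signal such as n_lowconf remains useful alongside the imputed values.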
Notes
- Matrix input → one result row per sample per clock.
- Deliberately excluded (can't be reproduced faithfully without heavier machinery, or aren't aging clocks):
  - PC-clocks (PCHorvath…), AltumAge, GPAge: need PCA rotation / neural nets.
  - Gestational clocks (Knight, Lee, Mayne, Bohlin): for cord blood / newborns.
  - Trait & disease predictors (BMI, cholesterol, smoking, alcohol, Alzheimer's, CVD, …): these are biomarker models, not aging clocks. If a user specifically needs one of these, tell them it requires the full biolearn install.
- Provenance. Coefficients are biolearn's (MIT), which reimplements the published clocks. GrimAge has commercial-use restrictions (UCLA TDG / Clock Foundation) for cosmetics & life-insurance use.