Genome-Wide Association Study Identifies Genetic Associations with Perceived Age

Failure of dermal protection or repair mechanisms might lead to visibly aged skin. The study aimed to identify genetic associations with perceived age. A genome-wide association study was undertaken in 423,992 adult participants of UK Biobank, using questionnaire data on perceived age and genetic data imputed to the Haplotype Reference Consortium imputation panel. The study identified 74 independently associated genetic loci (P < 5 × 10^-8), to our knowledge previously unreported, which were enriched for cell signaling pathways, including the NEK6 and SMAD2 subnetworks. Common genetic variation was estimated to account for 14% of variation in perceived age, and the heritability of perceived age was partially shared with that of 75 other traits, including multiple traits representing adiposity, suggesting that perceived age may be a useful proxy trait in genetic association studies.


Supplementary Text 1: Estimation of effective sample size.
The numbers of participants who self-reported each category of the outcome are reported in the main text. Because participants are split unevenly across the three categories, the effective sample size is smaller than the total number of participants. To help understand the impact of this imbalance on statistical power, we estimated the effective sample size of the experiment as follows.
First, we note that in a conventional case-control study, the number of cases is given by N_full cases × 1 + N_controls × 0. Because we considered people who looked about their age to have an intermediate phenotype between a full case and a full control, we coded them as 0.5. We therefore reasoned that the effective number of cases could be given as N_full cases × 1 + N_half cases × 0.5 + N_controls × 0.
We used the same approach to estimate the effective number of controls, and finally calculated the effective sample size as N_eff = 4 / (1/N_cases + 1/N_controls), where N_cases and N_controls are the effective numbers of cases and controls (Willer et al., 2010).
Using this approach, we estimate there were 60,280 effective controls and 363,712 effective cases, giving an overall effective sample size of 206,839 participants.
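The arithmetic above can be reproduced with a minimal Python sketch. The function name and argument names are illustrative and do not come from the original analysis. The assignment of the counts 312,062, 103,300 and 8,630 to full cases, half cases and controls is an assumption, inferred because it reproduces the effective numbers quoted above.

```python
def effective_sample_size(n_full_cases, n_half_cases, n_controls):
    """Effective cases, controls and N_eff for a three-category outcome.

    Half cases (coded 0.5) contribute half a case and half a control.
    """
    eff_cases = n_full_cases * 1.0 + n_half_cases * 0.5
    eff_controls = n_controls * 1.0 + n_half_cases * 0.5
    # N_eff = 4 / (1/N_cases + 1/N_controls), as in Willer et al., 2010
    n_eff = 4.0 / (1.0 / eff_cases + 1.0 / eff_controls)
    return eff_cases, eff_controls, n_eff


# Assumed mapping of category counts (inferred from the reported totals)
cases, controls, n_eff = effective_sample_size(
    n_full_cases=312062, n_half_cases=103300, n_controls=8630
)
print(int(cases), int(controls), round(n_eff))  # 363712 60280 206839
```

Because the half cases are added to both groups, the effective cases and controls sum to the total of 423,992 participants, while the imbalance between them roughly halves the effective sample size.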

Methods
Simulations were undertaken using the 'simulateGP' package (https://github.com/explodecomputer/simulateGP) in the statistical language R (version 3.5.3, 2019 release). First, 100 genotypes were simulated with a minor allele frequency of 0.3 for 423,992 participants. Of these genotypes, 50 had a causal effect on the simulated phenotype and 50 had no effect. Participant age was simulated from a rectangular (uniform) distribution between ages 30 and 71. Next, an underlying continuously distributed phenotype was generated, representing liability for youthful appearance. This liability was affected by participant age (explaining 50% of variation in the phenotype), the 50 causally related SNPs (collectively explaining 10% of variation in the phenotype) and randomly distributed unmeasured environmental and genetic factors (explaining 40% of variation in the phenotype). Random error was then added to the underlying liability in 10% increments from 0% to 90%, so that up to 90% of variation in the latent liability variable was due to additional random noise. At each level of noise, the latent phenotype was used to derive a new categorical phenotype with 8,630/103,300/312,062 participants in the respective categories. Each derived categorical phenotype was then regressed on each causal and non-causal genotype using a linear regression model adjusted for age. The resulting estimates of genetic effect were flipped where necessary, so that the estimates were always positively signed, and then converted to a log odds ratio using the same Taylor series expansion used in the main analysis.
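The simulation design can be illustrated with a small, self-contained Python sketch. It is not the original analysis: it uses NumPy rather than simulateGP, a much smaller sample and fewer SNPs so that it runs quickly, and several labeled assumptions. Genotypes are drawn under Hardy-Weinberg equilibrium; noise is mixed in as sqrt(1 - e) × liability + sqrt(e) × noise so that a fraction e of the variance is noise; and the categories are coded 0, 0.5 and 1 as in Supplementary Text 1. All causal effects are positive here, so no sign flipping is needed, and the conversion to a log odds ratio is omitted.

```python
import numpy as np


def zscore(x):
    """Standardize to mean 0 and variance 1 (column-wise for 2-D input)."""
    return (x - x.mean(axis=0)) / x.std(axis=0)


def simulate_attenuation(n=20000, n_causal=5, n_null=5, maf=0.3,
                         noise_fracs=(0.0, 0.5, 0.9), seed=1):
    """Return {noise fraction: per-SNP linear regression estimates}."""
    rng = np.random.default_rng(seed)
    n_snps = n_causal + n_null

    # Allele counts (0/1/2), assuming Hardy-Weinberg equilibrium
    G = rng.binomial(2, maf, size=(n, n_snps)).astype(float)
    age = rng.uniform(30, 71, size=n)

    # Liability: 50% age, 10% causal SNPs (jointly), 40% residual factors
    g_score = zscore(zscore(G[:, :n_causal]).sum(axis=1))
    liability = zscore(np.sqrt(0.5) * zscore(age)
                       + np.sqrt(0.1) * g_score
                       + np.sqrt(0.4) * rng.standard_normal(n))

    # Category proportions taken from the counts 8,630/103,300/312,062
    counts = np.array([8630, 103300, 312062])
    cut_quantiles = np.cumsum(counts)[:2] / counts.sum()

    covariates = np.column_stack([np.ones(n), zscore(age)])
    results = {}
    for e in noise_fracs:
        # Assumed noise model: a fraction e of the variance is random error
        noisy = np.sqrt(1 - e) * liability + np.sqrt(e) * rng.standard_normal(n)
        cuts = np.quantile(noisy, cut_quantiles)
        y = np.digitize(noisy, cuts) * 0.5  # categories coded 0, 0.5, 1

        betas = np.empty(n_snps)
        for j in range(n_snps):
            # Linear regression of the phenotype on one SNP, adjusted for age
            X = np.column_stack([covariates, G[:, j]])
            coef, *_ = np.linalg.lstsq(X, y, rcond=None)
            betas[j] = coef[2]
        results[e] = betas
    return results


res = simulate_attenuation()
for e, betas in res.items():
    print(f"noise={e:.1f}  mean causal beta={betas[:5].mean():.4f}  "
          f"mean null beta={betas[5:].mean():.4f}")
```

In this sketch the first five SNPs are causal and the last five are null, so the printed means allow the estimates at causal and null variants to be compared at each level of noise.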

Results
With increasing measurement error, the estimates of genetic effect at truly associated variants are biased towards the null, with an approximately log-linear relationship between odds ratio and percentage of measurement error. With finite statistical power, this means that associations with some variants that were detectable in the baseline model are no longer detectable, and the type II error rate of the experiment therefore increases with increasing measurement error.
Conversely, the type I error rate is not inflated by this form of measurement error.
Figure. Each point represents the association, on a log-odds ratio scale, between a single simulated genetic variant and self-reported simulated perceived age. The true effect of each SNP is shown in the baseline model (x=0). Variants which have a detectable non-null effect in the true model are colored in cyan, and variants which have a null association in the true model are colored in orange. With increasing measurement error there is a log-linear attenuation in effect sizes away from the true effect (represented by x=0) towards the null for variants with a true effect (cyan regression line), meaning the true associations are no longer detectable for some variants. With increasing measurement error there is no inflation in effect sizes at truly null variants (orange regression line).