Cancer Control Group, Population Health Department, QIMR Berghofer Medical Research Institute, Brisbane, Queensland, Australia
Cancer Research UK Manchester Institute and Institute of Inflammation and Repair, University of Manchester, Manchester, UK
Cancer Control Group, Population Health Department, QIMR Berghofer Medical Research Institute, Locked Bag 2000, RBWH, Brisbane, Queensland 4029, Australia.
Identifying people at high risk of melanoma is important for targeted prevention activities and surveillance. Several tools have been developed to classify melanoma risk, but few have been independently validated. We assessed the discriminatory performance of six melanoma prediction tools by applying them to individuals from two independent data sets, one comprising 762 melanoma cases and the other a population-based sample of 42,116 people without melanoma. We compared the model predictions with actual melanoma status to measure sensitivity and specificity. The performance of the models was variable, with sensitivity ranging from 10.5% to 97.7% and specificity from 1.3% to 99.6%. The ability of all the models to discriminate between cases and controls, however, was generally high. The model developed by MacKie et al. (1989) had higher sensitivity and specificity for men (0.89 and 0.88) than for women (0.79 and 0.72). The tool developed by Cho et al. (2005) was highly specific (men, 0.92; women, 0.99) but considerably less sensitive (men, 0.64; women, 0.37). The other models were either highly specific but lacked sensitivity, or had low to very low specificity and higher sensitivity. Poor performance was partly attributable to the use of non-standardized assessment items and to differing interpretations of what constitutes “high risk”.
Abbreviations
AUC, area under the curve; ROC, receiver operating characteristic
INTRODUCTION
Accurate identification of people at high risk of melanoma is important for both prevention and early detection. In terms of medical surveillance, although the efficacy of routine skin screening in people of average risk remains unproven (
), clinical guidelines recommend screening subsets of the population determined to be at high risk. A number of tools have been developed to assist this targeted screening, but recent reviews (
Models tend to perform better when the distributions of melanoma risk factors in the new population are within the ranges seen in the development population (
). To assess the validity of a risk model in an independent population, one first calculates a risk score for the individuals in the new population using the parameters from the specified risk model. These risk scores are then compared with the actual status of each individual to assess how well the model discriminates those with the condition from those without. There are three different elements of validity: (i) agreement between the observed and predicted probabilities of outcomes (calibration), (ii) ability to distinguish subjects with different outcomes (discrimination), and (iii) ability to improve the decision-making process (clinical usefulness). We assessed the performance of six melanoma risk prediction tools, applying the risk metrics defined by the model developers, using two independent Australian data sets: a series of melanoma patients (the Epigene Study (
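The validation procedure described above can be sketched in a few lines of Python (an illustration only, not the authors' SAS code): given each individual's predicted risk classification and actual melanoma status, sensitivity and specificity follow from the standard two-by-two table.

```python
# Illustrative sketch of the validation step: compare predicted
# "high risk" classifications with actual melanoma status.

def sensitivity_specificity(predicted_high_risk, has_melanoma):
    """Both arguments are parallel lists of booleans, one entry per person."""
    pairs = list(zip(predicted_high_risk, has_melanoma))
    tp = sum(p and d for p, d in pairs)          # cases flagged high risk
    fp = sum(p and not d for p, d in pairs)      # controls flagged high risk
    fn = sum(not p and d for p, d in pairs)      # cases missed
    tn = sum(not p and not d for p, d in pairs)  # controls correctly low risk
    sensitivity = tp / (tp + fn)  # proportion of cases detected
    specificity = tn / (tn + fp)  # proportion of controls excluded
    return sensitivity, specificity
```

Calibration and clinical usefulness require, in addition, the predicted probabilities themselves and a decision context, so they cannot be read off this table alone.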
Table 1 shows a summary of the risk prediction models. The number of variables included in the models ranged from 4 to 10. Only one model developed in an Italian population had been externally validated (in a Brazilian population;
Risk factors included across the six models: age, sex, state, total common nevi, atypical nevi, freckles, hair color, family history of melanoma, personal history of melanoma, and personal history of keratinocyte cancer
Two risk prediction models used arbitrary absolute risk cut-points, defining “high risk” individuals as those with a 5-year absolute risk greater than 0.15% (
) defined “high risk” individuals using cut-off points on the receiver operating characteristic (ROC) curve to maximize sensitivity and specificity. For another model (
), a “high risk” individual was defined as having three or more risk factors present or one major risk factor (“presence of more than 20 nevi on the arms” for people aged below 60 years and “presence of freckles” for patients aged 60 years and above). The fifth model defined “very high risk” individuals as those with a relative risk over 10 (
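The counting rule described above can be sketched as follows (a hedged illustration; the variable names are ours, not taken from the original tool):

```python
# Sketch of the "three or more risk factors, or one major factor" rule.
# The major factor depends on age: >20 nevi on the arms for people
# under 60 years, presence of freckles for those 60 and above.

def high_risk(age, risk_factors_present, many_arm_nevi, freckles):
    """risk_factors_present: count of the model's risk factors present.
    many_arm_nevi: more than 20 nevi on the arms; freckles: freckles present."""
    major = many_arm_nevi if age < 60 else freckles
    return risk_factors_present >= 3 or major
```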
Characteristics of the case (Epigene) and control (QSkin) populations are provided in Supplementary Tables 1 and 2 online, respectively. The mean age of cases was 58 years and that of controls 56 years.
Sensitivity, specificity, and area under the ROC curve
The performance of the models in the validation data sets using the thresholds defined by the original investigators was variable. Sensitivity of the models ranged from 0.11 to 0.98 and specificity from 0.01 to 1.00. The model developed by
Table 2. Sensitivity, specificity, and receiver operating characteristic curve areas (AUC) of six melanoma risk prediction models tested in two independent data sets
). Factors contributing to these outcomes included the risk factors in each model (Table 1), the magnitude of the risks ascribed to those factors (Supplementary Tables 3–8 online), and the prevalence of those risk factors in the two independent populations (Supplementary Tables 1 and 2 online). For the model by
), the median risk scores in the case and control data sets were 218 and 42, respectively, but using the authors’ defined cut-point of 3 placed a large proportion of both cases and controls in the high risk category. Similarly, for the model by
), the mean estimate of 5-year melanoma risk was 0.49 and 0.71 in the control and case data sets, respectively, with a cut-point for high risk of 0.15. Conversely, the model by
) correctly classified 99% of controls as low risk but showed lower sensitivity; the median 5-year melanoma risk was 0.006 in the control data set and 0.08 in the case data set (again with a cut-point for high risk of 0.15).
) was more sensitive among those aged ≥60 years (Supplementary Table 9 online). Model specificity did not vary widely between the two age-groups. There were no appreciable differences in the model performance for men and women separately (data not shown).
Using a common threshold of specificity, the model with the highest sensitivity was the model by
) had the lowest sensitivity at thresholds of 80 and 90% specificity; this model included a measure of “severe solar damage” to the skin (for men only), which was not available in the Epigene or QSkin data sets.
Table 3. Sensitivity of six melanoma risk prediction models tested in two independent data sets using two different thresholds of specificity
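The comparison at a common specificity level can be sketched as follows (an illustration, not the authors' code): for each model, choose the risk-score cut-point whose specificity among controls is closest to the common target, then report the sensitivity among cases at that cut-point.

```python
# Hedged sketch: anchor all models to a common specificity, then
# compare their sensitivities at that operating point.

def sensitivity_at_specificity(case_scores, control_scores, target_spec=0.90):
    best_cut, best_gap = None, float("inf")
    # Candidate cut-points: every observed risk score.
    for cut in sorted(set(case_scores + control_scores)):
        spec = sum(s < cut for s in control_scores) / len(control_scores)
        gap = abs(spec - target_spec)
        if gap < best_gap:
            best_gap, best_cut = gap, cut
    sens = sum(s >= best_cut for s in case_scores) / len(case_scores)
    return best_cut, sens
```

Because the cut-point is chosen from the observed scores, the achieved specificity is only approximately equal to the target in finite samples.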
Overall, the discriminatory performance of the models was high, with area under the curve (AUC) values ranging from 0.73 (95% confidence interval (CI) 0.71–0.75) for the model by
); Table 2; Figure 1). For the two models that had presented AUC values for their model derivation data sets, the AUC values were slightly higher in our validation data set (0.80 vs. 0.71 for the model by
Figure 1. Receiver operating characteristic (ROC) curve and corresponding area under the curve (AUC) statistics for the risk score of six melanoma risk prediction models in the validation cohort.
When the Hosmer–Lemeshow goodness-of-fit test was used to quantify the overall fit of the models to the data, a high P-value for the model developed by
; P=0.99) indicated an excellent agreement between the observed and the expected number of cases; for all other models, the P-value was <0.001. The calibration curves are presented as Figure 2.
Figure 2. Calibration plots for six melanoma risk prediction models in the validation cohort.
When the accuracy of each model was assessed for classifying participants’ risks of melanoma according to categories as defined by the authors (Table 4), results differed substantially. Although the models by
) correctly assigned high or very high risk classifications to more than 90% of melanoma cases, the latter two also assigned similarly high classifications to more than 80% of population controls.
Table 4. Classification accuracy of six melanoma risk prediction models tested in two independent data sets using cut-points as defined by the model developers
Using alternative skin phototype variables (burning/tanning) in lieu of skin color made no appreciable difference to the sensitivity or specificity of the models by
; data not presented). Substituting a tanning variable in place of a burning variable for skin type increased the sensitivity and specificity of the model by
; sensitivity 0.95 for women and 0.83 for men; specificity 0.56 for women and 0.68 for men). Substituting these alternate variables in the ROC analyses did not materially change the AUC for any of the risk prediction models.
DISCUSSION
There has been increasing interest in developing risk prediction tools to assess disease risk and prognosis. Models predicting risks of common cancers have been developed (
). They have been reasonably successful at identifying groups at higher and lower risks of developing cancer but have, in general, shown a limited ability to accurately discriminate between those who do and do not develop cancer. Our analyses suggest that, when using the cut-points defined by the model developers, the sensitivity and specificity of published melanoma risk prediction models are relatively low when tested in independent samples. This indicates that the models were poorly calibrated for predicting melanoma risk in the Australian population; however, the ability of the models to discriminate between cases and controls was generally high in our combined data sets.
Published melanoma risk prediction models are heterogeneous in terms of the design of the studies used to develop the models, how predictor variables were defined and included, and how patients at “high risk” were categorized and defined. Our sensitivity analyses demonstrate the impact of using slightly different predictor variables on the sensitivity and specificity of some models. The variation in model performance seen after apparently subtle changes in variable definitions underscores the need to establish standardized definitions of these characteristics across studies.
In evaluating the performance of the models, we considered the trade-off between sensitivity and specificity. As we were interested in each model’s capacity to stratify people into clinically relevant risk categories, models that placed more people at the extremes of the risk distribution would have the advantage of facilitating decision-making around targeted surveillance (with fewer people in the “middle” categories where there may be uncertainty about the appropriate advice). The model with the highest sensitivity and specificity when applied to our data sets using the cut-points defined by the model developers was the model by
), developed in a case-control study and incorporating only four variables (sex, total nevi, atypical nevi, and freckling tendency). The next best performing models were those developed by
), with eight predictor variables, was developed in a case-control study. In evaluating the differential performance of these models in our data sets, it was difficult to disentangle the independent effects of the inclusion/exclusion of risk factors in the model and their effect sizes. In general, the highest relative risks reported were for high nevus counts, both common and atypical (Supplementary Tables 3–8 online).
The discriminatory ability of the melanoma risk prediction models in our combined data set was generally high with AUC between 0.73 and 0.93. To put these findings in the context of models for other cancers, a recent meta-analysis reported a summary C statistic (equivalent to the AUC in ROC analyses) for the most well-known model for predicting risk of breast cancer, the model by
Notable findings of our evaluation of melanoma risk prediction tools were the marked between-study heterogeneity in the sources and collection of data, the factors considered for inclusion in the models, the categorization of these risk factors, and the choice of analytic model. Strengths of our analyses include the large sample sizes of our case and control groups. All cases were histologically confirmed. Limitations include possible misclassification of some exposures as a result of the harmonization of risk factor data. Our melanoma case group included only melanomas of the trunk, head, and neck and did not include cases with melanoma on the limbs. If the risk profile of people with limb melanomas differed greatly from that of melanomas at other body sites, our evaluation of the discriminatory accuracy of the prediction models may be inaccurate, but evidence suggests that this is not the case (
We did not evaluate risk prediction models that incorporated genetic predictors as such data were not available for our validation data sets. Potentially, incorporating genetic information into risk prediction models could identify people as high risk who might not be so identified on the basis of their phenotypes or exposures alone, although evidence for this is conflicting. Several studies have reported that including the terms for genotype in addition to the core predictive factors gave no useful improvement in prediction (
). A recent study found that including information about MC1R and previously unreported common genomic variants increased the predictive capacity of the model by 17% over the nongenetic model (
In summary, a large number of prediction models have been developed for melanoma, ostensibly to aid clinicians (and their patients) to quantify individual risk and develop appropriate management plans. To be useful, a prediction tool must be accurate, generalizable, and clinically effective (
). We found that most existing melanoma prediction models were poorly calibrated, at least when the models were applied to an independent Australian data set using the originally published cut-off scores to estimate risk. However, most models identified higher proportions of melanoma cases compared with disease-free controls across the range of all possible cut-off scores and thus had reasonable discriminatory accuracy. These findings suggest that models need to be calibrated specifically for target populations, using population-specific cut-off scores, before they can be used in clinical practice. We therefore echo the plea for more efforts to validate risk prediction tools and encourage the adoption of reproducible, standardized measures of predictor variables that can be easily recorded in the clinical setting (
MATERIALS AND METHODS
Epigene
The Epigene data set included information for 762 incident cases of first primary melanoma diagnosed between 1 April 2007 and 30 September 2010. The methods of this case–case study have been described previously (
). All cases were aged 18–79 years and were residents of the greater Brisbane region, Queensland, Australia. Cases were ascertained prospectively through pathology companies servicing the region. The study did not recruit patients with melanomas on the limbs. Participants completed a detailed questionnaire and underwent a clinical examination by a dermatologist. The questionnaire captured information about demographic, phenotypic, medical, and other risk factors including sun exposure history. The clinical examination recorded hair and eye color and numbers of solar keratoses and melanocytic nevi present across all body sites.
QSkin
The QSkin Study comprises a cohort of 43,781 men and women aged 40–69 years randomly sampled from the population of Queensland, Australia, in 2011 (
; overall participation fraction 23%). At baseline, information about demographic items, general medical history, standard pigmentary characteristics (including hair and eye color, freckling tendency, tanning ability and propensity to sunburn), past and recent history of sun exposure and sunburns, sun protection behaviors, use of tanning beds, and history of skin cancer was collected by self-completed questionnaire. Participants gave their consent for data linkage to cancer registries ensuring complete ascertainment of all melanoma occurrences (notification of cancer to Australian Cancer Registries has been legally mandated since 1982). For these analyses, we restricted the data set to 42,116 participants with no prior history of melanoma.
Data harmonization
We matched variables from the Epigene and QSkin data sets (the “validation data sets”) as closely as possible to those described in each model (the “derivation data sets”; Supplementary Tables 3–8 online), irrespective of their later performance in the model. Where more than one variable in the validation data sets was a possible match for the corresponding variable from the derivation data set, we performed sensitivity analyses using the alternative variables. For example, in the Epigene data set, we had two freckling variables. In our primary analyses, we used the categorical measure of freckling on the face but performed sensitivity analyses using freckling on the shoulders. For skin phototype, both the Epigene and QSkin data sets included separate variables for tanning and burning responses. Where skin phototype was included in a model, we used burning response for our primary analyses, but we performed sensitivity analyses using the variable for tanning response. The Epigene data set did not include a variable for skin color, and we instead used burning response, performing sensitivity analyses using tanning response.
Statistical analyses
For each risk prediction tool, we applied the fully specified model and calculated a risk score for each person in the two validation data sets, applying the risk metrics (as defined by the model developers) to classify people into risk categories (Supplementary Appendix 1 online). We then compared these predictions with actual outcomes, examining sensitivity in the melanoma cases (Epigene data set) and specificity in the disease-free controls (QSkin data set). For each analysis, we excluded individuals with missing data for one or more predictor variables. The number excluded varied depending upon the variables specified in each model and ranged from 18% (model by
). We conducted these analyses for all study participants and also stratified by age (<60/≥60 years); 48% of cases and 65% of controls were aged <60 years.
We examined the classification accuracy of each model using categories as defined by the authors of each model. For the single model that did not define a high risk cut-point (
). We investigated reasons for high and low sensitivity and specificity by examining the factors included in the models, their effect sizes, and the prevalence of those risk factors in the two independent populations. Our calculations of sensitivity and specificity were dependent upon the choice of cut-point used to define “high risk” in each model. We also calculated the sensitivity of each model using a common threshold of specificity (80 and 90%). The ROC curve plots the sensitivity of a test against (1−specificity) for all possible cut-off points, and the area under this curve summarizes discrimination (
). We therefore combined the two data sets and evaluated the predictive discrimination of each model using the area under the ROC curves. We tested the calibration of each model using the Hosmer–Lemeshow goodness-of-fit test (
), where a small P-value indicates departure from goodness-of-fit. We used SAS PROC LOGISTIC (V.8.02; SAS Institute, Cary, NC) to calculate the area under the ROC curves.
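The analyses above were run in SAS; as a language-neutral illustration (not the authors' code), the AUC can be computed directly from case and control risk scores via its equivalence to the Mann–Whitney statistic, and a Hosmer–Lemeshow-style statistic compares observed with expected cases across risk groups. The P-value would come from a χ² distribution with groups−2 degrees of freedom, which we omit here to keep the sketch dependency-free.

```python
# Illustration only; the paper's analyses used SAS PROC LOGISTIC.

def auc(case_scores, control_scores):
    """AUC as the probability that a randomly chosen case outscores a
    randomly chosen control (ties count one-half)."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

def hosmer_lemeshow_stat(predicted, observed, groups=10):
    """Sum over risk groups of (O - E)^2 / (n * pbar * (1 - pbar)),
    where groups are formed by sorting on the predicted probability."""
    pairs = sorted(zip(predicted, observed))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        obs = sum(y for _, y in chunk)   # observed cases in the group
        exp = sum(p for p, _ in chunk)   # expected cases in the group
        pbar = exp / len(chunk)
        if 0.0 < pbar < 1.0:
            stat += (obs - exp) ** 2 / (len(chunk) * pbar * (1.0 - pbar))
    return stat
```

The quadratic loop in `auc` is adequate at this scale but would be replaced by a rank-based computation for very large cohorts.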
ACKNOWLEDGMENTS
This work was supported by Program Grant 552429 and Project Grant 442960 from the National Health and Medical Research Council of Australia (NHMRC). DCW, REN, and PMW are supported by Research Fellowships from the NHMRC. These analyses used epidemiologic data from the Epigene Study and the QSkin Study, for which DCW was the Principal Investigator. Chief Investigators of the Epigene Study are DCW, ACG, Richard Williamson, Dominic Woods, Joe Triscott, and Rohan Mortimore. Marina Kvaskoff and Nirmala Pandeya cleaned the Epigene data and derived summary variables. Chief Investigators of the QSkin Study are DCW, CMO, ACG, REN, and PMW. We are also grateful to the Queenslanders who have willingly given their time to take part in the QSkin and Epigene Studies.
Development of a nomogram that predicts the probability of a positive prostate biopsy in men with an abnormal digital rectal examination and a prostate-specific antigen between 0 and 4 ng/mL.
Risk factors for developing cutaneous melanoma and criteria for identifying persons at risk: multicenter case-control study of the central malignant melanoma registry of the German Dermatological Society.
Risk factors for presumptive melanoma in skin cancer screening: American Academy of Dermatology National Melanoma/Skin Cancer Screening Program experience 2001-2005.