Abstract
Absolute risk prediction models are ideally constructed from prospective studies. However, the case-control design is a popular and powerful approach in epidemiology when prospective studies are not feasible, whether because of ethical concerns or because budget or time constraints limit the sample size.
In particular, many genome-wide association studies (GWAS) use a case-control design. Based on a case-control study and age-specific population incidence rates, we propose a two-stage estimation procedure to build an absolute risk prediction model. We illustrate it by building absolute risk prediction models for never-smoking women in Taiwan, noting that predicting lung cancer among never-smokers is hard because no dominant risk factor is known. We formed an age-matched case-control study (AMCCS) of lung cancer in never-smoking females, based on the GELAC (Genetic Epidemiology of Lung Adenocarcinoma in Taiwan) and Taiwan Biobank datasets. There are 1748 age-matched groups involving 1748 cases and 6535 controls. The first stage uses the AMCCS to estimate the odds ratios of risk factors other than age. We also used the Taiwan Cancer Registry, the Taiwan Cause of Death Database, and Taiwan life tables before 2011 to construct the age-specific six-year population incidence rate of lung cancer among never-smoking females in Taiwan (ASSIR). The second stage uses the first-stage results, the AMCCS, and the ASSIR to estimate the age effect and thus obtain the absolute risk prediction model. A bootstrap method was used to compute the standard error of this two-stage procedure. This method was used to develop the risk prediction model TNSF-SQ, published in Cancer Epidemiology, Biomarkers & Prevention. In this study, the training set consisted of participants recruited earlier than those in the validation set. TNSF-SQ demonstrated good discriminative power.
According to TNSF-SQ, 27% of never-smoking female lung cancer patients aged 55 to 74 in Taiwan had 6-year risks higher than 0.0151, which suggests that the model is potentially useful. For comparison, in the USA, where tobacco smoking is the main cause of lung cancer, only 36.6% of female lung cancer patients were eligible for LDCT screening, which is recommended according to the USPSTF 2013 smoking criteria or a PLCO model 6-year risk higher than 0.0151.
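The second-stage idea, combining case-control odds ratios with a population incidence rate, can be illustrated with a minimal sketch. This is not the estimator behind TNSF-SQ, whose details are not given in this abstract. It assumes a logistic form for the 6-year risk and assumes that the controls in one age group approximate that age group's population (reasonable only when the disease is rare). It then solves for an age-specific intercept so that the mean predicted risk matches the age-specific incidence rate. The scores, the target rate, and the function names are all hypothetical.

```python
import math


def expit(x):
    """Logistic function mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def calibrate_age_intercept(control_scores, target_rate, lo=-30.0, hi=30.0, tol=1e-12):
    """Find alpha such that mean(expit(alpha + score)) over controls equals target_rate.

    control_scores: stage-one linear predictors (sum of log odds ratio times
        covariate) for the controls in one age group. ASSUMPTION: these
        controls stand in for the population of that age group.
    target_rate: age-specific six-year incidence rate for that age group.
    """
    n = len(control_scores)

    def mean_risk(alpha):
        return sum(expit(alpha + s) for s in control_scores) / n

    # mean_risk is strictly increasing in alpha, so bisection converges.
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if mean_risk(mid) < target_rate:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return (lo + hi) / 2.0


def absolute_risk(alpha, score):
    """Six-year absolute risk for one person under the assumed logistic model."""
    return expit(alpha + score)


# Hypothetical stage-one scores for controls in a single age group.
scores = [-0.4, 0.0, 0.1, 0.3, 0.9, 1.2]
# Hypothetical age-specific six-year incidence rate (not a value from the study).
target = 0.004

alpha = calibrate_age_intercept(scores, target)
mean_predicted = sum(absolute_risk(alpha, s) for s in scores) / len(scores)
print(f"alpha = {alpha:.4f}, mean predicted risk = {mean_predicted:.6f}")
```

Repeating the calibration for each age group gives an age effect expressed as one intercept per group. Resampling the matched groups and rerunning both stages would be one way to obtain bootstrap standard errors for such a procedure.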