AIRCARE (Air Pollution and Cancer Research Ecosystem): Center for Advanced Research on Environmental Health and Lung Cancer Risk (AIRCARE)

April 27, 2026 updated by: Abhishek Shankar
In India, lung cancer is the second most common cancer in males and the fourth most common overall, with 81,784 new cases, 75,031 deaths and a 5-year prevalence of 113,990 according to GLOBOCAN 2022. Air pollution, particularly fine particulate matter (PM2.5), has been identified as a significant risk factor for lung cancer in never-smokers. The incidence of lung cancer in India is increasing, and because many Indian cities have been reported to be among the most polluted in the world, there is a need to understand the role of air pollution. The causal association between PM2.5 and an increased likelihood of lung cancer, together with its underlying biological mechanisms, is now well established. However, current evidence focuses on individual pollutants and overlooks potential interactions among multiple risk factors that could amplify lung cancer risk. Data are scarce on vulnerable groups such as children, older adults, and individuals with pre-existing health conditions, who may face disproportionate risks from poor air quality. The long-term effects of chronic exposure to air pollutants and their cumulative contribution to lung cancer risk also remain understudied. The Centre for Advanced Research on Environmental Health and Lung Cancer Risk (AIRCARE) is essential to bridge these research gaps, providing a holistic understanding of the role of air pollution in lung cancer to inform prevention and policy strategies. The study will be conducted at AIIMS, Delhi, and in areas of Delhi and the NCR with varying levels of PM2.5 exposure, and will encompass regions with diverse socio-economic profiles and industrial activities to capture the heterogeneity of exposure and risk. Cases and controls will be enrolled to ensure suitable representation across demographic, socio-economic and air pollution exposure parameters. A subset of the cohort will be selected for genotyping, focusing on individuals with extreme exposure levels and/or lung cancer cases and controls, for genetic interaction studies.

Study Overview

Status

Not yet recruiting

Conditions

Detailed Description

  1. To investigate the relationship between PM2.5 and lung cancer risk, focusing on individual and cumulative effects along with the multiplicative interaction of air pollution with other risk factors for lung cancer.

    Study Design:

    A combination of a prospective cohort study and a case-control study will be employed. The prospective cohort will establish the temporal relationship between air pollution exposure and lung cancer incidence. The case-control study, within a subset of the cohort, will facilitate detailed exposure assessment and biomarker analysis, and will also be used to explore gene-environment interactions.
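Objective 1 refers to multiplicative interaction between PM2.5 and other risk factors. The Python sketch below is a minimal illustration, not the study's analysis code: it assumes both exposures are dichotomised (high vs. normal PM2.5; presence vs. absence of another factor such as smoking) and uses entirely hypothetical counts. It computes the ratio of odds ratios (the multiplicative-scale measure) and, as an additional assumed measure on the additive scale, the relative excess risk due to interaction (RERI).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """Counts for one exposure stratum of a case-control table."""
    cases: int
    controls: int


def odds_ratio(stratum: Cell, reference: Cell) -> float:
    """Odds ratio of a stratum relative to the doubly unexposed reference stratum."""
    return (stratum.cases * reference.controls) / (stratum.controls * reference.cases)


def interaction_measures(table: dict) -> dict:
    """table maps (high_pm25, other_factor) -> Cell, using 0/1 indicators."""
    ref = table[(0, 0)]
    or_pm = odds_ratio(table[(1, 0)], ref)      # high PM2.5 only
    or_factor = odds_ratio(table[(0, 1)], ref)  # other risk factor only
    or_both = odds_ratio(table[(1, 1)], ref)    # both exposures
    return {
        "OR_pm25_only": or_pm,
        "OR_factor_only": or_factor,
        "OR_both": or_both,
        # Multiplicative scale: equals exp(beta) of the PM2.5 x factor product
        # term in a saturated logistic model; 1.0 means no interaction.
        "ratio_of_ORs": or_both / (or_pm * or_factor),
        # Additive scale (RERI); values above 0 suggest a more-than-additive effect.
        "RERI": or_both - or_pm - or_factor + 1,
    }


# Hypothetical counts, for illustration only.
example = {
    (0, 0): Cell(cases=100, controls=400),
    (1, 0): Cell(cases=150, controls=300),
    (0, 1): Cell(cases=120, controls=240),
    (1, 1): Cell(cases=200, controls=100),
}
measures = interaction_measures(example)
print(measures)
```

With these made-up counts each exposure alone gives an odds ratio of 2.0 and joint exposure gives 8.0, so the ratio of odds ratios is 2.0 and RERI is 5.0. In an adjusted analysis, the multiplicative measure would instead come from the exponentiated coefficient of a PM2.5 × factor term in a logistic model with covariates.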

    Prospective and retrospective data from existing patient records, environmental monitoring stations, and the cohort will be used. The clinical workup of enrolled participants will follow a standardised proforma (attached in additional documents) covering participant demographics, work and exposure history, medical history, tobacco history, alcohol history, family history, allergy history, prior chronic medical conditions, and non-surgical treatment of chronic medical conditions. In the longitudinal cohort, participants will be followed for 3 years, with repeated exposure assessments and health outcome monitoring. Lung cancer cases diagnosed within the cohort will be matched to controls on age, sex, and residential area.

    Study Area:

    The study will be conducted at AIIMS, Delhi, and in areas of Delhi and the NCR with varying levels of PM2.5 exposure, encompassing regions with diverse socio-economic profiles and industrial activities to capture the heterogeneity of exposure and risk. These are the same geographical areas as the overall AIRCARE cohort study: diverse urban and peri-urban regions of Delhi and the NCR with established air quality monitoring infrastructure, different socio-economic profiles, varying levels of air pollution (PM2.5, VOCs, NOx), and a diverse distribution of other risk factors (smoking, occupational exposures, genetic susceptibility).

    Sample Size Estimation and Sampling Strategy:

    Power calculations will be performed to determine the sample size required to detect a clinically significant increase in lung cancer risk associated with air pollution exposure.
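The matching of cohort cases to controls on age, sex and residential area can be sketched as greedy 1:1 matching without replacement. This is a hypothetical illustration: the `Person` record, the area codes and the 5-year age caliper are assumptions, as the protocol does not specify a matching algorithm or caliper.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Person:
    pid: str
    age: int
    sex: str
    area: str  # hypothetical residential-area code


def match_controls(cases: List[Person], pool: List[Person],
                   age_caliper: int = 5) -> Dict[str, Optional[str]]:
    """Greedy 1:1 matching without replacement.

    Exact match on sex and residential area; age must differ by at most
    `age_caliper` years, and the closest-in-age candidate is chosen.
    Cases with no eligible control are mapped to None.
    """
    available = list(pool)
    pairs: Dict[str, Optional[str]] = {}
    for case in sorted(cases, key=lambda p: p.pid):
        best: Optional[Person] = None
        for cand in available:
            if cand.sex != case.sex or cand.area != case.area:
                continue
            gap = abs(cand.age - case.age)
            if gap <= age_caliper and (best is None or gap < abs(best.age - case.age)):
                best = cand
        pairs[case.pid] = best.pid if best is not None else None
        if best is not None:
            available.remove(best)  # each control is used at most once
    return pairs


# Hypothetical participants.
cases = [Person("C1", 60, "M", "A1"), Person("C2", 55, "F", "A2"),
         Person("C3", 40, "M", "A2")]
pool = [Person("K1", 62, "M", "A1"), Person("K2", 58, "M", "A1"),
        Person("K3", 54, "F", "A2"), Person("K4", 55, "F", "A3")]
print(match_controls(cases, pool))
```

Greedy matching depends on the order in which cases are processed; optimal or risk-set matching are alternatives if the analysis calls for them.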

    Assuming an average lung cancer prevalence of 1.5% in India, an odds ratio of at least 2 for high PM2.5 exposure above the normal range (>37 µg/m³), an alpha level of 0.05, and 80% power, we estimate enrolling approximately 3230 participants (1615 lung cancer cases and 1615 controls). Diagnosed lung cancer cases will be drawn from the outpatient department (OPD) and the Delhi Cancer Registry (DCR), and controls will be recruited from the family members of lung cancer patients to obtain a population matched for PM2.5 exposure.

    Project Implementation Plan:

    Cases will include patients with lung cancer (smokers and non-smokers) from the OPD and the DCR data set. Controls will be family members of patients with lung cancer (smokers and non-smokers). The primary outcome will be the histologically confirmed incidence of lung cancer. The study will evaluate the interaction effects between air pollution (PM2.5) and other risk factors (smoking, alcohol consumption, occupational exposures, genetic susceptibility) on lung cancer incidence, and the cumulative risk of lung cancer associated with combined exposure to air pollution and other risk factors. Secondary outcomes will include the impact of interaction effects on lung cancer histological subtypes, modification of lung cancer risk by genetic variants in the presence of air pollution exposure, changes in biomarker profiles associated with combined exposures, and synergistic effects of PM2.5 and other risk factors.

    Design of Statistical Analysis:

    Ambient air pollution exposure will be quantified as PM2.5 concentration using a combination of satellite data, ground-based monitoring, and individual exposure measurements. For the cohort study, Cox proportional hazards models will be used to estimate hazard ratios (HRs) for lung cancer incidence and mortality associated with air pollution exposure, with time-dependent covariates to account for changes in exposure over time. Survival outcomes will be examined using the Kaplan-Meier method. For the case-control study, logistic regression will be used to estimate odds ratios (ORs) for lung cancer associated with detailed exposure assessments and biomarker levels. Interactions between pollutants will be assessed using interaction terms in the regression models. Sensitivity analyses will be conducted to assess the robustness of the findings to potential confounding factors and exposure misclassification. Dose-response relationships between PM2.5 exposure and lung cancer risk will be examined using both categorical and continuous exposure variables. Subgroup analyses will examine the effect of air pollution on lung cancer risk in different demographic subgroups (e.g., age, sex, smoking status).
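The Kaplan-Meier (product-limit) estimator multiplies, at each distinct event time, the conditional probability of surviving past that time among those still at risk. A minimal stdlib Python sketch with hypothetical follow-up data follows; a validated package (for example, R's survival package) would normally be used in practice.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Product-limit estimate of S(t) at each distinct event time.

    times: follow-up time per participant (e.g. months since enrolment);
    events: 1 if the event (e.g. death) was observed, 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        # Group all participants sharing this time point.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        # Participants with an event or censoring at t leave the risk set afterwards.
        at_risk -= deaths + censored
    return curve


# Hypothetical follow-up data (months); 0 = censored.
curve = kaplan_meier([2, 3, 3, 5, 8, 10], [1, 1, 0, 1, 0, 1])
print(curve)
```

Censored participants contribute to the risk set up to their censoring time but never reduce the survival estimate themselves, which is why the step at month 8 is absent from the curve.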

  2. To develop and validate a risk stratification model based on PM2.5 exposure for lung cancer screening.

    Study Design:

    This objective will involve development and validation phases. In the development phase, a risk prediction model will be developed using a retrospective or prospective cohort dataset, drawing on existing data from the ongoing cohort study or historical data from cancer registries and air quality monitoring stations. The developed model will then be externally validated using an independent dataset from a geographically distinct population.

    For prospective validation, a new cohort will be enrolled or a separate dataset will be used.

    Study Area:

    This will be conducted at AIIMS, Delhi, and in areas of Delhi and the NCR with a diverse range of PM2.5 exposure levels. For validation, a geographically distinct region within India with independent data on PM2.5 exposure and lung cancer incidence will be required. This region will have demographic and environmental characteristics that differ from the development area, to ensure external validity.

    Sample Size Estimation and Sampling Strategy:

    For model development, the sample size will be determined by the number of lung cancer cases and controls available in the retrospective/prospective datasets. Assuming an average lung cancer prevalence of 1.5% in India, an odds ratio of at least 2 for high PM2.5 exposure above the normal range (>37 µg/m³), an alpha level of 0.05, and 80% power, we estimate enrolling approximately 3230 participants (1615 lung cancer cases and 1615 controls).
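The stated sample size can be approximately reproduced with the standard two-proportion formula for an unmatched 1:1 case-control design, assuming that the 1.5% figure is entered as the proportion exposed among controls (p0), that the test is two-sided, and that OR = 2. These inputs are an interpretation; the protocol does not show its calculation. Under them, the hypothetical function below returns just over 1,610 cases per group, in line with the 1615 quoted.

```python
import math
from statistics import NormalDist


def case_control_n(p0: float, odds_ratio: float,
                   alpha: float = 0.05, power: float = 0.80) -> int:
    """Cases needed per group (1:1 cases:controls), unmatched design.

    p0: proportion exposed among controls. Plugging in the protocol's
    1.5% figure as p0 is an assumption.
    """
    # Exposure prevalence among cases implied by the target odds ratio.
    p1 = odds_ratio * p0 / (1 + p0 * (odds_ratio - 1))
    p_bar = (p0 + p1) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test assumed
    z_beta = NormalDist().inv_cdf(power)
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p0 * (1 - p0) + p1 * (1 - p1))) ** 2
    return math.ceil(numerator / (p1 - p0) ** 2)


n_per_group = case_control_n(p0=0.015, odds_ratio=2.0)
print(n_per_group, 2 * n_per_group)
```

Because the result is driven by the difference p1 - p0, small changes to the assumed exposure prevalence move the required sample size substantially, so the input actually intended should be confirmed.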

    Sampling Strategy:

    For model development, data on patients with lung cancer at the Dr BR Ambedkar Institute Rotary Cancer Hospital, AIIMS, Delhi, and Delhi Cancer Registry data will be used.

    Project Implementation Plan:

    The primary outcome will be the performance of the risk stratification model in predicting lung cancer risk, assessed by the area under the receiver operating characteristic curve (AUROC), sensitivity and specificity at predefined risk thresholds, and calibration plots assessing the agreement between predicted and observed risks.
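The discrimination and calibration metrics named in the primary outcome can be illustrated in plain Python. In this sketch AUROC is computed as the Mann-Whitney probability that a randomly chosen case has a higher predicted risk than a randomly chosen control, sensitivity and specificity use a hypothetical threshold of 0.5, and calibration is summarised in equal-width risk bins (a binning choice assumed here, since the protocol does not specify one). All scores are hypothetical.

```python
from typing import List, Sequence, Tuple


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as P(random case scores higher than random control); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(scores, labels, threshold: float) -> Tuple[float, float]:
    """Classify score >= threshold as high risk."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def calibration_table(probs, labels, n_bins: int = 5) -> List[Tuple[float, float, int]]:
    """(mean predicted risk, observed event rate, n) per equal-width risk bin."""
    bins: List[list] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, y))
    return [(sum(p for p, _ in b) / len(b), sum(y for _, y in b) / len(b), len(b))
            for b in bins if b]


# Hypothetical predicted risks (scores) and outcomes (1 = lung cancer).
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
print(auroc(scores, labels))  # 0.9375: 15 of 16 case-control pairs ranked correctly
print(sensitivity_specificity(scores, labels, 0.5))
print(calibration_table(scores, labels))
```

Plotting the mean predicted risk against the observed rate for each bin gives the calibration plot; points near the diagonal indicate good agreement.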

    The secondary outcomes will be the positive predictive value (PPV) and negative predictive value (NPV) of the model, the impact of the model on lung cancer detection rates and stage distribution, and the model's ability to correctly categorize risk in prespecified subgroups.

    Design of Statistical Analysis:

    In the model development phase, logistic regression or machine learning algorithms (e.g., random forests, gradient boosting) will be used to identify predictors of lung cancer risk, including PM2.5 exposure, demographic factors, and other relevant variables. A risk prediction model will be developed using the selected predictors, and its performance will be evaluated using internal validation techniques such as bootstrapping. The developed model will then be applied to the independent validation dataset, and its performance will be assessed by calculating the AUROC, sensitivity, specificity, PPV, and NPV.
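PPV and NPV depend on disease prevalence, so values read directly from a 1:1 case-control sample (50% prevalence) will not carry over to a screening population. A minimal sketch using Bayes' theorem shows the effect; the 0.80 sensitivity and specificity are hypothetical, and the 1.5% prevalence is the figure assumed in the protocol's sample size calculation.

```python
from typing import Tuple


def ppv_npv(sensitivity: float, specificity: float, prevalence: float) -> Tuple[float, float]:
    """Predictive values from Bayes' theorem at a given disease prevalence."""
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    fp = (1 - specificity) * (1 - prevalence)
    tn = specificity * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


# Hypothetical operating point: sensitivity = specificity = 0.80.
for prev in (0.50, 0.015):  # 1:1 case-control sample vs the protocol's 1.5%
    ppv, npv = ppv_npv(0.80, 0.80, prev)
    print(f"prevalence={prev:.3f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

At 50% prevalence both PPV and NPV equal 0.80; at 1.5% prevalence PPV falls to about 0.057 while NPV rises to about 0.996.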

    Calibration plots will be generated to visually assess the agreement between predicted and observed risks. Decision curve analysis will be conducted to evaluate the clinical utility of the model. Model performance will be assessed in predefined subgroups, such as age, sex, and smoking status. The performance of the PM2.5-based model will be compared with existing lung cancer screening guidelines. Sensitivity analyses will be conducted to assess the robustness of the findings to potential confounding factors and exposure misclassification. Statistical software packages such as R, SAS, and Python will be used for data analysis.
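Decision curve analysis compares a model's net benefit across threshold probabilities pt, where net benefit = TP/n - FP/n × pt/(1 - pt), against the reference strategies of screening everyone and screening no one (net benefit 0). The sketch below implements that standard formula on hypothetical predicted risks; it is illustrative only.

```python
from typing import Sequence


def net_benefit(probs: Sequence[float], labels: Sequence[int], pt: float) -> float:
    """Net benefit of classifying predicted risk >= pt as 'screen'."""
    n = len(labels)
    tp = sum(1 for p, y in zip(probs, labels) if p >= pt and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= pt and y == 0)
    return tp / n - (fp / n) * (pt / (1 - pt))


def net_benefit_screen_all(labels: Sequence[int], pt: float) -> float:
    """Reference strategy: screen everyone (screening no one has net benefit 0)."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * (pt / (1 - pt))


# Hypothetical predicted risks and outcomes.
probs = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
for pt in (0.1, 0.3, 0.5):
    print(pt, net_benefit(probs, labels, pt), net_benefit_screen_all(labels, pt))
```

With these data at pt = 0.5 the model's net benefit is 0.25 (3 true positives and 1 false positive out of 8), versus 0 for screening everyone.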

  3. To identify biomarkers for PM2.5-associated lung cancer.

    Study Design:

    This study aims to identify unique molecular signatures induced by PM2.5 exposure using next-generation sequencing (NGS). Two groups will be studied, smokers and non-smokers, on the assumption that both groups are exposed to PM2.5.

    The rationale for this design is that lung cancer-specific biomarkers detected in smokers will be primarily attributable to smoking, whereas those detected in non-smokers may help identify a novel signature induced by PM2.5 in the development of lung cancer. Specifically, peripheral blood mononuclear cells (PBMCs) will be collected from both non-smokers and lung cancer patients to examine the impact of PM2.5 exposure. The PBMCs will be cultured in vitro to allow proper propagation before exposure to PM2.5 particles, which will be sourced from the National Physical Laboratory and used to expose the cultured PBMCs under conditions simulating real-world exposure. After exposure, targeted NGS will be used to profile transcriptomic changes within these cells. The key focus is to identify differentially expressed genes (DEGs) triggered by PM2.5 exposure, comparing the response of PBMCs from non-smokers with that of PBMCs from lung cancer patients. Alterations in genes such as EGFR and ROS1 are known to be associated with lung cancer. This design will allow a distinct molecular signature associated with PM2.5 exposure to be generated, which may reveal specific genetic alterations or biomarkers relevant to both environmental exposure and lung cancer pathogenesis. The final objective is to develop a PM2.5-specific gene signature that could serve as a potential biomarker for exposure or disease progression. The study could provide insights into how PM2.5 affects gene expression differently in healthy individuals and lung cancer patients, contributing to our understanding of environmental influences on cancer development.
    Study Area:

    This will be done at the cancer center at AIIMS, Delhi, and will include the same geographical areas as the overall AIRCARE cohort study, with access to hospital facilities for lung cancer diagnosis and sample collection.

    Sample Size Estimation and Sampling Strategy:

    The sample size will be based on feasibility, using the cohorts described above. Peripheral blood samples from non-smokers and lung cancer patients will be collected after ethical approval. PBMCs will be isolated and counted using a hemocytometer or an automated cell counter, and a cell viability assay using Trypan Blue or a similar reagent will be performed to ensure viability above 90%. PBMC cultures will then be set up, and PM2.5 will be prepared, dispersed, and applied to the cultured PBMCs. RNA will be isolated from the collected PBMCs using a reliable RNA extraction kit (e.g., Qiagen RNeasy Kit or similar).
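The cell count and viability check can be expressed as a short calculation. Assumptions in this sketch: counts come from four large squares of a standard improved Neubauer chamber (0.1 µL per square, hence the 10^4 factor to convert to cells per mL) and cells are mixed 1:1 with Trypan Blue (dilution factor 2). The 90% viability cut-off is taken from the protocol; the counts are hypothetical.

```python
from typing import Sequence


def hemocytometer_summary(live_counts: Sequence[int], dead_counts: Sequence[int],
                          dilution_factor: float = 2.0,
                          min_viability_pct: float = 90.0) -> dict:
    """Viable cell concentration and Trypan Blue viability from chamber counts.

    live_counts / dead_counts: unstained / blue-stained cells in each large
    square counted. Each large square of a standard improved Neubauer
    chamber holds 0.1 uL, so cells per mL = mean count x dilution x 1e4.
    dilution_factor=2.0 assumes a 1:1 mix of cell suspension and Trypan Blue.
    """
    live_total = sum(live_counts)
    dead_total = sum(dead_counts)
    live_per_ml = live_total / len(live_counts) * dilution_factor * 1e4
    viability_pct = 100.0 * live_total / (live_total + dead_total)
    return {
        "live_cells_per_ml": live_per_ml,
        "viability_pct": viability_pct,
        # The protocol requires viability above 90% before culture.
        "passes_qc": viability_pct > min_viability_pct,
    }


# Hypothetical counts from four large squares.
print(hemocytometer_summary([48, 52, 50, 50], [2, 3, 2, 3]))
```

Here 200 live and 10 dead cells give 1.0 × 10^6 viable cells/mL and about 95.2% viability, which passes the cut-off.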

    Targeted NGS will be performed. Appropriate bioinformatics tools will be used to identify DEGs between the PM2.5-exposed and control groups, and the responses of PBMCs from non-smokers and lung cancer patients will be compared to identify differences in gene expression. PM2.5 signatures will be generated from genes that are consistently differentially expressed across both non-smokers and lung cancer patients.

    Project Implementation Plan:

    The primary outcome will be the identification of biomarkers significantly associated with air pollution-related lung cancer, and the ability of these biomarkers to discriminate between cases and controls. Secondary outcomes will include the association between biomarker levels and air pollution exposure levels, temporal changes in biomarker levels associated with lung cancer development, the association between biomarker levels and lung cancer histological subtypes, the usefulness of biomarker panels for early lung cancer detection, and the ability of the biomarkers to predict treatment response.

    Design of Statistical Analysis:

    Differential expression analysis (e.g., t-tests, ANOVA) will be used to identify biomarkers that are differentially expressed between lung cancer cases and controls. Machine learning algorithms (e.g., random forests, support vector machines) will be used to identify biomarker panels that accurately discriminate between cases and controls. Pathway analysis will be used to identify biological pathways associated with the identified biomarkers. Receiver operating characteristic (ROC) curve analysis will be used to evaluate the diagnostic accuracy of candidate biomarkers. Logistic regression will be used to assess the association between biomarker levels and lung cancer risk, and correlation analysis to assess the association between biomarker levels and air pollution exposure. Mixed-effects models will be used to analyze longitudinal changes in biomarker levels, and survival analysis to assess the association between longitudinal biomarker changes and lung cancer development.
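The protocol names t-tests and ANOVA for differential expression but does not say how multiple testing across genes will be handled. The sketch below assumes per-gene p-values are already available and applies the Benjamini-Hochberg false discovery rate procedure (an assumed choice), then combines it with a log2 fold-change filter. Gene names, statistics and both cut-offs (q < 0.05, |log2FC| ≥ 1) are hypothetical.

```python
import math
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def log2_fold_change(exposed: Sequence[float], unexposed: Sequence[float],
                     pseudocount: float = 1.0) -> float:
    """log2 ratio of mean expression, PM2.5-exposed vs unexposed cultures."""
    return math.log2((mean(exposed) + pseudocount) / (mean(unexposed) + pseudocount))


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_degs(results: Dict[str, Tuple[float, float]],
              q_cut: float = 0.05, lfc_cut: float = 1.0) -> List[str]:
    """results maps gene -> (log2 fold change, raw p-value); cut-offs are hypothetical."""
    genes = list(results)
    qvals = benjamini_hochberg([results[g][1] for g in genes])
    return sorted(g for g, q in zip(genes, qvals)
                  if q < q_cut and abs(results[g][0]) >= lfc_cut)


# Hypothetical genes and statistics.
results = {
    "GENE_A": (log2_fold_change([15, 15], [3, 3]), 0.01),  # log2(16/4) = 2.0
    "GENE_B": (0.5, 0.04),
    "GENE_C": (-1.5, 0.03),
    "GENE_D": (1.2, 0.005),
}
print(call_degs(results))
```

With these inputs all four genes have q < 0.05 after adjustment, but GENE_B is dropped by the fold-change filter.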

    Statistical software packages such as R, Python (scikit-learn), and specialized bioinformatics tools will be used for data analysis.

  4. To identify vulnerable populations and assess their susceptibility to air pollution-related lung cancer.

    Study Design:

    This objective will primarily use data from members of the existing longitudinal cohort. A cross-sectional study will be conducted among family members and relatives of high-risk patients and lung cancer patients in the cohort, including individuals with a family history of lung cancer and those with chronic lung conditions. A mixed-methods approach will be used to assess the effectiveness of the interventions.

    Study Area:

    The study will be conducted in the same diverse urban and peri-urban areas as the overall AIRCARE cohort study. The study area will be chosen to ensure representation of various vulnerable populations, including children and adolescents, elderly individuals, individuals with pre-existing respiratory conditions (e.g., asthma, COPD), and individuals with low socio-economic status.

    Sample Size Estimation and Sampling Strategy:

    The sample size will be based on feasibility, using the cohorts described above. The existing longitudinal cohort will be used, ensuring representation across different exposure and risk factor strata. Incident lung cancer cases within the cohort will be identified, and controls will be matched on age, sex, residential area and, where possible, smoking history. A subset of the cohort will be selected for genotyping, focusing on individuals with extreme exposure levels and/or lung cancer cases and controls, for genetic interaction studies.

    Project Implementation Plan:

    The primary outcomes will be differences in lung cancer incidence rates across vulnerable subgroups and hazard ratios for lung cancer associated with air pollution exposure in each vulnerable subgroup. Secondary outcomes will be differences in lung cancer mortality rates across vulnerable subgroups, biomarker levels in vulnerable subgroups, and the impact of socio-economic factors on the increased risk.

    Design of Statistical Analysis:

    Cox proportional hazards models will be used to estimate hazard ratios for lung cancer in each vulnerable subgroup, adjusting for potential confounders. Interaction terms will be included in regression models to assess the modifying effect of vulnerable population status on the association between air pollution and lung cancer, and stratified analysis will be used to assess the effect of air pollution within each vulnerable population. Chi-square tests or t-tests will be used to compare lung cancer incidence rates and other outcome measures across vulnerable subgroups, and analysis of variance (ANOVA) to compare continuous outcome measures across multiple vulnerable subgroups. Logistic regression will be used to assess the association between vulnerable population status and specific health outcomes, and multiple linear regression to analyze the effect of air pollution on lung function and biomarker levels in vulnerable subgroups. Statistical software packages such as R, SAS, and Python will be used for data analysis. Cost-effectiveness analysis will be used to assess the economic impact of the interventions.
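Comparing lung cancer incidence across vulnerable subgroups starts from rates per unit of person-time. The sketch below computes incidence per 100,000 person-years and a crude rate ratio with a Wald 95% confidence interval on the log scale (an assumed, standard approximation); the counts are hypothetical, and unlike the Cox models described above, this crude comparison does not adjust for confounders.

```python
import math
from typing import Tuple


def incidence_rate(cases: int, person_years: float, per: float = 100_000) -> float:
    """Incidence rate per `per` person-years."""
    return cases / person_years * per


def rate_ratio(cases_a: int, py_a: float, cases_b: int, py_b: float,
               z: float = 1.96) -> Tuple[float, float, float]:
    """Rate ratio of group a vs group b with a Wald 95% CI on the log scale."""
    rr = (cases_a / py_a) / (cases_b / py_b)
    se_log = math.sqrt(1 / cases_a + 1 / cases_b)
    return rr, rr * math.exp(-z * se_log), rr * math.exp(z * se_log)


# Hypothetical counts: a vulnerable subgroup (a) vs the rest of the cohort (b).
print(incidence_rate(30, 20_000), incidence_rate(20, 40_000))
print(rate_ratio(30, 20_000, 20, 40_000))
```

With 30 cases in 20,000 person-years versus 20 in 40,000, the rates are 150 and 50 per 100,000 person-years, giving a rate ratio of 3.0 (95% CI roughly 1.7 to 5.3).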

Statistical software packages such as SPSS, R, and NVivo will be used for data analysis.

Campaign materials will be evaluated for readability, cultural competence, and messaging.

Study Type

Observational

Enrollment (Estimated)

3230

Contacts and Locations


Study Contact

Study Locations

    • National Capital Territory of Delhi
      • New Delhi, National Capital Territory of Delhi, India, 110029
        • All India Institute of Medical Sciences
        • Contact:

Participation Criteria


Eligibility Criteria

Ages Eligible for Study

  • Adult
  • Older Adult

Accepts Healthy Volunteers

Yes

Sampling Method

Non-Probability Sample

Study Population

The study will enrol an estimated 3230 participants (1615 lung cancer cases and 1615 controls). Diagnosed lung cancer cases from the outpatient department and the Delhi Cancer Registry (DCR) will be included.

Controls will be recruited from the family members of lung cancer patients to obtain a population matched for PM2.5 exposure.

Description

Inclusion Criteria:

  • Adults aged 18 years or older
  • Histologically confirmed diagnosis of lung cancer
  • Residing in Delhi/National Capital Region (NCR)

Exclusion Criteria:

  • Individuals aged below 18 years
  • Individuals without a histologically confirmed diagnosis of lung cancer

Study Plan


How is the study designed?

Design Details

Cohorts and Interventions

Group / Cohort
Patients: patients with lung cancer
Control: age-, sex- and residence-matched individuals

What is the study measuring?

Primary Outcome Measures

Outcome Measure: PM2.5 exposure
Measure Description: The primary outcome will be the histologically confirmed incidence of lung cancer. The interaction effects between air pollution (PM2.5) and other risk factors (smoking, alcohol consumption, occupational exposures, genetic susceptibility) on lung cancer incidence, and the cumulative risk of lung cancer associated with combined exposure to air pollution and other risk factors, will be evaluated.
Time Frame: 3 years

Outcome Measure: Risk Stratification Model
Measure Description: The performance of the risk stratification model in predicting lung cancer risk, assessed by the area under the receiver operating characteristic curve (AUROC), sensitivity and specificity at predefined risk thresholds, and calibration plots assessing the agreement between predicted and observed risks.
Time Frame: 3 years

Outcome Measure: Biomarker Identification
Measure Description: Identification of biomarkers significantly associated with air pollution-related lung cancer, and the ability of these biomarkers to discriminate between cases and controls.
Time Frame: 3 years

Outcome Measure: Identifying Vulnerable Groups
Measure Description: Differences in lung cancer incidence rates across vulnerable subgroups, and hazard ratios for lung cancer associated with air pollution exposure in each vulnerable subgroup.
Time Frame: 3 years

Secondary Outcome Measures

Outcome Measure: Exposure Interaction Effects
Measure Description: The impact of interaction effects on lung cancer histological subtypes, modification of lung cancer risk by genetic variants in the presence of air pollution exposure, changes in biomarker profiles associated with combined exposures, and synergistic effects of PM2.5 and other risk factors.
Time Frame: 3 years

Outcome Measure: Modelling Epidemiology
Measure Description: The positive predictive value (PPV) and negative predictive value (NPV) of the model, the impact of the model on lung cancer detection rates and stage distribution, and the model's ability to correctly categorize risk in prespecified subgroups.
Time Frame: 3 years

Outcome Measure: Variation in Biomarker Levels
Measure Description: The association between biomarker levels and air pollution exposure levels, temporal changes in biomarker levels associated with lung cancer development, the association between biomarker levels and lung cancer histological subtypes, the usefulness of biomarker panels for early lung cancer detection, and the ability of the biomarkers to predict treatment response.
Time Frame: 3 years

Outcome Measure: Characterisation of Vulnerable Groups
Measure Description: Differences in lung cancer mortality rates across vulnerable subgroups, biomarker levels in vulnerable subgroups, and the impact of socio-economic factors on the increased risk.
Time Frame: 3 years

Collaborators and Investigators


Collaborators

Study record dates


Study Major Dates

Study Start (Estimated)

May 1, 2026

Primary Completion (Estimated)

February 1, 2028

Study Completion (Estimated)

August 1, 2028

Study Registration Dates

First Submitted

April 21, 2026

First Submitted That Met QC Criteria

April 21, 2026

First Posted (Actual)

April 28, 2026

Study Record Updates

Last Update Posted (Actual)

April 30, 2026

Last Update Submitted That Met QC Criteria

April 27, 2026

Last Verified

April 1, 2026

More Information

Terms related to this study

Plan for Individual participant data (IPD)

Plan to Share Individual Participant Data (IPD)?

No

Drug and device information, study documents

Studies a U.S. FDA-regulated drug product

No

Studies a U.S. FDA-regulated device product

No

This information was retrieved directly from the website clinicaltrials.gov without any changes. If you have any requests to change, remove or update your study details, please contact register@clinicaltrials.gov. As soon as a change is implemented on clinicaltrials.gov, this will be updated automatically on our website as well.
