Plasma Metabolomic Signatures of Chronic Obstructive Pulmonary Disease and the Impact of Genetic Variants on Phenotype-Driven Modules

Background: Small studies have recently suggested that there are specific plasma metabolic signatures in chronic obstructive pulmonary disease (COPD), but there have been no large comprehensive study of metabolomic signatures in COPD that also integrate genetic variants. Materials and Methods: Fresh frozen plasma from 957 non-Hispanic white subjects in COPDGene was used to quantify 995 metabolites with Metabolon's global metabolomics platform. Metabolite associations with five COPD phenotypes (chronic bronchitis, exacerbation frequency, percent emphysema, post-bronchodilator forced expiratory volume at one second [FEV1]/forced vital capacity [FVC], and FEV1 percent predicted) were assessed. A metabolome-wide association study was performed to find genetic associations with metabolite levels. Significantly associated single-nucleotide polymorphisms were tested for replication with independent metabolomic platforms and independent cohorts. COPD phenotype-driven modules were identified in network analysis integrated with genetic associations to assess gene-metabolite-phenotype interactions. Results: Of metabolites tested, 147 (14.8%) were significantly associated with at least 1 COPD phenotype. Associations with airflow obstruction were enriched for diacylglycerols and branched chain amino acids. Genetic associations were observed with 109 (11%) metabolites, 72 (66%) of which replicated in an independent cohort. For 20 metabolites, more than 20% of variance was explained by genetics. A sparse network of COPD phenotype-driven modules was identified, often containing metabolites missed in previous testing. Of the 26 COPD phenotype-driven modules, 6 contained metabolites with significant met-QTLs, although little module variance was explained by genetics. Conclusion: A dysregulation of systemic metabolism was predominantly found in COPD phenotypes characterized by airflow obstruction, where we identified robust heritable effects on individual metabolite abundances. However, network analysis, which increased the statistical power to detect associations missed previously in classic regression analyses, revealed that the genetic influence on COPD phenotype-driven metabolomic modules was modest when compared with clinical and environmental factors.


Introduction
Metabolites are low molecular weight ( £ 1500 Daltons) molecules, representing both endogenous and exogenous (environmentally derived) compounds, which play important roles in signaling, energy expenditure, reproduction, and growth. Metabolites vary greatly across individuals and can act as a unique identifier of an individual through time. 1 Metabolite expression is thought to be the most proximal signature of health and disease, when compared to other omics (e.g., genomics, transcriptomics, and proteomics). 2 Recently, there have been several reports suggesting the presence of characteristic metabolic signatures in the blood of individuals with lung diseases such as chronic obstructive pulmonary disease (COPD) [3][4][5][6][7][8] ; however, these reports have typically included only a small number of subjects or a limited annotation of metabolic features ( < 500 metabolites). During the past few years, substantial gains have been made in metabolomics, using analytical chemistry techniques and advanced computational methods to characterize complex biological mixtures. The highly sensitive detection techniques of liquid chromatography tandem mass spectrometry (LC-MS/MS) quantify metabolites from a broad range of classes and are now automated to increase throughput, enabling large cohort-level epidemiological metabolome studies. 9,10 These strategies have not yet been used on a large scale to study COPD.
COPD is characterized by progressive airflow limitation due to airway and/or alveolar abnormalities and is now the third most common cause of death worldwide, 11 and is one of the leading causes of medical hospitalizations in the United States. 12 Cigarette smoking is the greatest environmental risk factor, yet most smokers do not develop clinically important lung disease, and COPD heritability is estimated to be *37%. 13 Furthermore, for those who do develop COPD, there are heterogeneous phenotypes, including emphysema, chronic bronchitis, and frequent COPD exacerbations ( Supplementary Fig. S1). 14 While clinical variables such as age, race, sex, and environmental factors like smoking and body mass index (BMI) have been useful in modeling disease severity, a large amount of unexplained variability in COPD severity remains. 15 COPD is also associated with increased risk of non-pulmonary diseases independent of smoking history (e.g., cardiovascular disease, osteoporosis, depression, and cancer outside of the lung), suggesting the presence of systemic disturbances in metabolic pathways across comorbidities. 16 The availability of large longitudinal cohorts for smokers with or at high risk for COPD, such as COPDGene and SPIROMICS, combined with advances in high-throughput metabolomics now permit the large-scale interrogation of the metabolome in COPD.
A similar integrative approach to large-scale transcriptomics 17 and proteomics 18 studies in COPDGene, SPIROMICS, and other cohorts has revealed that a significant amount of variation in many biomarkers is explained by genetic variation, which may also impact metabolome signatures, as recently reported. 19 In addition, genome-wide association studies (GWASs) have found multiple genetic loci associated with COPD. 20 Thus, it is important to consider the role of genetic background in assessing how the metabolome relates to COPD; however, to our knowledge, comprehensive studies integrating genetics with comprehensive metabolomic profiling in COPD are lacking. This study identifies plasma metabolites associated with COPD-related phenotypes and addresses the impact of genetic variation on metabolomic profiles in COPD.

Study populations
Discovery. The NIH-sponsored multicenter Genetic Epidemiology of COPD (COPDGene; ClinicalTrials .gov Identifier: NCT01969344) study was approved and reviewed by the institutional review board at all participating centers. 21 All study participants provided written informed consent. This study enrolled 10,198 non-Hispanic white (NHW) and African American (AA) individuals from January 2008 until April 2011 (Phase 1), who were 45-80 years of age with ‡ 10 packyear smoking history and no exacerbation for > 30 days. In addition, 465 age-and gender-matched healthy individuals with no history of smoking were enrolled as controls (mostly at Phase 2). From July 2013 to July 2017, 5697 subjects returned for an in-person 5-year visit. Each in-person visit included spirometry before and after albuterol, quantitative computed tomography (CT) imaging of the chest, and blood sampling.
From two clinical centers (National Jewish Health and University of Iowa), 1136 subjects (1040 NHW, 96 AA) participated in an ancillary study in which they provided fresh frozen plasma collected using an 8.5 mL p100 tube (Becton Dickinson) at Phase 2. After excluding AA subjects due to small sample size and subjects lacking genotype data, to avoid confounding genetic associations due to ancestry, 957 subjects comprised the Discovery cohort ( Supplementary Fig. S2).
COPDGene: Emory. From the Discovery cohort, 271 COPDGene NHW subjects who previously had their metabolome quantified at Phase 2 on a separate platform were used as a technical COPDGene-Emory cohort. This cohort will be referred to as COPDGene-Emory.
SPIROMICS: Metabolon/UC. Two cohorts from the Subpopulations and Intermediate Outcome Measures in COPD study (SPIROMICS) (ClinicalTrials.gov Identifier: NCT01969344) were used for replication. 22 The SPIROMICS-Metabolon and SPIROMICS-UC subjects consisted of 445 and 76 NHW subjects, respectively, who provided fresh frozen plasma using a 10 mL EDTA tube (Becton Dickinson) before a research bronchoscopy. 23 Clinical data and definitions COPD was defined using spirometric evidence of airflow obstruction (post-bronchodilator forced expiratory volume at one second [FEV 1 ]/forced vital capacity [FVC] < 0.70). PRISm subjects had an FEV 1 percent predicted (FEV 1 pp) < 80% with an FEV 1 /FVC ‡ 0.7. PRISm subjects have recently been recognized as having a higher prevalence of symptoms and worse outcomes compared to traditionally defined controls, 24 and were thus included in all cohorts. Chronic bronchitis was defined as self-reported chronic cough and sputum for at least 3 months in each of the 2 years before Phase 2. Percent emphysema was quantified by percent of lung voxels less than À950 Hounsfield Units (% low attenuation areas) on the inspiratory CT scans. Visual emphysema was assessed as previously described. 25 Exacerbations were defined as acute worsening of respiratory symptoms requiring treatment with oral corticosteroids and/or antibiotics, emergency room visit, or hospital admission. 26 Metabolite profiling Discovery platform. P100 plasma was profiled using the Metabolon (Durham) global metabolomics platform, as described. [27][28][29] Briefly, samples were extracted with methanol under vigorous shaking for 2 min (Glen Mills GenoGrinder 2000) followed by centrifugation to remove protein, dissociate small molecules bound to protein or trapped in the precipitated protein matrix, and recover chemically diverse metabolites. The resulting extract was divided into five fractions: two for analysis by two separate reverse-phase/ultrahighperformance liquid chromatography/tandem mass spectrometry (RP/UPLC-MS/MS) methods with positive ion mode electrospray ionization (ESI), one for analysis by RP/UPLC-MS/MS with negative ion mode ESI, one for analysis by hydrophilic interaction liquid chromatography (HILIC)/UPLC-MS/MS with negative ion mode ESI, and one was reserved for backup.
Metabolon has developed peak detection and integration software to generate a list of (mass-to-charge) m/z ratios, retention indices (RI), and area under the curve values for each detected metabolite, as described in detail. [27][28][29] User-specified criteria for peak detection included thresholds for signal to noise ratio, area, and width. Relative standard deviations of peak area were determined for internal and recovery standards to confirm extraction efficiency, instrument performance, column integrity, chromatography, and mass calibration.
The biological data sets, including quality control samples, were chromatographically aligned based on a retention index that utilized internal standards assigned a fixed RI value. The RI of the experimental peak was determined by assuming a linear fit between flanking RI markers whose RI values are set. Peaks were matched against an in-house library of authentic standards and routinely detected unknown compounds specific to the respective method. Identifications were based on retention index values, experimental precursor mass match to the library authentic standard within 10 ppm, and quality of MS/MS match. All proposed identifications were then manually reviewed and curated by an analyst who approved or rejected each identification based on the criteria above . The platform  reported 1392 features, including 1064 annotated features, which were grouped by Metabolon into ''super  pathways,'' including 436 lipids, 261 xenobiotics, 207  amino acids, 40 peptides, 38 cofactors and enzymes,  35 nucleotides, 25 carbohydrates, 11 energy pathway  compounds, and 11 partially characterized molecules  (Supplementary Table S1). All compounds are further annotated by ''subpathway'' (e.g., ''sphingomyelins,'' ''carnitine metabolism,'' and ''lysine metabolism'').
COPDGene: Emory. Compounds from p100 fresh frozen plasma were extracted using an untargeted LC-MS-based metabolomic quantification protocol from the laboratory of Dean Jones at Emory University as described previously. 30 In brief, eight stable isotope internal standards in 130 lL acetonitrile were mixed with 65 lL of plasma. Samples were precipitated and chromatographic separation of the supernatant was performed on a Dionex Ultimate 3000 UHPLC with a dual column compartment for column switching. Reverse phase (C18), anion exchange (AE), and HILIC preceded mass spectral detection using a Thermo Scientific Q-Exactive HF mass spectrometer in continuous full scan mode at 70,000 resolution (scan range 85-1275 m/z for all analyses other than AE, AE scan range was 100-1500 m/z).
Data were extracted using xMSanalyzer 31 and annotated using xMSannotator. 32  SPIROMICS: UC. Samples from p100 fresh frozen plasma underwent LC-MS profiling in the laboratory of Nichole Reisdorph at the University of Colorado Anschutz Medical Campus as previously described. 33,34 In brief, cold methanol was added to plasma sample aliquots containing internal standards to precipitate proteins. Supernatants were extracted using liquidliquid extraction with methyl tert-butyl ether to obtain a lipid fraction and a small molecule aqueous fraction. Samples were analyzed in positive mode using C18 and HILIC on an Agilent 6545 quadrupole time-of-flight (QTOF) and 6520 QTOF, respectively. Spectral peaks were extracted using MassHunter Profinder B.08 (Agilent). Features were annotated using Mass Profiler Professional (Agilent) using either an in-house accurate mass and retention time (AMRT) database or exact mass and isotope ratios for the compounds that were not in the AMRT database. There were 10,561 features detected among the 81 samples.
Genotyping Discovery and COPDGene: Emory. Subjects were genotyped using the HumanOmniExpress array (Illumina) employing BeadStudio quality control, which included reclustering on project samples following Illumina guidelines, as previously described for COPDGene. Genotype imputation was performed using the Michigan Imputation Server and the HRC 1.1 reference NHW and the 1000 Genome Phase 1 v3 for AAs. 35 Ancestry-based principal components (PCs) were calculated and used as previously described. 36,37 Variants were filtered to include only single-nucleotide polymorphisms (SNPs) with minor allele frequencies > 1% in the sample population.
SPIROMICS. Subjects were genotyped using the HumanOmniExpress array (Illumina) as previously described. 38 Around 683,998 directly genotyped SNPs passed quality control after the removal of SNPs significantly deviating from Hardy-Weinberg expectations ( p < 0.0001), missing allele data (any ''0''), and with a genotype call rate < 90% and heterozygous haploid genotypes. Genotype imputation was performed using the Michigan imputation server and the HRC 1.1 reference NHW ancestry-based PCs were calculated and used as previously described. 36 Variants were filtered to include only SNPs with minor allele frequencies > 1% in the sample population.

Statistical analysis
Data sets and availability. Clinical data and genotype data can be found on dbGaP for COPDGene (phs000179.v6.p2) and SPIROMICS (phs001119.v1.p1). For COPDGene, the following dataset was used: COPDGene_P1P2_All_Visit_29Sep2018. For SPIRO-MICS, the CORE 5 data sets were used. Discovery metabolomic data are available at the NIH Common Fund's National Metabolomics Data Repository website, the Metabolomics Workbench, https://www.metab olomicsworkbench.org where it has been assigned project ID PR000907.

Pre-analyses
Discovery and SPIROMICS: Metabolon. Unless otherwise mentioned, all metabolite data processing and analysis were performed in R (v3.5.1). A data normalization step was performed to correct variation resulting from instrument interday tuning differences: metabolite intensities were divided by the metabolite run day median and then multiplied by the overall metabolite median. It was determined that no further normalization was necessary based on the reduction in the significance of association between the top metabolomics PCs (calculated using the R function ''prcomp'') and sample run day after normalization (Supplementary Fig. S3). Metabolites were excluded if > 20% of samples were missing values. 39 For the 995 remaining metabolites, missing values were imputed across metabolites with k-nearest neighbor imputation (k = 10) using the R package ''impute.'' 40 To detect and remove outliers, median standard deviation scores (z-scores) were calculated across metabolites at the subject level. Subjects with aggregate metabolite median z-scores > 3.5 standard deviation from the mean (N = 6) of the cohort were removed ( Supplementary  Fig. S4). All measured metabolite relative abundances were transformed using the normal quantile transformation, as this type of rank-based transformation can remove possible bias due to outliers or skewed distribution. 41 COPDGene: Emory. Metabolite data were preprocessed using the MSPrep R package. 42 Data were first imported and summarized across three technical replicates before filtering to include only compounds with < 20% missingness over samples. This reduced the data to 2891 compounds, 163 of which were annotated with compound name. Bayesian principal component analysis was employed for imputation 43,44 of missing values before ComBat batch correction and quantile normal transformation. 45 SPIROMICS. Metabolite data were pre-processed using the MSPrep R package 42 as described previously. 33 Raw data were filtered to include only compounds with < 20% missingness over samples. This reduced the data to 7918 compounds, 3843 of which were annotated by compound name. k-Nearest neighbor imputation (k = 5) was employed for imputation of missing values before ComBat batch correction and quantile normal transformation 45 Exploring associations between COPD and metabolites in Discovery cohort. Phenotype-metabolite associa-tions were tested using various regression models and covariates based on previous literature (Supplementary  Table S2) 46 for five phenotypes. Significance was determined within each phenotype at a p-value < 5.03 · 10 À5 after employing a Bonferroni correction to account for multiple testing over 995 metabolites.
Metabolome-wide association study. First, the additive effects of SNPs on metabolite abundances were assessed in the Discovery cohort with linear regression using the R package ''MatrixEQTL'' (version 2.2). 47 Models were adjusted for clinical covariates (clinical center, sex, age, BMI, smoking pack years, and current smoking status) as well as ancestry-based PCs and as previously described. 36 Metabolite quantitative trait loci (met-QTLs) were considered significant at p-value < 6.6 · 10 À12 for genome-wide significance after employing a Bonferroni correction to account for multiple testing across 995 metabolites and 7,641,295 genotyped and imputed SNPs.
Metabolome-wide association study replication across metabolomic platforms and cohorts. Significant met-QTL SNPs were tested for associations in the COPDGene-Emory and SPIROMICS replication platforms, using the same methods as previously described for the Discovery cohort and the Bonferroni correction for multiple testing.
Recursive conditioning. If K met-QTL-SNPs were associated with a metabolite with p-values smaller than 6.6 · 10 À12 , p-values were calculated for each of the KÀ1 SNPs conditioning on the top SNP identified in the met-QTL analysis and other covariates (age, sex, BMI, smoking status, smoking pack-years, and clinical center). The SNP with the smallest p-value was considered an independent met-QTL if p-value < 0.05/(KÀ1), where 0.05/(KÀ1) was the p-value threshold by Bonferroni correction. We applied this procedure iteratively until the smallest p-value was larger than 0.05/T, where T is the number of remaining SNPs. 36 Exploring met-QTLs. Percent variance explained by SNPs and clinical variables was calculated using the coefficient of determination (r 2 ). met-QTL features were characterized using the ''-most_severe_variant'' filter and nearest genes were identified using the ''-nearest symbol'' argument in the Ensembl Variant Effect Predictor (VEP) tool (V97). 48 Enrichment analysis. Group enrichment (i.e., subpathway for metabolites or variant class for SNPs) among significantly associated features was statistically assessed against the entire feature set using a one-tailed Fisher's exact test. 49 Results were adjusted using Benjamini and Hochberg 50 (a.k.a. false discovery rate) with an alpha of 0.05.
Network analysis of metabolic interaction. As metabolic pathway annotations are arbitrarily defined and ignore unannotated compounds, 51,52 we sought to identify COPD-affected pathways in a strictly data-driven manner. This was performed in a two-step procedure. First, we generated a Gaussian graphical model (GGM) of metabolite co-abundance based on partial correlation coefficients corrected for the effects of all other metabolites and potential confounders (age, sex, BMI, smoking status, smoking pack-years, and clinical center). 53 The use of partial coefficients in the GGM model seeks to overcome a major drawback of other correlation networks (e.g., Pearson's) by conditioning against correlations with all other variables. Edges between metabolites were present if partial correlations were statistically significant at an alpha of 0.05, after Bonferroni correcting for 995 2 tests, with a positive partial correlation > 0.2 to declare whether an edge is ''present'' in the network view. Negative partial correlations likely represent spurious signals as detailed in previous publications, 53,54 and thus were removed. To infer potential genetic effects, results from the metabolome-wide association study (mWAS) were included in the network view by introducing ''SNP'' nodes with edges present between met-QTLs and associated metabolites. 54 In summary, the combined GGM and mWAS approach will provide an unbiased map of metabolic pathways and their genetic influences. 53 The first step based on the GGM identifies partially correlated metabolites, but does not consider phenotypes. Therefore, in the second step, metabolomic modules associated with COPD phenotypes were identified using a greedy search algorithm. 55 Each phenotype was tested separately. Briefly, each metabolite node was regressed against the phenotype and scored using the negative logarithmized p-value of the phenotype beta coefficient. Phenotypes were adjusted for the same covariates as identified in previous literature (Supplementary  Table S2) 46 by regressing the phenotype against those covariates and using the residuals as the independent variable in the model. Next, starting with a seed node, each neighboring node is added iteratively to the candi-date module by averaging metabolite intensities, and this extended module is scored by linear regression as previously described. The neighbor is added only if the score of the newly extended module is higher than the scores of all the single components. Any overlapping optimal module is combined in a final step into a single module and scored by the scoring function, using the same rules as before to determine inclusion.
In summary, this approach systematically identifies phenotype-affected modules based on a GGM-derived network of metabolic pathways. Both steps were performed using the R package ''MoDentify'' 55 and visualized using Cytoscape (v3.71). 56

Results
Metabolome data substructure Before reducing data by the exclusion criteria, we first explored the metabolomic profiles of all COPDGene subjects with metabolomes quantified by Metabolon at Phase 2. These subjects were representative of all COPDGene subjects who returned for the 5-year follow-up (Supplementary Table S3). Pairwise correlations among metabolites were assessed using Pearson's r for hierarchical clustering within Metabolon-defined super pathways. Beyond a positively correlated cluster of lipids, metabolites exhibited minor correlation ( Supplementary Fig. S5).
Univariate demographic associations were then assessed with linear regression models. The demographic variables most strongly associated with metabolites included age, sex, race, and smoking status (Supplementary Table S4 and Supplementary Fig. S5). Of the 995 metabolites tested, 398 (40.2%) were significantly associated with age, 319 (32.1%) with sex, 355 (37.2%) with race, 250 with BMI (25.1%), and 128 (12.9%) with smoking status. Enrichment analysis found androgenic steroids, acylcarnitines, and dicarboxylates to be enriched for associations with age; sphingomyelin, androgenic steroids, and phosphatidylcholines with sex; xanthines and dicarboxylates with race; and diacylglycerols and branched chain amino acids (BCAAs) for BMI (Supplementary Table S5).

Study subjects
Demographic and clinical characteristics of the Discovery cohort are shown in Table 1. There were significant differences between PRISm subjects, current or former smoker controls, and COPD across age, sex, BMI, smoking status, and smoking pack-years. Among the met-QTL replication cohorts, COPDGene-Emory subjects were representative of the Discovery cohort, while the SPIROMICS subjects were slightly younger and healthier, as evidenced by the higher FEV 1 pp and lower percent emphysema ( Table 2).
Metabolites associated with COPD phenotypes in the Discovery cohort Of the 995 metabolites tested for associations, 147 (14.8%) were significantly associated with at least 1 of the 5 COPD phenotypes studied (Fig. 1A, full results in Supplementary Tables S6-S10). There was no metabolite significantly associated with chronic bronchitis. For exacerbations and emphysema, only one metabolite was identified in each. Higher abundance of N,N,N-trimethyl-alanylproline betaine (TMAP) was significantly associated with a decrease in exacerbation frequency ( p = 3.75 · 10 À5 ), while increased abundance in a tricarboxylic cycle metabolite (citrate) was significantly associated with higher percent emphysema ( p = 5.2 · 10 À6 ).

Identification of SNPs associated with metabolites
We next investigated the genetic contribution to metabolite abundances by investigating the relationship between Chi-square tests were used to test for differences between groups in binary variables. One-way ANOVA tests were performed to test for differences between groups in continuous variables.
PRISm, Preserved Ratio Impaired Spirometry 23 ; COPD is defined by GOLD score ‡ 1; missing, 14 subjects were deemed ineligible for spirometry and thus did not have a defined GOLD status. These subjects were still included in analyses with other COPD phenotypes and the met-QTL analysis.
BMI, body mass index (kg/m 2 ); FEV 1 /FVC, post-bronchodilator forced expiratory volume at one second/forced vital capacity; FEV 1 pp, FEV 1 percent predicted. a Mean and standard deviations provided. COPD, chronic obstructive pulmonary disease. Chi-square tests were used to test for differences between groups in binary variables. One-way ANOVA tests were performed to test for differences between groups in continuous variables. genotypes and metabolites. Of the *7.6 million genotyped and imputed SNPs tested, we identified 4281 met-QTL SNPs associated with 109 (10.95%) of metabolites tested in the Discovery cohort ( Fig. 2A and Supplementary  Table S12). An interactive plot displaying met-QTL SNP association with metabolite subclass can be found at https://plot.ly/*lagillenwater/7 Using recursive conditioning, 79 independent SNPs were identified with an additive relationship with the 109 metabolites (Supplementary Table S13 and Fig. 2C, D). At least 15% of the variance in 20 metabolites was explained by 1 or more of these SNPs, often much more than observed in clinical variables (Table 3 and Fig. 2E).
The strongest genetic link was between a missense variant in the PYROXD2 region of chromosome 10, rs2147896, which explained 50.48% of the variance of N6-methyllysine; in contrast, the clinical variables explained only 0.64% of the variance in this metabolite (Fig. 2E). For 13 metabolites, 2 or more independent met-QTL SNPs contribute to metabolite variance (Table 3). For example, 58.90% of variance in N2acteyl, N6-methyllysine is explained by variants in PYROXD2 (34.26%) and NAT8 (24.64%) regions.
Biologic significance of met-QTL SNPs and associated metabolites Next, we set out to determine if these met-QTL SNPs have been previously associated with COPD, lung function, or metabolite levels. First, we cross-referenced the 4281 variants with 279 significant SNPs from a recent lung function GWAS 57 and164 reported primary and secondary COPD GWAS SNPs 58 for overlapping associations. Next, we compared the met-QTL variants with published associations in the NHGRI GWAS catalog. 59 Of the expanded variant set, 351 SNPs have been previously reported, mostly in other metabolomic analyses. 54,[60][61][62][63] There were seven met-QTL SNPs that had previously been associated with smoking habits (SNPs rs10254729, rs10469966, rs12825376, rs13437771, rs2072113, rs2421667, and rs883403), although no met-QTL SNP overlapped with lung function or COPD GWAS SNPs.
Using Ensembl VEP, we found intronic SNPs to be the most represented met-QTL SNP class (64.5%), followed by intergenic variants (12.6%) (Supplementary Table S14 and Fig. 3A). Intronic variants were also the most significantly enriched, followed by 3¢ untranslated region and missense variants (q = 3.48 · 10 À79 , 3.99 · 10 À33 , and 2.68 · 10 À25 ). At least 50% of metabolites in 13 subpathways had met-QTLs, with all 3 of the metabolites in the hemoglobin and porphyrin metabolism subpathway having significant genetic associations (Supplementary Table S15 and Fig. 3B). Although metabolites with met-QTLs were not enriched for any subpathway, at the super-pathway level, an enrichment of amino acids was found (q = 0.048).

Replication of met-QTL SNPs
We used three strategies to test for replication of met-QTL SNPs (see Materials and Methods section). First, we used an independent high-resolution LC-MS strategy in a different laboratory in the same cohort (COPDGene-Emory) ( Supplementary Table S16). Second, we used an independent cohort with data also quantified by Metabolon (SPIROMICS-Metabolon) (Supplementary Table S17). Third, we used an independent cohort with data from an independent platform (SPIROMICS-UC) (Supplementary Table S18). The cohort with the greatest replication was SPIROMICS-Metabolon where 72 met-QTL associations replicated (Fig. 4, Table 4, and Supplementary        COPDGene-Emory metabolites not annotated by xMSAnnotator are reported as mass-to-charge ratio and retention time.
*Indicates compounds that have not been officially confirmed based on a standard, but Metabolon is confident in its identity.
(#) or [#] indicates a compound that is a structural isomer of another compound in the Metabolon spectral library.  (Table 3 and Supplementary Fig. S5).

Integration of genes, metabolites, and COPD phenotypes
The met-QTL analysis provides evidence of geneticmetabolite abundance links, but not specific to COPD.
To identify affected genetic-metabolite-phenotype pathways in a strictly data-driven manner, we first created a GGM network of co-abundant metabolites and then used those results to identify modules associated with disease phenotypes. This method has been shown to enhance classical association analyses by increasing statistical power through aggregating metabolite abundance and recognizing disease-driven interplay between pathways. 55 To infer the relationship between the genomics, metabolomics, and phenotypic data, the mWAS results were combined with phenotype-driven modules by adding edges between met-QTL SNPs and metabolites.
In the first step, given all 995 metabolites, a GGM was created. Then, nodes representing independent met-QTL SNPs were added, with edges linking to associated metabolites. The final GGM network was sparse, containing a combined 693 nodes (582 metabolite and 79 SNP nodes) and 505 significant, undirected edges between any node (metabolites or genes).
Then, in the second step, we used the MoDentfy module-identification algorithm with the COPD phenotypes to find phenotype-associated modules (i.e., subnetworks of the GGM associated with a specific phenotype). Testing all 5 COPD phenotypes separately, this resulted in 26 significant modules, sometimes associated with more than 1 phenotype, which included metabolites missed in univariate analysis ( Fig. 5 and Supplementary Table S19). For example, a module of three lactosylceramides was associated with percent emphysema (adjusted p-value = 0.00018) and a module of three hippurates was associated with FEV 1 /FVC (adjusted p-value = 0.04), all of which had not been significantly associated previously (although the same direction of effect was observed between the modules and independent metabolites; see Supplementary Table S19). Other modules reconfirmed previously identified associations, like the module most associated with FEV 1 pp (Bonferroni adjusted p-value = 3 · 10 À7 ) containing the amino acid vanillylmandelate and two unannotated metabolites (X -12707 and X -13553).
Of the 26 modules associated with COPD phenotypes, 6 associated with airflow obstruction phenotypes had edges to gene nodes (Table 5). For these modules, we utilized the percent-variance-explained results to determine the genetic effect on individual metabolite abundances and the module, represented by the first PC of the module. We further compared the variance explained by genetic variance to the percent explained by clinical and environmental variance, as represented by the covariates used previously. While, as reported earlier, significant variance in individual metabolites was explained by genetic variance (ranging from 5% to 18.9%), the opposite effect was observed in COPD phenotype-associated modules, with the exception of the module of dicarboxylate fatty acids containing decadienedioic acid (C10:2-DC)** and tetradecadienedioate (C14:2-DC)* where 13.2% of the module variance was explained by variation in SNPs rs11626972 and rs58231493 and only 3% was explained by clinical and environmental variance.

Discussion
While COPD is a disease of the lungs, we find a strong systemic metabolomic signature in the blood even after adjusting for common risk factors such as smoking. This is consistent with observations that COPD is associated with extrapulmonary diseases such as cardiovascular disease, osteoporosis, muscle wasting, and insulin resistance. Many of the metabolomic signatures we identified are similar to those found in these diseases (e.g., sphingolipids in cardiovascular and metabolic disorders 64 or bone remodeling, 65 acylcarnitines in osteoporosis, 66 and diacylglycerols in insulin resistance 67 ), suggesting common systemic pathways are important in COPD pathogenesis.
The lone metabolite associated with exacerbation frequency, of TMAP, further exemplifies the potential systemic effects of COPD. Although the reported association is novel in COPD, TMAP was recently identified as a biomarker of chronic kidney disease, 68 a comorbidity of COPD, 69 with a similar inverse association between Gillenwater, et al.; Network and Systems Medicine 2020, 3.1 http://online.liebertpub.com/doi/10.1089/nsm.2020.0009 TMAP abundances and disease severity. Moreover, while the biologic origin of TMAP has not yet been identified, it is suggested that myosin light-chain (MLC) protein degradation results in the release of TMAP. 68 Disruption in MLC isoforms has been observed in COPD subjects with reduced activity and low oxygen supply, yet further work is needed to understand the pathophysiology of TMAP in exacerbations.
Systemic mitochondrial dysfunction, heightened in lungs with cigarette smoke-induced inflammatoryoxidative stress, has been implicated in the pathology of emphysema. 7,70 We found further support of this   *Indicates compounds that have not been officially confirmed based on a standard, but Metabolon is confident in its identity; **indicates a compound for which a standard is not available, but Metabolon is confident in its identity or the information provided; (#) or [#] indicates a compound that is a structural isomer of another compound in the Metabolon spectral library.
PC, principal component.
as citrate was uniquely associated with the percent emphysema phenotype in regression analyses, demonstrating potential TCA cycle dysregulation. The increased power of phenotype-driven GGM network analysis positively associated three lactosylceramides with percent emphysema. Abnormalities in glycosphingolipid metabolism have been noted to be associated with COPD phenotypes. For example, there is evidence for correlation between glycobiosyl ceramides and COPD exacerbations, 6 and that glucosyl ceramide synthase, which governs the first step in the glycosphingolipid metabolism, is an important determinant of cell fate of lung endothelial cells. 71 Moreover, lactosylceramide accumulation was recently identified as a common pathogenic mechanism that induces apoptotic-inflammatory responses and aberrantautophagy leading to emphysema. 72 Lactosylceramides can directly inhibit electron chain complexes, which enhance the production of reactive oxidation species in the mitochondria, potentially leading to lung inflammation and airway remodeling characteristic of emphysema. 72,73 As the initial products in the formation of glycosphingolipids (e.g., lactosylceramides) are upregulated in insulin-resistant patients, increased lactosylceramide abundance may demonstrate comorbid mitochondrial dysregulation in COPD and metabolic disorders.
Several other metabolites previously associated with insulin resistance and other metabolic disorders were concordantly associated with COPD phenotypes. These included aromatic amino acids (phenylalanines) and BCAAs with both percent emphysema and airflow obstruction, as well as diacylglycerols, gamma-glutamyl amino acids, sphingomyelins, and lipids involved in the fatty acid and phospholipid metabolism, specifically with airflow obstruction. Abnormal amino acid and lipid metabolism may result from reduced dietary intake, oxidative stress, and increased strain on respiratory muscles with anoxia, leading to an active metabolic COPD state in COPD patients. 74,75 These results confirm the findings of smaller studies that have shown strong associations between phospholipid-derived sphingolipids and COPD, 6,76 and a recent two-cohort population study (KORA and ARIC) with 4347 controls and 393 COPD subjects that identified similar associations with BCAAs, aromatic amino acids, and glutamine/glutamate metabolites. 8 However, our study differed from the KORA and ARIC study, in that we found more associations with FEV 1 /FVC than FEV 1 pp, indicating a metabolic signa-ture of airflow obstruction. This may be because COPDGene primarily enrolled current and former smokers ( > 10 and > 20 pack-years, respectively), oversampled for COPD cases, was older, and included only NHW subjects (ARIC had many AA subjects). Although we adjusted for these variables in our analyses, age and smoking have strong influences on metabolome, and thus the generalizability might be limited.
While lifestyle behaviors (e.g., smoking) are important risk factors for COPD, there is evidence that as much as 37% of the variability in lung function is genetic. 13 To explore this, we first sought to identify the genetic effect on the metabolome, detecting significant SNP-metabolite associations in 109 (11%) of the metabolites tested (similar to the 119 of 529 [22%] metabolites previously reported by Shin et al). 54 The strongest association was between missense SNP rsrs2147896 in PYROXD2 and N6methyllysine ( p = 3.97 · 10 À146 ). PYROXD2 has been associated with lysine metabolites in other mWAS, including N6-methyllysine, as well as trimethylamine in urine and dimethylamine in plasma. 77 Of the 79 independent loci identified with recursive conditioning, 47 novel SNPs were found, including rs58231493, an upstream variant of ACOT2 (coding for Acyl-CoA Thioesterase 2) associated with decadienedioic acid (C10:2-DC)**.
One of the most promising met-QTLs, as it was significant across replication cohorts, was in UGT1A region and associated with bilirubin pathway metabolites. In the Framingham Heart Study Offspring cohort, those with higher bilirubin due to a genetic polymorphism affecting the UGT1A1 enzyme of bilirubin metabolism (the enzyme defect that leads to Gilbert's syndrome) had one-third the risk of cardiovascular events compared to wild-type carriers with normal bilirubin concentrations. 78 Higher levels of serum bilirubin have been inversely associated with the risk of COPD severity, progression, and mortality, 79,80 and more recently, fewer COPD exacerbations. 81 In in vitro and animal studies, bilirubin prevents oxidation of lipids, which may protect the COPD lung by inhibiting lipid peroxidation. 80,82 Variability in bilirubin concentration has been previously associated with SNPs in UGT1A region. 83,84 In our analysis, we found both bilirubin and biliverdin nearing the conservative Bonferroni significance threshold for associations with airflow obstruction phenotypes ( p = 3.02 · 10 À3 and 6.83 · 10 À4 with FEV 1 /FVC, respectively). Integrating the genetic associations suggests that the observed associations between bilirubin/biliverdin and COPD may be mediated through genetics. While there appears to be a strong genetic effect on the metabolome overall, there is less evidence for the genetic regulation of COPD-associated metabolites. Only 10 of the metabolites associated with COPD phenotypes through regression or phenotype-driven network analysis also had met-QTL SNP associations. Moreover, it was only within the module containing the fatty acids decadienedioic acid (C10:2-DC)** and tetradecadienedioate (14:2-DC)* that more variance was explained by an upstream variant of ACOT2 than environmental variables.
The protein that ACOT2 codes for, Acyl-CoA thioesterase-2, has been shown to facilitate mitochondrial fatty acid oxidation in mouse models and may warrant further study. 85 The lack of evidence for genetic regulation in the COPD metabolome may implicate other downstream regulation (e.g., methylation, post-translational modification, and metabolism of exogenous metabolites) having a greater effect. Thus, modifiable behaviors, like smoking, diet, and exercise, may have a greater effect on the COPD metabolism than genetic predisposition.
While this study was strengthened by the large number of subjects in a well-categorized cohort, there were several limitations. First, this analysis was performed using blood samples, as opposed to bronchial lavage fluid, which may better represent COPD phenotypes. 33 It is well documented that the blood metabolome, across multiple metabolic pathways, is strongly affected by demographic factors, including age and sex, 51,86 which we replicated in our initial exploratory analyses. COPD pathology begins in the lungs and then manifests as systemic dysregulation across several biologic tissues. However, the observed effects on the metabolome may still not be as pronounced within blood as the effects of age and sex.
Second, although this is one of the largest mWAS studies reported, 957 subjects are still small compared to clinical GWAS studies. The cohort was also restricted to NHW subjects, which limits generalizability over the entire population. Moreover, as many distinct and independent met-QTL SNPs were identified for many metabolites, there may be multiple mechanisms along genetic and metabolic pathways that influence observed metabolite intensities.
Another major challenge in metabolomics is crossplatform replication. Although we had two independent cohort metabolomics platforms available for replication and we identified similar met-QTLs across cohorts, the named metabolome features for these met-QTL metabolites used three different annotation techniques (Metabolon was proprietary annotation; COPDGene-Emory used xMSannotator; and CU used Agilent MassHunter and IDBrowser). These annotation strategies are optimized to the platforms and cross-annotation was challenging.
Despite these challenges, we were able to use the presence of common met-QTLs as evidence to support a specific annotation; however, since all the platforms were untargeted, it was sometimes unclear which of the three annotations was the correct annotation. Finally, the sample sizes our COPDGene-Emory and replication cohorts, as well as their limited metabolite annotation, greatly limited our statistical power to detect replicating met-QTLs. Further work with targeted metabolomics studies could assist with these met-QTL associations.
In conclusion, this study found evidence in the blood metabolome for systemic dysregulation of metabolic pathways affecting COPD phenotypes in a diseased population. By further assessing the blood metabolome for genetic regulation, we reproduced several known associations and identified many novel met-QTL SNPs. Furthermore, we expanded and contextualized metabolite associations through COPD phenotypedriven module identification, integrating the genetic associations into the network view. While we found nongenetic factors to explain more variance in COPDassociated metabolites than genetic, further work is needed, potentially integrating the metabolome with other omics data types (e.g., epigenomics and proteomics), to elucidate and characterize dysregulated pathways in COPD pathogenesis.

Acknowledgments
COPDGene phase 3 Grant support and disclaimer. The project described was supported by award number U01 HL089897 and award number U01 HL089856 from the National Heart, Lung, and Blood Institute. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Heart, Lung, and Blood Institute or the National Institutes of Health.
COPD foundation funding. The COPDGene Ò project is also supported by the COPD Foundation through contributions made to an Industry Advisory Board comprising AstraZeneca, Boehringer-Ingelheim, Genentech, GlaxoSmithKline, Novartis, and Sunovion.   Table S1  Supplementary Table S2  Supplementary Table S3  Supplementary Table S4  Supplementary Table S5  Supplementary Table S6  Supplementary Table S7  Supplementary Table S8  Supplementary Table S9  Supplementary Table S10  Supplementary Table S11  Supplementary Table S12  Supplementary Table S13  Supplementary Table S14  Supplementary Table S15  Supplementary Table S16  Supplementary Table S17  Supplementary Table S18  Supplementary Table S19