Multiple Sclerosis Atlas: A Molecular Map of Brain Lesion Stages in Progressive Multiple Sclerosis

Introduction: Multiple sclerosis (MS) is a chronic disorder of the central nervous system with an untreatable late progressive phase. Molecular maps of different stages of brain lesion evolution in patients with progressive multiple sclerosis (PMS) are missing but critical for understanding disease development and to identify novel targets to halt progression. Materials and Methods: The MS Atlas database comprises comprehensive high-quality transcriptomic profiles of 98 white matter (WM) brain samples of different lesion types (normal-appearing WM [NAWM], active, chronic active, inactive, remyelinating) from ten progressive MS patients and 25 WM areas from five non-neurological diseased cases. Results: We introduce the first MS brain lesion atlas (msatlas.dk), developed to address the current challenges of understanding mechanisms driving the fate on a lesion basis. The MS Atlas gives means for testing research hypotheses, validating biomarkers and drug targets. It comes with a user-friendly web interface, and it fosters bioinformatic methods for de novo network enrichment to extract mechanistic markers for specific lesion types and pathway-based lesion type comparison. We describe examples of how the MS Atlas can be used to extract systems medicine signatures and demonstrate the interface of MS Atlas. Conclusion: This compendium of mechanistic PMS WM lesion profiles is an invaluable resource to fuel future MS research and a new basis for treatment development.


Introduction
Multiple sclerosis (MS) is a chronic inflammatory, demyelinating, and neurodegenerative disorder of the central nervous system (CNS). 1 It is one of the most common causes of neurological disability in young adults [2][3][4] and the incidence is increasing. 5,6 In about 50% of patients with relapsing multiple sclerosis (RMS), the disease evolves into a progressive phase. At this stage, progression is relentless, and treatments become ineffective. Lesions in the white matter (WM) characterize MS from the early phase. As the disease progresses, quantitative and qualitative changes in the WM can be observed. However, key aspects of progressive multiple sclerosis (PMS) pathogenesis are still unsolved, making it challenging to develop treatments. One reason is that direct studies on brain lesions of PMS patients are sparse and the evolution of acute lesion in the MS brain and their fate are not well characterized, mainly due to the limited access of brain tissue. Brain samples from MS patients can be obtained either by needle biopsy or by autopsy. However, needle biopsy is done in MS only if there is doubt about differentiation from other diseases, so usually biopsy samples do not represent typical MS. Needle biopsy does not allow either the examination of different lesion types, as only minuscule samples from the atypical brain lesion are obtained, and examination of the whole brain for different lesion types is not possible. In addition, knowledge from human MS brain lesions is mostly based on candidate gene approaches such as immunohistochemistry, microarrays, and quantitative polymerase chain reaction. In the past three decades, biobanking of fresh-frozen tissues and advanced technologies in transcriptomics and genomics contributed to more comprehensive studies on the brain material. Unfortunately, most of the generated gene lists do not overlap, which may be due to the use of targeted amplicon sequencing and microarrays, and lack of correction for multiple testing. [7][8][9][10][11][12][13][14][15][16][17][18][19] To allow for identifying systems biology expression signatures that describe brain lesion type formation, evolution, and progression, we have assembled the first interactive MS lesion expression map (MS Atlas). At its core sits a database of preanalyzed whole-genome next-generation RNA sequencing-based transcriptomic profiles for stages of lesion formation gathered from 98 post mortem human brain samples of 10 patients with PMS and 5 non-neurological disease control cases, including normal appearing white matter (NAWM), active, inactive, chronic active, and remyelinating lesions. We applied strict preprocessing and conservative statistics and detected thousands of genes that are significantly differentially expressed during lesion evolution compared with control samples (Fig. 1). Our MS Atlas features an online web-based data analysis platform to identify and extract mechanistic pathways and gene sets that distinguish lesion types and are candidate drivers of different lesion type formation.
The MS Atlas will fuel future research projects and significantly aid in advancing not only the MS field but also for research in other neurological diseases, as it allows researchers to (1) search for certain molecules of interest as drug targets or biomarkers of brain lesion genesis, (2) compare gene panels extracted from functional cell or animal studies, and (3) discover mechanistic markers using de novo network enrichment from genes of interest in different lesion types. We showed that the drug target known to be only effective in early disease stages is present in the active-but less in the chronic active lesion types characteristic of progressive MS. The MS Atlas is extendible and will continuously be updated with future transcriptome profiles. We also aim to integrate genomewide methylome data from the same tissues, making it possible for the user to correlate the gene expression with methylation status, and to mine its mechanistic joint effects on lesion evolution. Human postmortem brain tissue Seventy-five snap-frozen tissue blocks from 10 PMS patients and 25 blocks from five donors without neurological disease 20 have been obtained from the UK Multiple Sclerosis Tissue Bank (UKMSTB) at Imperial College London (Table 1). All tissues were obtained within 30 h after death. The age of patients at death was 52.4 -10.2 years, and the age of the controls was 56.4 -14.1 years. We examined 4-10 brain areas/ lesions from each brain: altogether 20 NAWM areas (7 patients), 17 active lesions (8 patients), 14 inactive lesions (5 patients), 6 remyelinating lesions (4 patients), 17 chronic active lesions (7 patients), and 25 control WM areas (5 controls).

Immunohistochemistry and lesion classification
Snap-frozen tissue has been sectioned and stained for classification of NAWM, active, inactive, and remyelinating lesions based on antibodies against myelin oligodendrocyte glycoprotein to detect myelin integrity and human leukocyte antigen D related (HLA-DR + ) to characterize the inflammatory state. 21 For staining with very late antigen 4 (VLA-4)/integrin a-4 antibody (ab77528; abcam), we used tissue from one MS patient and control.
RNA extraction from specific histological brain areas The brain fields of interest were microdissected in a cryostat (10-100 mg/sample). Total RNA has been isolated with miRNeasy Mini Kit from Qiagen, and DNAse I treatment (RNAse-Free DNAse Set; Qiagen) was applied to eliminate genomic DNA interference. RNA concentration and purity have been measured on a NanoDrop spectrophotometer ND-1000 (Thermo Scientific) and the integrity of RNA (RIN) was measured using the Bioanalyzer 2100 (Agilent Technologies). The fragmentation time and cleanup steps during library preparation have been adapted for each sample based on the RIN value.

RNA-seq
One microgram of RNA per sample was processed to remove ribosomal RNA followed by library preparation using TruSeq Stranded Total RNA Library Prep Raw data analysis and quality control The data were demultiplexed by the Illumina machine and exported in the FASTQ file format. Afterward, the read quality for the 100 samples was accessed through FastQC. 22 Trimmomatic 23 was used to trim the reads and remove any hypothetical adapter contamination. The software was provided with the correct Illumina adapter sequences and the quality cutoff for the leading/trailing bases as well as for the sliding windows was set to 20. The minimal length of the trimmed reads was set to 17 to include potentially present microRNA. In the next step, STAR aligner was utilized for read mapping against the human genome (hg38, downloaded May 8, 2017). The mapped reads were further processed using htseq-count 24 in strict mode to access raw read counts for every gene.
Two samples have been excluded during quality control. Some of the remaining 98 samples had to be resequenced due to low read count. Throughout all sequencing and resequencing steps we ensured the minimization of batch effects by randomly distributing the samples over all 15 different flow cells. In total, we ended up with 73 cases and 25 controls.

Statistics
The unique Ensemble identifiers (IDs) were mapped to gene symbols using the R package ''org.Hs.eg.db.'' 25 In case the package was missing a gene symbol the Ensemble ID was replaced with a unique number. During the mapping process, about 25% of the Ensemble IDs could not be mapped to a gene symbol. Note that the gene symbols are solely used for result visualization, whereas all analyzes have been performed based on the unambiguous Ensemble IDs. We used EdgeR 26 to process the raw read counts and scan for significant genes for the different lesion types. All samples have been normalized for library sizes. Five generalized linear models were trained to reveal differentially expressed genes between control (WM) and each of the five lesions types (NAWM, active, chronic active, inactive, and remyelinating). All models were adjusted for age and gender. Our models additionally account for lesion distribution, since from every patient multiple samples of the same lesion types have been extracted and used. We obtain, for every lesion, a list of genes with the corresponding log 2 -fold-changes (logFC) and the p-value corrected for multiple testing using false discovery rate (FDR)-correction (Benjamini-Hochberg). 27 The MS Atlas database and online analysis platform The processed data were then integrated into a database, that we make publicly available to the research community together with a web-interface based on RShiny. 28 Besides data download, the platform offers three major visualization tools to compare the different lesion types and extract markers at different levels in the system biology value chain. Heatmaps and volcano plots allow for the extraction of gene panels associated with lesion type. Network enrichment methodology (i.e., KeyPatwhayMiner 29,30 ) enables the identification of mechanistic (i.e., subnetworkbased) markers. We integrated the human proteinprotein interaction network from the Integrated Interactions Database 31 filtered for only brain tissuespecific interactions with experimental evidence, orthologous mice genes, and computational prediction. The MS Atlas online platform produces visualizations on-the-fly using a variety of R packages ( Table 2).

Code availability
The code used for the analysis of differential gene expression as well as the source code for the online tool are available as GitHub repository (https://github .com/frischt/msatlas). Used R packages with specified version can be found in Table 2.

Data Records
The raw data of all 98 samples can be downloaded from the gene expression omnibus (GEO) database (GSE138614) as FASTQ files. In addition, the raw read counts as well as the results of differential expression calling with edgeR are available as text files. The GEO data set is accompanied by the standard excel file giving detailed information about each sample, including patient ID and lesion type.
Technical validation: drug target expression during lesion genesis Natalizumab is a monoclonal antibody used in the treatment of RMS patients. 32 It blocks the alpha4beta1 integrin (VLA-4)-mediated trafficking of pathogenic lymphocytes through the blood-brain barrier, and prevents inflammation in the CNS. 33 Although natalizumab is one of the most effective treatments in RMS patients, 34 its efficacy in PMS is limited. 35 Our database and validation by immunohistochemistry indicate that VLA-4 is highly expressed in active lesions even in the PMS phase, but it is significantly upregulated in all lesion types compared with the NAWM (Fig. 2); the limited efficacy in the ASCEND clinical trial of PMS may be related to the increasing number of chronic active lesions in this phase of the disease with less expression of VLA-4 compared with the active lesions. 36,37 Alternatively,  additional mechanisms of inflammation that are unrelated to VLA-4 and/or increasing dominance of pathways independent of inflammation may contribute. 38 Usage Notes MS Atlas web interface offers three major visualization themes: heatmaps, mechanistic candidate networks, and volcano plots. The user can adjust several parameters (Fig. 3A) to choose statistical significance levels of the profiled genes. Initially, the user selects a (set of) lesion type(s) of interest. The user is further asked to choose whether the interest is in up-, down-, or overall deregulated genes. To study the evolution and development of lesion types, one might, for example, filter for genes downregulated in chronic but upregulated in inactive lesions. In the next step those genes can be sorted by FDR-corrected p-value and filtered for a minimal logFC value. The user may also directly search for a specific gene of interest and check for its expression changes across different lesion types. The MS Atlas platform will then visualize the gene's expression across the selected lesions in a heatmap (Fig. 3B). Genes and lesions are ordered based on a hierarchical clustering using the Euclidean distance metric. The color represents logFC (lesion vs. control). The individual logFC of a gene in a lesion is shown in a tool-tip when the mouse is hovering over the corresponding field. All data shown in the heatmap can be exported as CSV file or PNG image. Furthermore, to suggest potential mechanistic markers putatively driving lesion type evolution, selected genes can be projected onto the human proteinprotein interaction network (Fig. 3C). Since the network contains > 400,000 interactions and 13,000 genes, displaying the full network would not help extracting useful information. Instead, we integrated the de novo network enrichment method KeyPathwayMiner, which extracts subnetworks that distinguish, on a mechanistic level, between MS lesion types and, thus, provides first hints on how lesion evolution is driven and controlled on a systems biology level. We allow to specify a number of exception genes (k), which do not necessarily have to be significantly differentially expressed between lesion types (i.e., outliers) but still play a central role in the interaction network. A mouse click on a node/gene in the network reveals additional information. The key networks can be exported in SIF format for downstream analyses in Cytoscape 39 or as PNG image file.
Finally, the platform allows for on-the-fly visualization of volcano plots for all genes of a selected lesion (Fig. 3D), where two thresholds have been chosen (FDR < 0.05 and logFC > 1.5) to color-code the genes accordingly.
Author Disclosure Statement Z.I. reports personal fees from Biogen, Sanofi-Genzyme, Merck, and Novartis, outside the submitted study. The other authors declare that they have no competing interests.