De Novo and Supervised Endophenotyping Using Network-Guided Ensemble Learning

Introduction: Precision medicine requires the accurate identiﬁcation of genes and pathways that mechanisti-cally deﬁne a disease phenotype. Modern omics may deliver this, but has until now yielded only few translational successes. While gene signatures derived from single omics analysis have proven useful for disease diagnosis and prognosis, they often do not explain the underlying mechanism. Methods: We here present Grand Forest, an ensemble learning method that extends random forests and integrates experimental data with molecular interaction networks to discover relevant endophenotypes and their deﬁning gene modules. Our method covers two application scenarios: a supervised method for ﬁnding modules associated with outcome and an unsupervised method for ﬁnding de novo patient subgroups. Results: We applied the supervised Grand Forest methodology to ﬁve disease-related transcriptome data sets and compared the results with four state-of-the-art methods. Grand Forest consistently found gene modules with greater biomedical relevance, reproducibility, and interaction density, but fewer differentially expressed genes. Using the unsupervised method to discover gene modules from unlabeled data, lung cancer patients could be de novo stratiﬁed into clinically relevant molecular subgroups. Further analysis revealed that known disease genes were only marginally over-represented among differentially expressed genes, and that our method was driven mainly by network topology. Conclusion: With Grand Forest, we developed a novel approach to disease module discovery and demonstrated it identiﬁes biologically relevant gene modules and patient subgroups. We conclude that differential expression was not effective for identifying driving genes and that the results were likely confounded by bias in the network data. We caution readers to consider these issues when applying network-based methods to gene expression analysis. Grand Forest is available at https://grandforest.compbio.sdu.dk.


Introduction
The increasingly large amounts of functional genomic data currently available in public databases such as the Gene Expression Omnibus (GEO) and through extensive data collection efforts such as The Cancer Genome Atlas have enabled large-scale integrative analyses aiming to discover mutations and expression pat-terns associated with a specific disease. A key aim in precision medicine has been the identification of molecular subtypes from molecular profiling data. By classifying patients as different subtypes, the aim is to stratify patients into groups with distinct clinical traits, such as expected survival time, risk of disease recurrence, or response to treatment. To this end, significant effort has been put into identification of gene signaturessmall sets of genes that exhibit a distinct expression or mutation pattern associated with a specific phenotype. [1][2][3][4] Despite proving useful for prognosis, different breast cancer signatures have little overlap in genes and have been shown to be inconsistent across data sets. 5 Furthermore, most random gene signatures of 100 or more genes were found to be significantly associated with outcomes in breast cancer, despite having no relationship with the disease itself. 6 This demonstrates a major limitation of gene expression-based analysis: a change in phenotype may lead to gross global changes in the transcriptome, and thus, the genes that are best suited for distinguishing different symptoms or outcomes are not necessarily important for development or progression of the disease itself.
To cope with the inherently noisy and overdetermined nature of molecular profiling data, many researchers have proposed integrating experimental data with secondary data, in the form of biological interaction networks, to produce more stable and biologically meaningful models. This is commonly achieved either through searching for functional enrichment in known pathways 7,8 or finding enriched gene modules in global interaction networks (de novo pathways). [9][10][11][12][13][14] The latter approach is especially promising as it may help uncover previously unknown molecular interactions and mechanisms not currently reported in databases such as the Kyoto Encyclopedia of Genes and Genomes (KEGG) and Reactome. Machine learning methods are also increasingly often utilized to develop more sophisticated models from biological data, for instance, in analyzing expression patterns, 15 regulatory network inference, 16 protein-protein interaction (PPI) prediction, 17 and elucidating genotype-phenotype relationships. 18,19 In this work, we present a novel kind of module discovery method called Grand Forest (graph-guided random forest). Besides the experimental data to be analyzed (e.g., gene expression or methylation data), our method also integrates a network describing the pairwise relationship between the features comprising the data set (e.g., PPIs). When building the decision tree forest, a connected subnetwork is randomly sampled from the full network for each decision tree, and each decision tree is allowed to use only the features contained in its corresponding subnetwork. Furthermore, each tree is built under the constraint that each split variable in the tree must be a neighbor of the variable in the split directly above it in the decision tree. This constraint enforces that the set of variables in each decision tree form a connected subnetwork in the interaction network and that each split always follows a split on an adjacent gene. By estimating feature importance from the trained model, we are then able to extract a highly connected gene set that explains the phenotype. The subnetwork induced by the most important genes is then extracted and returned as result. We introduce two application scenarios: a supervised and an unsupervised analysis workflow (Fig. 1). Our method extends significantly on previous ideas from Dutkowski and Ideker 20 (see Supplementary Text S1 for details).
We first apply the supervised Grand Forest method to whole-genome gene expression data from patients diagnosed with breast cancer, lung cancer, Huntington's disease, ulcerative colitis, and amyotrophic lateral sclerosis (ALS) and show that our method is able to discover subnetwork modules with greater biological relevance than other existing, network-based, diseasegene module detection tools while also being less sensitive to the sampling of the patient population. We then demonstrate that our method can also be applied to unsupervised endophenotyping, applying it to analyze a lung cancer data set. Unlike most other module discovery tools, Grand Forest does not employ statistical hypothesis tests or differential expression analysis to score the individual genes and, as such, does not make any assumptions on the underlying distribution of the expression data. The use of decision trees may also make it possible to discover interaction effects between genes. Furthermore, Grand Forest can be applied directly to both categorical and numerical clinical variables, as well as right-censored survival data. Hence, its supervised version can-in addition to classificationalso be utilized for network module-based regression as well as survival analysis, which makes it, to our knowledge, the first such tool available. In addition, it is the first method supporting unsupervised (i.e., de novo) stratification of patients into groups while simultaneously extracting subnetworks whose genetic expression explains the difference between the identified groups.
Comparison of the modules reported by each method revealed that Grand Forest generally selected modules with high interaction density, but lower differential expression, compared with other methods, suggesting that disease-associated genes were selected mainly due to network topology. Furthermore, known disease-associated genes were observed to be only marginally over-represented among differentially expressed genes. We conclude that one should exercise caution when applying network-based methods for identifying disease gene modules from gene expression data and be aware of the limitations of gene expression analysis as well as possible biases in molecular interaction networks.
Grand Forest is freely available at https://grandforest .compbio.sdu.dk where we provide the source code, a package for the R programming language, and an easyto-use online analysis platform.

Methods
Graph-guided random forest algorithm Random forest is an ensemble learning method that works by generating a large ensemble of decision trees. 21,22 It is based on the random decision forests method, 23 but extended to use the random subspace method (also known as feature bagging). It has achieved widespread use in biomedical research as it works well for data sets with many more features than samples, can be applied to data with a mix of continuous and categorical variables, and works for FIG. 1. Overview of the supervised and unsupervised Grand Forest workflows. In the supervised workflow, expression data are integrated with an interaction network to identify a gene module associated with a response variable, that is, survival time. In the unsupervised workflow, a model is trained to recognize unlabeled patients from a generated background distribution. From the trained model, highly informative genes are then selected and used to stratify patients into groups with different endophenotypes. multiclass problems. Furthermore, it provides several measures for estimating feature importance, making it possible to identify important genes in molecular profiling data. 24,25 We refer to the original articles by Breiman and Cutler for a description of the random forest algorithm. 21.22 The Grand Forest algorithm works similarly to the random forest algorithm, but differs in the way split variables are selected during decision tree building. Grand Forest takes as input a design matrix X = fx i, 1 , . . . , x i, p g n i = 1 , response variables Y = y 1 , . . . , y n , and a simple graph G = (V, E), where each vertex v i 2 V corresponds to the column i in X and E is a set of two sets of vertices from V. The algorithm builds a forest of decision trees on the training set, using the graph G to guide the feature bagging procedure and split variable selection. The graph is only used during training and does not affect the prediction procedure, which is carried out like it would in the standard random forest algorithm. Each decision tree is trained on a random sample with replacement of the training data. The algorithm is outlined in Figure 2.

Feature bagging
Grand Forest uses the topology of the feature interaction graph G to perform feature bagging. Each decision tree is trained on a subset of m response variables, where each set of variables induces a connected subgraph in G. This subgraph is computed only once for each decision tree and used in all splits in that tree. Each subgraph is generated by first selecting a vertex v s uniformly at random from all vertices. A subgraph is then grown by performing a breadthfirst search traversal starting at v s until m vertices have been selected or until there are no more vertices to visit. When a new vertex is visited, its neighbors are added to the queue in random order to further randomize the sampling. See Supplementary Text S2.1 for a detailed overview.

Split variable selection
When building decision trees, splits are formed by selecting a variable and a value to split the partition on, which maximizes some split criteria, for example, the decrease in Gini impurity for classification forests. The first split in each decision tree is selected among all features in the feature subgraph. In subsequent splits, each split variable must be selected only from the variables that are connected to the parent node in the decision tree ( Supplementary Fig. S1). This

Feature importance
In Grand Forest, we use the mean decrease in Gini impurity to estimate feature importance. Gini impurity was chosen over other methods, such as permutation importance, because we are not concerned with how important a gene is for predictive performance, but rather how much information it provides at the time of the split, conditioned on the splits preceding it.

Implementation
Our implementation of Grand Forest is based on ranger 26 and is written in C + + with bindings to R. The feature graph is only used when selecting features as possible split variables and does not affect the splitting procedure itself. Because of this, it is trivial to generalize the method to other variations of random forest. Besides random forest for classification, ranger also implements regression forests, 21 probability forests, 27 and survival forests. 28 By extension, Grand Forest has been implemented to support these methods as well. The source code is available through GitHub (https:// github.com/SimonLarsen/grandforest).

Unsupervised analysis using Grand Forest
The Grand Forest algorithm is used for unsupervised learning using an approach proposed by Breiman. 22 The method is based on the following assumption: if the data item is structured in some way, it should be distinguishable from a randomized version of itself. Given a design matrix X, we compute a synthetic matrix X¢ with the same number of rows and columns by randomly sampling values for the corresponding variable in X. Sampling can be done either with or without replacement. In this work, we sam-pled with replacement. A combined design matrix X Ã is then built by concatenating the rows from X and X¢, and the vector of response variables is defined as Y = fy i g 2n i = 1 , where y i = 1 if row i came from X and y i = 0 otherwise. A Grand Forest model is trained on the design matrix X Ã and response variables Y guided by some graph G. The most important features are selected by ranking all features based on some importance measure and selecting all features above some cutoff. These features are assumed to contain a high amount of information and are thus good for clustering the data set into clusters. The final clustering is performed by clustering the original design matrix X based only on the top features identified by the Grand Forest model.

Gene expression data preparation
Gene expression data sets were obtained through the GEO. The data sets are available through the following accession IDs: breast cancer (GSE20685, n = 327), 29 non-small cell lung cancer (GSE30219, n = 268), 30 ulcerative colitis (GSE11223, n = 202), 31 Huntington's disease (GSE3790, n = 54), 32 and ALS (GSE112680, n = 164). 33 All five datasets are from microarrays (Table 1 and Supplementary Table S1). Processed, probe-level expression values were obtained as series matrix files. Probes were mapped to NCBI Entrez gene IDs using the corresponding platform data tables provided through GEO. For genes mapping to multiple probes, the median probe value was used.
The lung cancer data set contained samples from both small cell and non-small cell patients. Only non-small cell cancer samples were used because their molecular pathways (as described in KEGG) are different, and non-small cell was the most common type in the data set (Supplementary Table S2). The Huntington's disease data set contained samples from different brain regions. Only samples from the caudate nucleus were used For ALS, breast cancer, and lung cancer data sets, we used the right-censored survival time as the outcome variable. For the ulcerative colitis (UC) data set, we used the disease status (UC or no UC) as the outcome variable. For the Huntington's disease (HD) data set, we separated the subjects into a control group (no HD) and a case group (Vonsattel grades 2-4). Samples with Vonsattel grade 0-1 were discarded.
The Gene eXpression Network Analysis (GXNA) method requires samples to be stratified into discrete classes. For ulcerative colitis and Huntington's disease, we used the classes described above. For the survival data sets, patients were stratified into high-and lowrisk groups of approximately equal size. We used a cutoff of 62 months in lung cancer and 10.6 years in breast cancer and 4 years in ALS. Patients who could not be placed in either group due to censoring or lack of follow-up were discarded. See Table 1 for an overview.

Statistical significance tests
For the survival data sets (breast and lung cancers), the statistical significance of each gene was computed using a Cox proportional hazards regression model. For the regression (Alzheimer's disease) and classification data sets (ulcerative colitis and Huntington's disease), significance was estimated using a linear model with the R/Bioconductor package limma. 34 Network data preparation We collected network data from the Integrated Interactions Database (IID) 35 (version 2017-04). IID integrates experimentally validated PPIs from multiple major databases, such as BioGRID, IntAct, and HPRD, as well as interactions from orthologs and computational prediction. Gene identifiers were mapped from UniProt IDs to Entrez gene IDs using the human genome-wide annotation package in Bioconductor (version 3.4.1). 36 After removing self-loops and duplicated edges, the resulting network contained 17,487 genes and 891,969 interactions. Biological networks generated by aggregating interactions from literature are associated with study bias arising from disease-related genes being studied more often. 37,38 The IID network was chosen over other networks, such as BioGRID and HPRD, constructed solely from manual curation of literature, to minimize the effect of study bias on the results.

Enrichment of known pathways
To evaluate the results produced by Grand Forest, we extracted gene modules from gene expression data from patients diagnosed with breast cancer, non-small cell lung cancer, ulcerative colitis, Huntington's disease, and ALS. As response variables, we used the overall survival time in breast cancer, lung cancer, and ALS and disease status (case vs. control) in ulcerative colitis and Huntington's disease, respectively. The interaction network was constructed from experimentally validated and computationally predicted interactions obtained from the IID (see Methods).
We evaluated the biological relevance of the extracted gene modules by investigating how congruent the genes in the extracted modules were with published curated molecular pathways related to the phenotype of each data set. Reference gene sets were extracted from KEGG. 39 For breast cancer, lung cancer, Huntington's disease, and ALS, we extracted the disease-specific pathway for each disease. Because KEGG has not published a specific pathway for ulcerative colitis, we instead aggregated all genes from the three pathways indicated as associated with UC: inflammatory bowel disease, cytokine-cytokine receptor interaction, and the Jak-STAT signaling pathway (Supplementary Table S3).
To evaluate Grand Forest's ability to find meaningful gene modules, we compared our results for all five data sets against the results obtained using four, state-ofthe-art, network-based module discovery tools: BioNet, 12 KeyPathwayMiner (KPM), 13 GXNA, 11 and GiGa. 10 These tools were selected based on results of a recent evaluation by Batra et al. 14 For each method, we extracted gene modules in each data set over a range of parameters chosen such that they generate modules in a range between *25 and 100 genes (Supplementary Table S4). Statistical significance of enrichment was computed using a hypergeometric overrepresentation test (Supplementary Text S2.2). Grand Forest significantly outperformed all tools on all data sets (Fig. 3a). The difference was especially pronounced in the two cancer data sets, where Grand Forest achieved a highly significant enrichment (median p-values of 6.13eÀ15 and 4.64eÀ9, respectively), while the other methods found little or no overlap with the associated pathways. Grand Forest performed worst on the Huntington's data (median p-value 0:074). All other tools delivered insignificant results (i.e., median p > 0:05) on all data sets.

Enrichment of pathways related to cancer hallmarks
To provide further validation of our results in the breast and lung cancer data sets, we investigated how strongly associated the extracted gene modules were with the hallmarks of cancer. 40 If the genes selected by a method are biologically relevant for proliferation of cancer, we would expect a functional enrichment of pathways related to these cancer hallmarks. Alcaraz et al. 41 compiled a set of KEGG pathways related to each hallmark. Based on their findings, we compiled the relevant genes for each hallmark as the union of all genes in the pathways related to that hallmark (Supplementary Table S5). Due to the great number of genes being compared against, we restricted this analysis to only modules of 75 or more genes (see Supplementary Figs. S2 and S3 for all sizes).
Grand Forest achieved a significantly higher enrichment in both breast cancer and lung cancer in all but one hallmark, namely genome instability and mutation, where Grand Forest was outperformed by the other tools in lung cancer (Fig. 3b, c). Grand Forest generally achieved a highly significant degree of enrichment, with p-values below 1eÀ10 in all but two hallmarks.

Stability of selected genes
For the gene modules to be biologically meaningful, the genes in the extracted subnetwork modules should be stable and reproducible, that is, not varying significantly between samples from the same population. We evaluated how stable the modules produced by our method were compared with other methods, by repeatedly removing 20% of the patients, selected randomly, and measuring their pairwise similarity between all repetitions with the same parameters. Parameters were chosen to produce modules of *25, 50, 75, and 100 genes. Gene set similarity was measured using the Jaccard index (Supplementary Text S2.3).
We were unable to obtain results for KPM and BioNet. Neither method provides a way to enforce a specific module size, and the size instead depends on the chosen hyperparameters. For both methods, the size of the extracted modules varied significantly between repetitions using the same parameters, often by several orders of magnitude, making it infeasible to obtain appropriately sized modules for each repetition.
Overall, Grand Forest produced more stable results compared with existing GiGa and GXNA (Fig. 4a) in all data sets except for Huntington's disease. The difference in performance was most significant in breast cancer and ALS, where Grand Forest was stable even for small gene sets, while the other methods produced little overlap between repetitions. We observed that for the other methods, stability generally decreased with smaller module sizes; however, this effect was less pronounced in Grand Forest. These results suggest that the modules produced by Grand Forest are less sensitive to the sampling of the patient population. While we cannot compare it with KPM and BioNet, it is unlikely that either is more stable given their high sensitivity to hyperparameters.

Interaction density of selected modules
We evaluated the number of PPIs between genes in the extracted modules to better understand why the results produced by Grand Forest differed so much from other methods. For each module, we extracted the subnetwork induced by the constituent genes and counted the number of conserved edges. We observed that Grand Forest selected significantly more dense modules than the other methods for breast cancer, ulcera-tive colitis, and ALS, but similarly dense modules for lung cancer and Huntington's disease (Fig. 4b). All methods selected highly dense modules for lung cancer and sparse modules for Huntington's disease.
The observed difference in density may, in part, explain why the genes selected by Grand Forest were more congruent with published molecular pathways. This suggests that a large part of the power comes from the network rather than the gene expression data. However, given that all methods produced highly dense modules in lung cancer, even though only Grand Forest achieved a significant level of enrichment, this does not fully explain the difference in performance.

Statistical significance of selected genes
To shed further light on the source of the signal in data sets, we evaluated how many of the genes in extracted modules were significantly differentially expressed. For each module, we counted how many genes were significantly associated with the outcome (nominal p < 0:05).
We observed that Grand Forest generally selected fewer significant genes compared with other methods (Fig. 4c). This contrasted greatly with the other methods, where all modules consisted almost exclusively of significant genes. This is not surprising given that the other methods are designed to explicitly maximize this property in some way, either by maximizing the number of significant genes or by maximizing some aggregate significance measure. Interestingly, in the four data sets where Grand Forest selected fewer significant genes, namely in breast cancer, lung cancer, ulcerative colitis, and ALS, the difference in performance wrt. enrichment of KEGG pathways was greatest. This difference was especially pronounced in breast cancer where Grand Forest only selected around 25-35% significant genes while achieving a highly significant degree of enrichment. We also evaluated how many of the genes in the associated KEGG pathways were statistically significant. We observed that in all data sets, a large fraction of genes were in fact not significantly associated with the outcome, and the fraction of significant genes in the reference pathways was overall not significantly larger than among all genes ( Fig. 4d and Supplementary Fig. S4). Only in Huntington's disease was a majority of genes significant (66%). These results suggest that a statistically significant association of expression with a phenotype is not necessarily adequate to determine which genes are important for development or progression of a disease. De novo endophenotyping of lung adenocarcinoma Grand Forest can also be applied to unlabeled data to discover modules of highly interacting genes that stratify patients into distinct clusters, for example, molecular subtypes or endophenotypes. Feature importance was estimated without any clinical variables by modeling the problem as an unsupervised as supervised learning problem (see Methods). We then extracted the gene module comprising the 20 most important genes (Fig. 5a).
The selected genes induced a highly dense subnetwork in the interaction network. Among the 20 genes were three genes found in the KEGG non-small cell lung cancer pathway: TP53, EGFR, and MAPK1 (p = 2:6e À 5). Furthermore, we found four known oncogenes, MYC, 42 tumor suppressor genes, BRCA1 49 and TP53. 50 To evaluate the clinical relevance of the selected genes, we extracted all adenocarcinoma samples from the lung cancer data set (n = 85) and clustered them into two groups with k-means clustering, using only the expression of these 20 genes. Because k-means is dependent on randomly chosen initialization, clustering was repeated 20 times and the result with the greatest Silhouette index was chosen (Supplementary Text S2.4). The clustering of patients was significantly associated with overall survival (log-rank, p = 0:0079) (Fig. 5b, c). For comparison, we also clustered the patients on all genes using the same procedure. When using all genes, the resulting stratification was less associated with overall survival (p = 0:014) (Fig. 5d).

Discussion and Conclusion
In this study, we introduced Grand Forest, a novel graph-guided set of ensemble learning methods based on the well-known random forest strategy to allow for network-guided supervised and de novo endophenotyping. Our tool and the implemented approaches differ significantly from conventional module discovery and patient stratification tools. Grand Forest does not expect the data to follow a specific distribution, and it does not rely on statistical significance tests and differential expression analyses, but instead aims to explain the phenotype using an ensemble of decision trees. When compared with traditional module discovery tools across gene expression data from five diseases, our method achieved a significantly higher degree of enrichment of relevant molecular pathways. Results also showed that Grand Forest was less sensitive to the sampling of the patient population than GXNA and GiGa, but a comparison with KPM and BioNet was not possible. By virtue of being based on decision trees, our method is also invariant to scaling and robust to outliers. We observed that despite selecting fewer genes with a statistically significant association with the clinical variable, Grand Forest extracted modules that were more congruent with KEGG molecular pathways related to the disease. However, it appeared that the solutions computed with Grand Forest were largely driven by interaction density rather than expression patterns associated with disease outcome. This was further demonstrated by the fact that a large fraction of the genes in the reference gene sets were not statistically significant. This demonstrates that a large fold change, or otherwise sig-nificant association with outcome, is not sufficient to identify important causal driver genes for a disease.
A commonly raised concern with network-based methods is that a similar performance can often be achieved using random networks instead. 41,51.52 We evaluated the performance of Grand Forest on the five data sets using two randomized network models, one generated by randomly rewiring edge pairs while preserving node degree and one generated by rearranging the node labels in the network. We observed that rearranging node labels resulted in significantly worse performance wrt. enrichment of relevant pathways ( Supplementary Fig. S5) and generally did not achieve a significant level of enrichment. However, rewiring edges did not significantly affect enrichment, which confirms our method is heavily reliant on node degree. Permuting the outcome variables also only had a modest effect on the degree of enrichment for Grand Forest while (in some cases) even increasing enrichment for GiGa and GXNA ( Supplementary Fig. S6). We furthermore investigated the effect of false negatives and false positives in the interaction network by repeating the experiment after removing 25% of edges or adding 25% more random edges in the network, respectively (Supplementary Figs. S7 and S8). Neither of these two perturbations significantly affected the results.
Taken together, our results point to a central problem with module discovery from gene expression data: methods relying primarily on gene expression will in many cases not identify disease-driving genes, while methods relying primarily on network structure are likely to select disease drivers due to bias in the network alone. This may, in part, be because gene expression is too far downstream and, as such, expression changes may often correspond to the cellular response to the disease rather than the underlying cause. Furthermore, it is likely that Grand Forest selects highly dense modules because the difference in patient phenotype translates to large-scale changes in the transcriptome, which makes it trivial for the algorithm to build a set of genes that explain the outcome well. Therefore, the algorithm will often choose hub genes since they are easier to reach in a graph traversal. This observation is also in line with previous results on random gene expression signatures in breast cancer. 6 Due to the incompleteness and noisy nature of current PPI networks, it is uncertain whether diseaseassociated genes have a large number of reported interactions due to research bias or if genes with many interactions are often associated with disease due to being involved in many key cellular mechanisms. With this in mind, we advise researchers to use caution when applying network-guided methods for discovering disease genes and modules.
To make our method easily available to researchers, we developed an easy-to-use web server for carrying out analyses using Grand Forest. The web server allows users to upload a gene expression data set and analyze their data using two different workflows: a supervised workflow and an unsupervised workflow, mirroring the types of analyses carried out in this article. A set of commonly used genetic interaction networks is provided, but users can also upload custom network data. A gene set enrichment analysis is provided for both workflows to enable searching for over-represented Gene Ontology terms, pathways, and disease associations among the extract genes. Furthermore, users are also able to extract and visualize networks of drugs and miRNAs targeting the genes in a module to search for potentially druggable targets. See Figure 6 for a graphical overview.
In summary, with Grand Forest, we introduce a new method for disease-gene module discovery by integrating genomic profiling data with molecular interaction networks. We also introduce the first network-based, de novo endophenotyping methodology, allowing analysis of unlabeled data. We show that Grand Forest identifies gene modules closely associated with known disease genes, but that these results are highly driven by network topology and likely confounded by inherent bias in the underlying network. Finally, we provide a comprehensive web server to make our methodology easily available to researchers.