Software Tool Article

High-Performance Mixed Models Based Genome-Wide Association Analysis with omicABEL software

[version 1; peer review: 2 approved, 1 approved with reservations]
PUBLISHED 20 Aug 2014

Abstract

To raise the power of genome-wide association studies (GWAS) and avoid false-positive results in structured populations, one can rely on mixed-model-based tests. When large samples are used, and when multiple traits are to be studied in the 'omics' context, this approach becomes computationally challenging. Here we consider the problem of mixed-model-based GWAS for an arbitrary number of traits, and demonstrate that different computational algorithms are optimal for single-trait and multiple-trait scenarios. We implement these optimal algorithms in a high-performance computing framework that uses state-of-the-art linear algebra kernels, incorporates optimizations, and avoids redundant computations, increasing throughput while reducing memory usage and energy consumption. We show that, compared to existing libraries, our algorithms and software achieve considerable speed-ups. The OmicABEL software described in this manuscript is available under the GNU GPL v. 3 license as part of the GenABEL project for statistical genomics at http://www.genabel.org/packages/OmicABEL.

Introduction

Current biomedical research is experiencing a large boost in the amount of data generated. In particular, investigations involve human cohorts comprising hundreds of thousands of participants as part of nation-wide biobanking initiatives; furthermore, by using both arrays that include hundreds of thousands of single nucleotide polymorphisms (SNPs), and more recently, exome and whole-genome re-sequencing, the genomes of these participants are being characterized at an increasing level of detail, bringing the number of features assessed to tens of millions. At the same time, technologies for high-throughput assessment of different molecular “omics” phenotypes in large study cohorts are becoming more and more affordable. These molecular phenotypes characterize different classes and sub-classes of biological molecules, their functional modifications and relationships. Examples include hundreds of thousands of epigenetic modifications (epigenome1), levels of tens of thousands of transcripts (transcriptome2,3), metabolites (metabonome or metabolome4,5), glycans (glyco(protein)ome6,7), and proteins (proteome8). The evolution of current molecular techniques expands our capacity to access different components in the omics space, and new prominent omics emerge (e.g. cellomics, interactomics, activomics). The study of the genetic control of different omics brings the promise of new fundamental and applied biological discoveries; however, such analyses pose “big data” challenges.

Genome-Wide Association Studies (GWAS) are an established tool for analyzing the genetic control of complex traits9. In GWAS, the association between millions of genetic markers (usually SNPs) and the phenotype(s) of interest is studied, with significant associations highlighting the genomic regions harboring the functional variants involved in the control of the trait. While initially GWAS were mostly used to study common diseases, with the rising availability and affordability of omics phenotypes, this methodology is now also applied to investigate the omics space6,7,10–13, providing important insights into both the mechanisms underlying the genetic regulation of particular biological systems, and the determinants of human health and disease14–16.

In this work, we address the computational challenges posed by big-data GWAS. These challenges arise when the size of the sample under analysis is very large, or when (potentially hundreds of) thousands of omics phenotypes are studied. We consider analyses facilitated by the use of linear mixed models (LMMs)17,18, which allow for the modeling of correlations between the phenotypes of relatives. LMMs are among the most flexible and powerful methods to account for the genetic (sub)structure that inevitably occurs even in carefully designed large population-based studies. However, the increase in power and precision achieved through the use of mixed models comes with considerable costs in terms of computing time.

Recent advances in GWAS using mixed models19–25 represent a breakthrough compared to older methods, and allow analyses of a limited number of traits in reasonably sized samples even on personal computers. Still, current algorithms and software may be prohibitively expensive for the analysis of large samples when dealing with omics data, since the time needed for a multi-trait analysis is essentially that of a one-trait study multiplied by the number of traits. Under this scenario, the analysis of even relatively small sample sizes leads to extremely long wait times. Therefore, at the moment, the LMM-based GWAS analysis of large cohorts (tens to hundreds of thousands of participants), and even of small studies (thousands of participants) involving omics measurements, represents a considerable problem. This limitation compromises the availability of such analyses and the data-to-knowledge turnaround time, and leads to excessive energy consumption.

With this work, we aimed to address the aforementioned problems for big-data LMM-based GWAS. To do so, we took advantage of properties specific to the LMM formulation of GWAS, and analyzed a number of possible algorithms applicable to the analysis of large data. By combining sophisticated linear algebra and optimization techniques, we produced fast and scalable software. Our software facilitates GWAS of tens of thousands of samples and hundreds of thousands of omics phenotypes, without the need for supercomputing facilities.

Methods

Linear Mixed Models for GWAS

In a nutshell, an LMM models the phenotypes of a group of n studied individuals as a point in an n-dimensional space that follows a multivariate Normal distribution. The expected mean is modeled using a standard regression model as E[Y] = Xβ, where X is the design matrix, which includes the genotypes of interest and other covariates, and β are the fixed effects. The variance-covariance matrix is defined as M = σ² · (h²Φ + (1 − h²)I); here, σ² is the total variance of the trait, h² is the heritability coefficient, I is the identity matrix, and Φ is the matrix containing the relationship coefficients for all pairs of studied individuals. GWAS are performed by consecutively including SNPs in the analysis model (usually one SNP at a time) and computing the association statistics for the included SNP, thus iteratively applying the model throughout the genome.
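
As an illustration (a minimal numpy sketch of the model above, not the OmicABEL implementation; the function and variable names are ours), the GLS estimate b := (XᵀM⁻¹X)⁻¹XᵀM⁻¹y for one SNP and one trait can be computed as:

import numpy as np

def gls_beta(X, y, Phi, sigma2, h2):
    # Variance-covariance matrix M = sigma^2 * (h^2 * Phi + (1 - h^2) * I)
    n = len(y)
    M = sigma2 * (h2 * Phi + (1.0 - h2) * np.eye(n))
    # Avoid forming M^-1 explicitly: solve linear systems instead
    Minv_X = np.linalg.solve(M, X)
    Minv_y = np.linalg.solve(M, y)
    return np.linalg.solve(X.T @ Minv_X, X.T @ Minv_y)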

The statistical model considered in this work is the same as that outlined in previous works19–21, and proceeds with the analysis in two steps: for each trait considered, we first estimate the matrix of (co)variances between phenotypes, and then we use it when estimating the SNP effects (see Supplementary Note S1 for mathematical details).

Figure 1 illustrates how a multi-trait analysis consists of a series of t separate single-trait analyses, each of which, in turn, consists of a series of m Generalized Least-Squares (GLS) problems. The key to fast analysis algorithms is the realization that these problems are correlated, both along the m and the t direction; in big-data GWAS, any approach that ignores such correlations is not feasible in terms of time-to-solution.


Figure 1. Interpretation of multi-trait GWAS as a two-dimensional grid of generalized least-squares problems (b := (XᵀM⁻¹X)⁻¹XᵀM⁻¹y).

GWAS with multiple phenotypes requires the solution of m × t correlated Generalized Least-Squares (GLS) problems, giving rise to a three-dimensional object B of size m × t × p. Along the t direction, the variance-covariance matrix M and the phenotype y vary, while the design matrix X does not; conversely, in the m direction, M and y are fixed while X varies. Specifically, X can be viewed as consisting of two parts, XL and XR, where the former is constant across the entire grid and the latter changes along m. The figure also captures GWAS with a single phenotype, in which case the dimension t reduces to 1.

Efficient algorithms for single-trait and multi-trait GWAS

Aiming at supporting computational scientists in the design of efficient software, two of the authors recently developed CLAK, an algebraic system that replicates the reasoning of human experts for the automatic discovery of linear algebra solvers26. The core idea is to first decompose a target matrix-based problem in terms of library-supported kernels, and then apply algorithmic and algebraic optimizations. Since the decomposition is not unique, CLAK returns not one but a family of possible solutions, all mathematically equivalent, but exhibiting different space and time complexity.

With the help of CLAK, we generated many solutions to perform the aforementioned GWAS analyses (for a representative list, see Table S2). Each solution was subjected to an analysis of its computational complexity, expressing the cost in terms of the number of samples, markers, and traits involved. Interestingly, depending on the number of traits, the best theoretical performance was attained by two different solutions, which we named CLAK-Chol and CLAK-Eig (described in Table S1), respectively. Figure 2 shows the surfaces representing the time complexity of CLAK-Chol and CLAK-Eig as a function of the number of traits and markers analyzed; the solid curve denotes the crossover between the surfaces. When fewer than four traits are considered, CLAK-Chol attains better theoretical performance; conversely, for a larger number of traits, CLAK-Eig is expected to perform better (see also Table 1, and Supplementary Methods).


Figure 2. Cost analysis for CLAK-Eig and CLAK-Chol.

The brown and blue surfaces indicate the number of operations performed by CLAK-Eig and CLAK-Chol, respectively, for a given number of SNPs and traits. The crossover curve suggests that for analyses with more than just a handful of traits, CLAK-Eig is the fastest algorithm.

Table 1. Computational costs for the solution of single-trait and multiple-trait analyses.

The variables n, m, t, and p denote the sample size, the number of genetic markers, the number of traits, and the number of covariates, respectively. v is the average number of iterations necessary to estimate the model parameters σ² and h² (see "Time complexity" in Supplementary Note S1).

Algorithm   | Estimation of σ² and h² | Single-trait analysis | Multi-trait analysis (t > n) | Multi-trait analysis (n > t)
CLAK-Chol   | O(n³ + tvn)             | O(mpn²)               | O(tmn²)                      | O(tmn²)
CLAK-Eig    | O(n³ + tvn)             | O(mpn²)               | O(tmn)                       | O(mn²)
FaST-LMM    | O(n³ + tvn)             | O(mpn²)               | O(tmn²)                      | O(tmn²)
GWFGLS      | O(n³ + tvn)             | O(mpn²)               | O(tmn²)                      | O(tmn²)
EMMAX       | O(n³ + tvn)             | O(mpn²)               | O(tmn²)                      | O(tmn²)

The idea underlying CLAK-Chol is to linearly transform the input data to de-correlate the observations. To this end, the variance-covariance matrix M is formed explicitly, and its triangular Cholesky factor L is computed (LLᵀ = M); through this factor, each SNP X and each trait y is then linearly transformed, giving rise to a sequence of Ordinary Least Squares problems of the form b := (XᵀX)⁻¹Xᵀy. While such problems are solvable with standard techniques, CLAK-Chol takes advantage of the fact that the covariates are fixed for all SNPs, hence lowering the computational complexity. In single-trait analyses, it is also possible to exploit the fact that the matrix M is symmetric and positive definite. Although asymptotically the Cholesky factorization is equivalent to an eigen-decomposition, in practice it requires 10 times fewer operations. Moreover, instead of using the eigenvectors (a full matrix) to rotate the SNPs, the Cholesky factor (a triangular matrix) allows us to transform them at half the cost. For more details, see our previously published work27 describing the CLAK-Chol algorithm.
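
The following sketch is our own simplified numpy rendition of this idea for one trait, assuming all data fit in memory (the actual OmicABEL code is a blocked, out-of-core implementation built on optimized BLAS/LAPACK kernels); it shows how the Cholesky factor de-correlates the data once and how the transformed covariates are reused across all SNPs:

import numpy as np

def clak_chol_single_trait(XL, XR_all, y, M):
    # De-correlate once with L (L L^T = M); XL holds the covariates,
    # XR_all holds one genotype vector per SNP (shape m x n).
    L = np.linalg.cholesky(M)
    XLt = np.linalg.solve(L, XL)        # transformed covariates, fixed across SNPs
    yt = np.linalg.solve(L, y)          # transformed trait
    betas = []
    for xr in XR_all:
        xrt = np.linalg.solve(L, xr)    # only the SNP column changes
        Xt = np.column_stack([XLt, xrt])
        b, *_ = np.linalg.lstsq(Xt, yt, rcond=None)   # ordinary least squares
        betas.append(b)
    return np.array(betas)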

The design of CLAK-Eig is based on three insights: 1) for a given study, the relationship matrix Φ is constant across both SNPs and traits; 2) since the variance-covariance matrix M is built by merely shifting and scaling the relationship matrix, its eigenvectors are the same as those of Φ, and its eigenvalues are obtained by shifting and scaling those of Φ; 3) the inverse of M is easily expressed by inverting the diagonal matrix containing its eigenvalues (note that in our solutions we never explicitly invert matrices; we instead factor them and operate with their factors). Together, these insights suggest that the eigen-decomposition of Φ can be computed once and for all and, most importantly, that the eigenvectors of the relationship matrix can be used to rotate all the SNPs and traits only once. After the data is rotated, the mixed-model-based GWAS computation can be carried out by means of a grid of inexpensive Weighted Least Squares problems.

Once the initial eigen-decomposition of Φ is available, the complexity of CLAK-Eig is determined by three operations: the rotation of the SNPs, the rotation of the traits, and the solution of the Weighted Least Squares problems. The dominant term depends on the size of the population (n), the number of SNPs (m), and the number of traits (t). When n > t (or n > m), the overall time complexity comes from the rotation of the SNPs (or of the traits), and amounts to O(n²m) (or O(n²t)); if instead both t and m are larger than n, then the dominant term comes from the Least Squares problems, and is linear in the population size, SNPs and traits: O(nmt) (Table 1). Note that the CLAK-Eig algorithm is a generalization of previously published eigen-decomposition-based algorithms (e.g.22,28) to the case of multiple-trait analysis.
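
A simplified sketch of this strategy follows (again our own illustration in numpy, with hypothetical argument names; the per-trait estimates sigma2[j] and h2[j] are assumed to come from the first step of the analysis). The expensive eigen-decomposition and the rotations are performed only once, and the per-(SNP, trait) work is linear in n, which is what yields the O(nmt) term in Table 1.

import numpy as np

def clak_eig(XL, XR_all, Y, Phi, sigma2, h2):
    # Phi = Z diag(d) Z^T is factored once; SNPs and traits are rotated once;
    # each (SNP, trait) pair reduces to a cheap weighted least-squares problem.
    d, Z = np.linalg.eigh(Phi)
    XLr = Z.T @ XL                       # rotate covariates once
    XRr = [Z.T @ xr for xr in XR_all]    # rotate every SNP once
    Yr = Z.T @ Y                         # rotate every trait once (n x t)
    B = np.empty((len(XR_all), Y.shape[1], XL.shape[1] + 1))
    for j in range(Y.shape[1]):          # traits
        w = 1.0 / (sigma2[j] * (h2[j] * d + 1.0 - h2[j]))   # eigenvalues of M^-1
        sw = np.sqrt(w)
        yw = sw * Yr[:, j]
        for i, xrr in enumerate(XRr):    # SNPs
            Xw = sw[:, None] * np.column_stack([XLr, xrr])
            B[i, j], *_ = np.linalg.lstsq(Xw, yw, rcond=None)
    return B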

Compared with current state-of-the-art algorithms19,21,22 in multi-trait analyses, CLAK-Eig achieves a lower computational complexity. As shown in Table 1, there are two scenarios of interest, depending on whether the number of traits is larger than the population size or not. In the first case (t > n), which is probably the most typical for current and near-future omics studies, the time complexity of CLAK-Eig is linear in the number of markers, traits, and samples; by contrast, all the other methods have complexity quadratic in the sample size. In the second case (n > t), which occurs for smaller 'omics' data sets and will become more common as omics technologies become more affordable and sample sizes grow, the cost of CLAK-Eig is determined by the sample size and the number of SNPs, and its complexity is a factor of t lower than that of the other methods.

For both CLAK-Eig and CLAK-Chol, the space complexity is mainly determined by the square of the sample size; in addition, a minimum of one trait, one SNP, and the p covariates must reside in memory. In total, our methods only require enough memory to accommodate n² + (2 + p)n entries. If multiple SNPs and/or traits (e.g., dozens or hundreds of them) fit in main memory at once, the computational throughput of our methods improves noticeably. In this case, the space requirement becomes n² + (k + p)n, where k is the number of SNPs and traits resident in memory. As examples, for sample sizes of 10,000, 20,000 and 40,000, the n² space requirement translates to 1, 3, and 12 GB, respectively. More details on space complexity are provided in Supplementary Note S1.
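
As a quick back-of-the-envelope check (our own calculation, assuming 8-byte double-precision entries), the n² term alone corresponds to:

# n^2 matrix entries at 8 bytes each (double precision)
for n in (10_000, 20_000, 40_000):
    print(f"n = {n}: {n * n * 8 / 1024**3:.1f} GB")
# n = 10000: 0.7 GB
# n = 20000: 3.0 GB
# n = 40000: 11.9 GB

Rounded up, these values match the 1, 3, and 12 GB figures quoted above.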

Results

Implementation and comparison

To demonstrate the practical advantages of CLAK-Eig and CLAK-Chol, we implemented these algorithms in the OmicABEL software package. In doing so, we tailored our implementations to save intermediate results across adjacent problems; we also re-organized the calculations to fully benefit from both the efficiency of highly optimized linear algebra kernels, and the parallelism offered by modern computing platforms.

Since the size of the datasets involved in GWAS is considerably larger than the memory capacity of current processors, input and output data can only be stored on disk. Because the penalty for accessing data residing on disk is enormous (several orders of magnitude greater than the cost of performing one arithmetic operation), it is imperative to handle these big data efficiently. By means of asynchronous transfers between memory and disk, our algorithms achieve a perfect overlap of computation and data movement. As long as the relationship matrix fits in main memory, and regardless of the size of the data sets (both in terms of SNPs and phenotypes), the processor never idles waiting for a transfer to complete, thus computing at maximum efficiency.
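
The overlap of computation and data movement can be pictured as a double-buffering scheme; the toy sketch below (our own illustration, not OmicABEL's actual I/O layer) prefetches the next block of SNPs in a background thread while the current block is being processed:

from concurrent.futures import ThreadPoolExecutor

def process_blocks(read_block, compute_block, n_blocks):
    # read_block(i) loads block i from disk; compute_block(b) runs the GWAS kernels on it.
    with ThreadPoolExecutor(max_workers=1) as io:
        pending = io.submit(read_block, 0)               # prefetch the first block
        results = []
        for i in range(n_blocks):
            block = pending.result()                     # wait (if needed) for block i
            if i + 1 < n_blocks:
                pending = io.submit(read_block, i + 1)   # start loading block i+1
            results.append(compute_block(block))         # compute overlaps with the read
        return results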

We compared the GWAS run-time of CLAK-Chol and CLAK-Eig as implemented in OmicABEL with that of several well-established packages: EMMAX19, FaST-LMM22 (two-step approximation), and GWFGLS (implementation of the mmscore method of ProbABEL21 in the MixABEL-package29). In the experiments, we considered three different scenarios, varying one among sample size n, number of SNPs m, and number of traits t, while keeping the other two values constant (Figure 3). A description of the experimental setup is provided in Supplementary Note S2 and Supplementary Note S3.


Figure 3. Timing comparison.

Panels (a) and (b) include timings for EMMAX, GWFGLS, FaST-LMM, and our OmicABEL software, for single-trait analyses; (c) and (d) present a comparison of EMMAX, GWFGLS, FaST-LMM, and our OmicABEL software, in the case of multiple traits. In (a), the number of SNPs is fixed to m = 10⁷ and the sample size n ranges from 1,000 to 40,000. In panel (b), the sample size is fixed to n = 10,000 and the number of SNPs m ranges between 10⁶ and 3.6 × 10⁷. In (c) and (d), n = 1,000, m = 10⁶, and t ranges from 1 to 100,000.

In the first scenario (single trait and m = 10,000,000 SNPs; the sample size varies from 1,000 to 40,000), all methods exhibit quadratic behavior, and CLAK-Chol is the only algorithm that completed all tests within 25 hours (Figure 3(a)). For the largest problem considered (sample size n = 40,000), the speed-up over the next-fastest software (FaST-LMM) is 8.3 (205 vs. 25 hours); when n = 1,000, the speed-ups over GWFGLS, FaST-LMM and EMMAX are 24, 32 and 68, respectively (86, 112 and 240 minutes vs. 3.5 minutes).

The second scenario (single trait and a sample size of 10,000; the number of markers varies between 1 and 36 million) shows a linear dependence on the number of genetic markers for all software packages. Again, CLAK-Chol attains the best timings, outperforming FaST-LMM, GWFGLS and EMMAX by factors of 11.7, 93.6 and 298, respectively, when the number of analyzed SNPs is 36 million (Figure 3(b)).

Finally, the third scenario illustrates the analysis of multiple phenotypes (sample size of 1,000 and 1,000,000 genetic markers; number of traits varies between 1 and 100,000). The (estimated) time required for these analyses is presented in Figure 3(d). Note that the time scale on this graph is in years. Due to CLAK-Eig’s linear time complexity with respect to sample size, SNPs and traits, its advantage becomes most apparent: when thousands of traits are considered, CLAK-Eig outperforms GWFGLS, FaST-LMM and EMMAX by a factor of 1012, 1352, and 2789, respectively (Figure 3(d)), bringing the execution time down from months to hours.

Demonstration of application to real data

We applied OmicABEL (CLAK-Eig) to study 107,144 metabolomic traits in a sample of 781 people from a genetically isolated population of the island of Vis (Croatia). These data are part of the EUROSPAN data set reported in previous works11,12. In short, the data comprise plasma levels of 23 sphingomyelins (SPM), 9 ceramides (CER), 56 phosphatidylcholines (PC), 15 lysophosphatidylcholines (LPC), 27 phosphatidylethanolamines (PE), and 19 PE-based plasmalogens (PLPE). From these data, additional traits were defined by aggregating species into groups with similar characteristics (e.g. unsaturated ceramides), and also by expressing data as molar percentages (instead of absolute concentrations) within classes. Following the standards accepted in the genetic analysis of metabolomics data30, in this work 328 such measurements served as the basis for computing all pair-wise ratios, resulting in 107,584 traits, which were analyzed for association with 266,878 SNPs. More details about the data are provided in Supplementary Note S4.
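
One plausible reading of this derived-trait construction (our interpretation, not stated explicitly in the text: the 328 base measurements plus the ratio of every ordered pair of distinct measurements gives 328 + 328 · 327 = 107,584 traits) can be sketched as follows:

import numpy as np

def expand_to_ratio_traits(T):
    # T: (n_samples x k) matrix of base measurements.
    # Returns the k originals plus the ratio of every ordered pair of
    # distinct measurements, i.e. k + k*(k-1) traits in total.
    n, k = T.shape
    ratios = [T[:, i] / T[:, j] for i in range(k) for j in range(k) if i != j]
    return np.column_stack([T] + ratios)

# With k = 328 base measurements: 328 + 328 * 327 = 107,584 traits.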

Previously, it took several weeks to complete the original analysis of only a few hundred "original" traits. Using our OmicABEL software on a computer with 40 cores, we were able to finish the analysis of more than 100,000 traits in only 8 hours.

Table S4 shows the results for five SNPs previously reported11 to be significantly associated with circulating sphingolipid concentrations. The best results obtained for these SNPs when using the original and derived traits are reported. The implicated traits were also analyzed using other approaches: the full likelihood-ratio-test-based LMM18,22 (as implemented in MixABEL::FMM), GRAMMAR-Gamma25 (as implemented in the GenABEL package29), and the two-step approach19–21 (as implemented in MixABEL::GWFGLS). Table S4 shows that our results are consistent with those obtained by the other methods. Additionally, Table S3 summarizes the results concerning the heritabilities of the analyzed traits, while Table S5 lists the genomic control (GC) inflation factor λ obtained when analyzing the selected traits with the different methods. More details of the analysis of this human metabolomics data set are provided in Supplementary Note S4.

Discussion

In contemporary human genomics, methods and tools face the additional challenge posed by the sheer size of the datasets. Big data are produced from the investigation of large human cohorts including hundreds of thousands of participants; these massive samples facilitate the identification of small effects and lead to important biological insights. Large data also come from the field of functional (gen)omics, which aims at establishing the functional roles of genetic variants; hence, GWAS are increasingly applied not only to study complex traits in large cohorts, but also to understand the regulation of the human and animal transcriptome15,31,32, metabolome10–12, glyco(protein)ome6,7 and other types of omics data. Results of these studies are used to uncover the link between these molecular phenotypes and high-level complex traits, including human diseases14–16.

In recent years, linear mixed models have become accepted as a powerful tool for whole-genome analysis of genetic associations17,18. Most current LMM-based methods for GWAS19–24 exhibit a linear dependency of the compute time on the number of genetic polymorphisms and traits studied, but at least a quadratic dependency on the sample size. Notable exceptions from this "at-least-quadratic" rule are the GRAMMAR-Gamma method25 and a method based on a low-rank approximation of the similarity matrix22, the latter exploiting ideas similar to the EIGENSTRAT approach33 and to methods that adjust the model for the top principal components of the kinship matrix variation. However, the computational advantage of these methods comes at the cost of mathematical approximation. For example, the GRAMMAR-Gamma method, while extremely fast and showing excellent results for human studies, is less suited for the analysis of samples with uneven genetic structure; adjusting for top principal components (as in EIGENSTRAT) is known to provide incomplete correction for stratification in the case of complex kinship. Increasing sample sizes and the availability of molecular omics phenotypes lead to "big data challenges", and the computational throughput of LMMs becomes more and more of an issue, which at present sometimes cannot be resolved without resorting to supercomputing facilities.

With this work, we address the problem of mixed-model-based whole-genome analysis of genetic association for an arbitrary number of traits. We describe the CLAK-Chol and CLAK-Eig algorithms and the corresponding software tools (the OmicABEL package) for LMM-based GWAS. Specifically, CLAK-Chol will be useful for the investigation of complex traits in very large samples (tens of thousands of individuals), while CLAK-Eig will be a useful tool for the investigation of the genetic control of different omics, potentially including hundreds of thousands or even millions of features.

As for our CLAK-Chol approach, we are not aware of similar, Cholesky-based solutions proposed before. The CLAK-Eig approach behind our solutions bears similarities to, and actually reduces to, previously suggested methods (e.g.22,28) when the number of traits is one. It is also worth mentioning the Matrix eQTL software34, which, while not implementing an LMM, in many respects exploits the problem-specific properties of multi-trait GWAS in ways similar to ours.

The key achievement of this research is that it facilitates big-data LMM-based GWAS without supercomputers. For a sample problem with a population of 1,000, three covariates, one million SNPs and 100,000 traits, we estimate that the available methods would require the entire Sequoia supercomputer (equipped with 1.5 million cores)1 for about 3 minutes; by contrast, using a common 40-core compute node (see Supplementary Methods, Note S2), our method completes within a day and reduces the energy consumption by an estimated factor of 200. It should be noted that this impressive speed-up comes at the price of an additional assumption of complete data. For many types of omics assays the assumption of no missing data is (almost) true, and a small proportion of missing data could be imputed (in the simplest case, replaced with the average value) with little negative effect on the statistical properties of the method. However, for omics assays that produce a large proportion of missing data, our CLAK-Eig method in its current formulation and implementation would be inapplicable, unless the missing values could be reliably imputed.

In the case of single-trait analysis, our results are somewhat less impressive: our CLAK-Chol solution outperforms advanced current methods (e.g. FaST-LMM) by about one order of magnitude for large-sample-size problems. This speed-up is possible because, as we show, for single-trait GWAS problems our CLAK-Chol algorithm is superior to CLAK-Eig, whereas other current methods effectively implement solutions similar to our CLAK-Eig algorithm for the single-trait case.

Further optimizations of our solutions are possible, for instance by exploiting the structure of the kinship matrix. A "compressed MLM" approach was proposed for decreasing the effective sample size of datasets by clustering individuals into groups20; similarly, the fast-decaying and possibly sparse structure of the kinship matrix can be exploited to lower the number of mathematical operations. Caution must be exercised in the interpretation of findings resulting from GWAS analyses, as they may generate false positives if the multiple-testing problem is not addressed adequately. A conservative strategy to determine whether an association is statistically significant would be to apply a Bonferroni correction; that is, in our example analysis of 107,144 traits, the conventional genome-wide significance threshold of 5 · 10⁻⁸ should be divided by the number of traits tested, giving 4.7 · 10⁻¹³. This is the common approach applied in metabolomics studies (see10,30). On the other hand, this threshold would probably be too conservative, given that many of the measurements may be highly correlated. Several methods have been introduced recently35–37 which may help to overcome this problem; however, this topic lies outside the scope of the current work.

For the analysis of specific omics data, our methods (and software) might require some modifications. For example, in the study of the genetic regulation of the human transcriptome, much attention is dedicated to cis-eQTLs, which is computationally a relatively simple task. In contrast, our implementation is tailored to perform a full GWAS for every trait analyzed. While OmicABEL could be used for the identification of trans-eQTLs, one should be aware of the specifics of the analysis of this type of data (e.g. allele-specific expression in RNA-seq studies) and of the body of methods developed for it (e.g. methods to account for the influence of hidden factors38,39).

We foresee that the primary use of our algorithms and software is within the domain of the analysis of complex traits in very large samples and the genetic analysis of "omics" data. However, other uses are possible. The same set of methods and tools can be used for scanning through other omics, e.g. in the search for biomarkers of a complex trait, or in order to determine functional relations between different omics. For example, one may be interested in epigenome-wide scans relating the epigenome to a complex trait (or to another omics layer, such as the metabolome); in this case, the genomic inputs would be replaced by epigenomic data. Advanced statistical and machine learning methods, such as penalized regression, can make use of the joint analysis of up to several hundreds of thousands of predictors40. One common scenario includes the construction of millions of features and their filtering for further joint analysis, a task which can also be effectively addressed by our methods. Finally, our algorithms can easily be extended, e.g. to search for interactions.

Conclusions

We demonstrated that different computational algorithms are optimal for the problems of single-trait and multi-trait mixed-model-based GWAS, and implemented these algorithms in the freely available OmicABEL software.

Software availability

Software access

The OmicABEL software implementing the computational methods described in this manuscript is available as part of the GenABEL project for statistical genomics at http://www.genabel.org/packages/OmicABEL. The web page provides a link to the OmicABEL tutorial, which gives examples of its use.

Archived source code as at the time of publication

http://dx.doi.org/10.5281/zenodo.1099941

Software license

GNU GPL v. 3
