< February 2021 >
MonTueWedThuFriSatSun
01020304050607
08091011121314
15161718192021
22232425262728

Saturday, 06 February 2021

09:21 PM

A brief introduction to my artificial intelligence and machine learning research program - YouTube [Epistasis Blog] 09:21 PM, Saturday, 06 February 2021 10:00 PM, Saturday, 06 February 2021

 A 12-minute overview of my artificial intelligence and machine learning research program [YouTube]

09:19 PM

Ten simple rules for writing a paper about scientific software [Epistasis Blog] 09:19 PM, Saturday, 06 February 2021 09:20 PM, Saturday, 06 February 2021

Romano JD, Moore JH. Ten simple rules for writing a paper about scientific software. PLoS Comput Biol. 2020 Nov 12;16(11):e1008390. doi: 10.1371/journal.pcbi.1008390. PMID: 33180774; PMCID: PMC7660560. [PubMed] [PLoS Comp Bio]

Abstract

Papers describing software are an important part of computational fields of scientific research. These "software papers" are unique in a number of ways, and they require special consideration to improve their impact on the scientific community and their efficacy at conveying important information. Here, we discuss 10 specific rules for writing software papers, covering some of the different scenarios and publication types that might be encountered, and important questions from which all computational researchers would benefit by asking along the way.

09:16 PM

Embedding covariate adjustments in tree-based automated machine learning for biomedical big data analyses [Epistasis Blog] 09:16 PM, Saturday, 06 February 2021 09:20 PM, Saturday, 06 February 2021

Manduchi E, Fu W, Romano JD, Ruberto S, Moore JH. Embedding covariate adjustments in tree-based automated machine learning for biomedical big data analyses. BMC Bioinformatics. 2020 Oct 1;21(1):430. doi: 10.1186/s12859-020-03755-4. PMID: 32998684; PMCID: PMC7528347. [PubMed] [BMC Bioinformatics]

Abstract

Background: A typical task in bioinformatics consists of identifying which features are associated with a target outcome of interest and building a predictive model. Automated machine learning (AutoML) systems such as the Tree-based Pipeline Optimization Tool (TPOT) constitute an appealing approach to this end. However, in biomedical data, there are often baseline characteristics of the subjects in a study or batch effects that need to be adjusted for in order to better isolate the effects of the features of interest on the target. Thus, the ability to perform covariate adjustments becomes particularly important for applications of AutoML to biomedical big data analysis.

Results: We developed an approach to adjust for covariates affecting features and/or target in TPOT. Our approach is based on regressing out the covariates in a manner that avoids 'leakage' during the cross-validation training procedure. We describe applications of this approach to toxicogenomics and schizophrenia gene expression data sets. The TPOT extensions discussed in this work are available at https://github.com/EpistasisLab/tpot/tree/v0.11.1-resAdj
.

Conclusions: In this work, we address an important need in the context of AutoML, which is particularly crucial for applications to bioinformatics and medical informatics, namely covariate adjustments. To this end we present a substantial extension of TPOT, a genetic programming based AutoML approach. We show the utility of this extension by applications to large toxicogenomics and differential gene expression data. The method is generally applicable in many other scenarios from the biomedical field.

09:14 PM

Ten important roles for academic leaders in data science [Epistasis Blog] 09:14 PM, Saturday, 06 February 2021 09:20 PM, Saturday, 06 February 2021

Moore JH. Ten important roles for academic leaders in data science. BioData Min. 2020 Oct 26;13:18. doi: 10.1186/s13040-020-00228-5. PMID: 33117434; PMCID: PMC7586691. [PubMed] [BioData Mining]

Abstract

Data science has emerged as an important discipline in the era of big data and biological and biomedical data mining. As such, we have seen a rapid increase in the number of data science departments, research centers, and schools. We review here ten important leadership roles for a successful academic data science chair, director, or dean. These roles include the visionary, executive, cheerleader, manager, enforcer, subordinate, educator, entrepreneur, mentor, and communicator. Examples specific to leadership in data science are given for each role.

09:11 PM

Evaluating recommender systems for AI-driven biomedical informatics [Epistasis Blog] 09:11 PM, Saturday, 06 February 2021 09:20 PM, Saturday, 06 February 2021

La Cava W, Williams H, Fu W, Vitale S, Srivatsan D, Moore JH. Evaluating recommender systems for AI-driven biomedical informatics. Bioinformatics. 2020 Aug 7:btaa698. doi: 10.1093/bioinformatics/btaa698. Epub ahead of print. PMID: 32766825. [PubMed] [Bioinformatics]

Abstract

Motivation: Many researchers with domain expertise are unable to easily apply machine learning to their bioinformatics data due to a lack of machine learning and/or coding expertise. Methods that have been proposed thus far to automate machine learning mostly require programming experience as well as expert knowledge to tune and apply the algorithms correctly. Here, we study a method of automating biomedical data science using a web-based platform that uses AI to recommend model choices and conduct experiments. We have two goals in mind: first, to make it easy to construct sophisticated models of biomedical processes; and second, to provide a fully automated AI agent that can choose and conduct promising experiments for the user, based on the user's experiments as well as prior knowledge. To validate this framework, we experiment with hundreds of classification problems, comparing to state-of-the-art, automated approaches. Finally, we use this tool to develop predictive models of septic shock in critical care patients.

Results: We find that matrix factorization-based recommendation systems outperform meta-learning methods for automating machine learning. This result mirrors the results of earlier recommender systems research in other domains. The proposed AI is competitive with state-of-the-art automated machine learning methods in terms of choosing optimal algorithm configurations for datasets. In our application to prediction of septic shock, the AI-driven analysis produces a competent machine learning model (AUROC 0.85 +/- 0.02) that performs on par with state-of-the-art deep learning results for this task, with much less computational effort.

Availability: PennAI is available free of charge and open-source. It is distributed under the GNU public license (GPL) version 3.

Supplementary information: Software and experiments are available from epistasislab.github.io/pennai.

09:07 PM

treeheatr: an R package for interpretable decision tree visualizations [Epistasis Blog] 09:07 PM, Saturday, 06 February 2021 09:20 PM, Saturday, 06 February 2021

Le TT, Moore JH. treeheatr: an R package for interpretable decision tree visualizations. Bioinformatics. 2020 Jul 23:btaa662. doi: 10.1093/bioinformatics/btaa662. Epub ahead of print. PMID: 32702108. [PubMed] [Bioinformatics]
Abstract
Summary: treeheatr is an R package for creating interpretable decision tree visualizations with the data represented as a heatmap at the tree's leaf nodes. The integrated presentation of the tree structure along with an overview of the data efficiently illustrates how the tree nodes split up the feature space and how well the tree model performs. This visualization can also be examined in depth to uncover the correlation structure in the data and importance of each feature in predicting the outcome. Implemented in an easily installed package with a detailed vignette, treeheatr can be a useful teaching tool to enhance students' understanding of a simple decision tree model before diving into more complex tree-based machine learning methods.
Availability: The treeheatr package is freely available under the permissive MIT license at https://trang1618.github.io/treeheatr and https://cran.r-project.org/package=treeheatr. It comes with a detailed vignette that is automatically built with GitHub Actions continuous integration.
Supplementary information: Supplementary data are available at Bioinformatics online.

09:07 PM

Electronic health records and polygenic risk scores for predicting disease risk [Epistasis Blog] 09:07 PM, Saturday, 06 February 2021 09:20 PM, Saturday, 06 February 2021

Li R, Chen Y, Ritchie MD, Moore JH. Electronic health records and polygenic risk scores for predicting disease risk. Nat Rev Genet. 2020 Aug;21(8):493-502. doi: 10.1038/s41576-020-0224-1. Epub 2020 Mar 31. PMID: 32235907. [PubMed] [Nature Reviews]

Abstract

Accurate prediction of disease risk based on the genetic make-up of an individual is essential for effective prevention and personalized treatment. Nevertheless, to date, individual genetic variants from genome-wide association studies have achieved only moderate prediction of disease risk. The aggregation of genetic variants under a polygenic model shows promising improvements in prediction accuracies. Increasingly, electronic health records (EHRs) are being linked to patient genetic data in biobanks, which provides new opportunities for developing and applying polygenic risk scores in the clinic, to systematically examine and evaluate patient susceptibilities to disease. However, the heterogeneous nature of EHR data brings forth many practical challenges along every step of designing and implementing risk prediction strategies. In this Review, we present the unique considerations for using genotype and phenotype data from biobank-linked EHRs for polygenic risk prediction.

09:04 PM

Ideas for how informaticians can get involved with COVID-19 research [Epistasis Blog] 09:04 PM, Saturday, 06 February 2021 09:20 PM, Saturday, 06 February 2021

Moore JH, Barnett I, Boland MR, Chen Y, Demiris G, Gonzalez-Hernandez G, Herman DS, Himes BE, Hubbard RA, Kim D, Morris JS, Mowery DL, Ritchie MD, Shen L, Urbanowicz R, Holmes JH. Ideas for how informaticians can get involved with COVID-19 research. BioData Min. 2020 May 12;13:3. doi: 10.1186/s13040-020-00213-y. PMID: 32419848; PMCID: PMC7216865. [PubMed] [BioData Mining]

Abstract

The coronavirus disease 2019 (COVID-19) pandemic has had a significant impact on population health and wellbeing. Biomedical informatics is central to COVID-19 research efforts and for the delivery of healthcare for COVID-19 patients. Critical to this effort is the participation of informaticians who typically work on other basic science or clinical problems. The goal of this editorial is to highlight some examples of COVID-19 research areas that could benefit from informatics expertise. Each research idea summarizes the COVID-19 application area, followed by an informatics methodology, approach, or technology that could make a contribution. It is our hope that this piece will motivate and make it easy for some informaticians to adopt COVID-19 research projects.

08:57 PM

Using simulations to understand the relationship between epistasis and observed GWAS findings [Epistasis Blog] 08:57 PM, Saturday, 06 February 2021 09:20 PM, Saturday, 06 February 2021

Moore JH, Olson RS, Schmitt P, Chen Y, Manduchi E. How Computational Experiments Can Improve Our Understanding of the Genetic Architecture of Common Human Diseases. Artif Life. 2020 Winter;26(1):23-37. doi: 10.1162/artl_a_00308. Epub 2020 Feb 6. PMID: 32027528. [PubMed] [Artificial Life]

Abstract

Susceptibility to common human diseases such as cancer is influenced by many genetic and environmental factors that work together in a complex manner. The state of the art is to perform a genome-wide association study (GWAS) that measures millions of single-nucleotide polymorphisms (SNPs) throughout the genome followed by a one-SNP-at-a-time statistical analysis to detect univariate associations. This approach has identified thousands of genetic risk factors for hundreds of diseases. However, the genetic risk factors detected have very small effect sizes and collectively explain very little of the overall heritability of the disease. Nonetheless, it is assumed that the genetic component of risk is due to many independent risk factors that contribute additively. The fact that many genetic risk factors with small effects can be detected is taken as evidence to support this notion. It is our working hypothesis that the genetic architecture of common diseases is partly driven by non-additive interactions. To test this hypothesis, we developed a heuristic simulation-based method for conducting experiments about the complexity of genetic architecture. We show that a genetic architecture driven by complex interactions is highly consistent with the magnitude and distribution of univariate effects seen in real data. We compare our results with measures of univariate and interaction effects from two large-scale GWASs of sporadic breast cancer and find evidence to support our hypothesis that is consistent with the results of our computational experiment.

Feeds

FeedRSSLast fetchedNext fetched after
XML 12:00 AM, Tuesday, 18 January 2022 12:30 AM, Tuesday, 18 January 2022
Bits of DNA XML 12:00 AM, Tuesday, 18 January 2022 12:30 AM, Tuesday, 18 January 2022
blogs.perl.org XML 12:00 AM, Tuesday, 18 January 2022 12:15 AM, Tuesday, 18 January 2022
Blue Collar Bioinformatics XML 12:00 AM, Tuesday, 18 January 2022 12:30 AM, Tuesday, 18 January 2022
Boing Boing XML 12:00 AM, Tuesday, 18 January 2022 12:30 AM, Tuesday, 18 January 2022
Epistasis Blog XML 12:00 AM, Tuesday, 18 January 2022 12:30 AM, Tuesday, 18 January 2022
Futility Closet XML 12:00 AM, Tuesday, 18 January 2022 12:15 AM, Tuesday, 18 January 2022
gCaptain XML 12:00 AM, Tuesday, 18 January 2022 12:30 AM, Tuesday, 18 January 2022
Hackaday XML 12:00 AM, Tuesday, 18 January 2022 12:30 AM, Tuesday, 18 January 2022
In between lines of code XML 12:00 AM, Tuesday, 18 January 2022 12:30 AM, Tuesday, 18 January 2022
InciWeb Incidents for California XML 12:00 AM, Tuesday, 18 January 2022 12:30 AM, Tuesday, 18 January 2022
LeafSpring XML 12:00 AM, Tuesday, 18 January 2022 12:30 AM, Tuesday, 18 January 2022
Living in an Ivory Basement XML 12:00 AM, Tuesday, 18 January 2022 12:30 AM, Tuesday, 18 January 2022
LWN.net XML 12:00 AM, Tuesday, 18 January 2022 12:30 AM, Tuesday, 18 January 2022
Mastering Emacs XML 12:00 AM, Tuesday, 18 January 2022 12:30 AM, Tuesday, 18 January 2022
Planet Debian XML 12:00 AM, Tuesday, 18 January 2022 12:30 AM, Tuesday, 18 January 2022
Planet Emacsen XML 12:00 AM, Tuesday, 18 January 2022 12:15 AM, Tuesday, 18 January 2022
RNA-Seq Blog XML 12:00 AM, Tuesday, 18 January 2022 12:30 AM, Tuesday, 18 January 2022
RStudio Blog - Latest Comments XML 12:00 AM, Tuesday, 18 January 2022 12:30 AM, Tuesday, 18 January 2022
RWeekly.org - Blogs to Learn R from the Community XML 12:00 AM, Tuesday, 18 January 2022 12:30 AM, Tuesday, 18 January 2022
The Adventure Blog XML 12:00 AM, Tuesday, 18 January 2022 12:30 AM, Tuesday, 18 January 2022
The Allium XML 12:00 AM, Tuesday, 18 January 2022 12:30 AM, Tuesday, 18 January 2022
Variance Explained XML 12:00 AM, Tuesday, 18 January 2022 12:30 AM, Tuesday, 18 January 2022
January 2022
MonTueWedThuFriSatSun
27282930310102
03040506070809
10111213141516
17181920212223
24252627282930
31010203040506
December 2021
MonTueWedThuFriSatSun
29300102030405
06070809101112
13141516171819
20212223242526
27282930310102
November 2021
MonTueWedThuFriSatSun
01020304050607
08091011121314
15161718192021
22232425262728
29300102030405
October 2021
MonTueWedThuFriSatSun
27282930010203
04050607080910
11121314151617
18192021222324
25262728293031
September 2021
MonTueWedThuFriSatSun
30310102030405
06070809101112
13141516171819
20212223242526
27282930010203
August 2021
MonTueWedThuFriSatSun
26272829303101
02030405060708
09101112131415
16171819202122
23242526272829
30310102030405
July 2021
MonTueWedThuFriSatSun
28293001020304
05060708091011
12131415161718
19202122232425
26272829303101
June 2021
MonTueWedThuFriSatSun
31010203040506
07080910111213
14151617181920
21222324252627
28293001020304
May 2021
MonTueWedThuFriSatSun
26272829300102
03040506070809
10111213141516
17181920212223
24252627282930
31010203040506
April 2021
MonTueWedThuFriSatSun
29303101020304
05060708091011
12131415161718
19202122232425
26272829300102
March 2021
MonTueWedThuFriSatSun
01020304050607
08091011121314
15161718192021
22232425262728
29303101020304
February 2021
MonTueWedThuFriSatSun
01020304050607
08091011121314
15161718192021
22232425262728
November 2020
MonTueWedThuFriSatSun
26272829303101
02030405060708
09101112131415
16171819202122
23242526272829
30010203040506
September 2020
MonTueWedThuFriSatSun
31010203040506
07080910111213
14151617181920
21222324252627
28293001020304
July 2020
MonTueWedThuFriSatSun
29300102030405
06070809101112
13141516171819
20212223242526
27282930310102
June 2020
MonTueWedThuFriSatSun
01020304050607
08091011121314
15161718192021
22232425262728
29300102030405
May 2020
MonTueWedThuFriSatSun
27282930010203
04050607080910
11121314151617
18192021222324
25262728293031
April 2020
MonTueWedThuFriSatSun
30310102030405
06070809101112
13141516171819
20212223242526
27282930010203
February 2020
MonTueWedThuFriSatSun
27282930310102
03040506070809
10111213141516
17181920212223
24252627282901
January 2020
MonTueWedThuFriSatSun
30310102030405
06070809101112
13141516171819
20212223242526
27282930310102
December 2019
MonTueWedThuFriSatSun
25262728293001
02030405060708
09101112131415
16171819202122
23242526272829
30310102030405
November 2019
MonTueWedThuFriSatSun
28293031010203
04050607080910
11121314151617
18192021222324
25262728293001
October 2019
MonTueWedThuFriSatSun
30010203040506
07080910111213
14151617181920
21222324252627
28293031010203
August 2019
MonTueWedThuFriSatSun
29303101020304
05060708091011
12131415161718
19202122232425
26272829303101
July 2019
MonTueWedThuFriSatSun
01020304050607
08091011121314
15161718192021
22232425262728
29303101020304
June 2019
MonTueWedThuFriSatSun
27282930310102
03040506070809
10111213141516
17181920212223
24252627282930
May 2019
MonTueWedThuFriSatSun
29300102030405
06070809101112
13141516171819
20212223242526
27282930310102
April 2019
MonTueWedThuFriSatSun
01020304050607
08091011121314
15161718192021
22232425262728
29300102030405
March 2019
MonTueWedThuFriSatSun
25262728010203
04050607080910
11121314151617
18192021222324
25262728293031
February 2019
MonTueWedThuFriSatSun
28293031010203
04050607080910
11121314151617
18192021222324
25262728010203
January 2019
MonTueWedThuFriSatSun
31010203040506
07080910111213
14151617181920
21222324252627
28293031010203
December 2018
MonTueWedThuFriSatSun
26272829300102
03040506070809
10111213141516
17181920212223
24252627282930
31010203040506
November 2018
MonTueWedThuFriSatSun
29303101020304
05060708091011
12131415161718
19202122232425
26272829300102
October 2018
MonTueWedThuFriSatSun
01020304050607
08091011121314
15161718192021
22232425262728
29303101020304
September 2018
MonTueWedThuFriSatSun
27282930310102
03040506070809
10111213141516
17181920212223
24252627282930
August 2018
MonTueWedThuFriSatSun
30310102030405
06070809101112
13141516171819
20212223242526
27282930310102
July 2018
MonTueWedThuFriSatSun
25262728293001
02030405060708
09101112131415
16171819202122
23242526272829
30310102030405
June 2018
MonTueWedThuFriSatSun
28293031010203
04050607080910
11121314151617
18192021222324
25262728293001
May 2018
MonTueWedThuFriSatSun
30010203040506
07080910111213
14151617181920
21222324252627
28293031010203
April 2018
MonTueWedThuFriSatSun
26272829303101
02030405060708
09101112131415
16171819202122
23242526272829
30010203040506
February 2018
MonTueWedThuFriSatSun
29303101020304
05060708091011
12131415161718
19202122232425
26272801020304
January 2018
MonTueWedThuFriSatSun
01020304050607
08091011121314
15161718192021
22232425262728
29303101020304
December 2017
MonTueWedThuFriSatSun
27282930010203
04050607080910
11121314151617
18192021222324
25262728293031
November 2017
MonTueWedThuFriSatSun
30310102030405
06070809101112
13141516171819
20212223242526
27282930010203
September 2017
MonTueWedThuFriSatSun
28293031010203
04050607080910
11121314151617
18192021222324
25262728293001
August 2017
MonTueWedThuFriSatSun
31010203040506
07080910111213
14151617181920
21222324252627
28293031010203
March 2017
MonTueWedThuFriSatSun
27280102030405
06070809101112
13141516171819
20212223242526
27282930310102
January 2017
MonTueWedThuFriSatSun
26272829303101
02030405060708
09101112131415
16171819202122
23242526272829
30310102030405
November 2016
MonTueWedThuFriSatSun
31010203040506
07080910111213
14151617181920
21222324252627
28293001020304
October 2016
MonTueWedThuFriSatSun
26272829300102
03040506070809
10111213141516
17181920212223
24252627282930
31010203040506
September 2016
MonTueWedThuFriSatSun
29303101020304
05060708091011
12131415161718
19202122232425
26272829300102
August 2016
MonTueWedThuFriSatSun
01020304050607
08091011121314
15161718192021
22232425262728
29303101020304
July 2016
MonTueWedThuFriSatSun
27282930010203
04050607080910
11121314151617
18192021222324
25262728293031
May 2016
MonTueWedThuFriSatSun
25262728293001
02030405060708
09101112131415
16171819202122
23242526272829
30310102030405
April 2016
MonTueWedThuFriSatSun
28293031010203
04050607080910
11121314151617
18192021222324
25262728293001
December 2014
MonTueWedThuFriSatSun
01020304050607
08091011121314
15161718192021
22232425262728
29303101020304
October 2014
MonTueWedThuFriSatSun
29300102030405
06070809101112
13141516171819
20212223242526
27282930310102