Compositionality, sparsity, spurious heterogeneity, and other data-driven challenges for machine learning algorithms within plant microbiome studies.

Researchers

Cranos Williams, Ib Jensen, Max Gordon, Meenal Chaudhari, Sebastiano Busato, Stig Andersen, Turgut Akyol

Journal

Modalities

Models

Abstract

The plant-associated microbiome is a key component of plant systems, contributing to their health, growth, and productivity. The application of machine learning (ML) in this field promises to help untangle the relationships involved. However, measurements of microbial communities by high-throughput sequencing pose challenges for ML. Noise from low sample sizes, soil heterogeneity, and technical factors can affect the performance of ML, and the compositional and sparse nature of these datasets can affect its predictive accuracy. We review recent literature from plant studies to illustrate that these properties often go unmentioned. We expand our analysis to other fields to quantify the degree to which mitigation approaches improve the performance of ML and describe the mathematical basis for this. With the advent of accessible analytical packages for microbiome data, including learning models, researchers must be familiar with the nature of their datasets.

Copyright © 2022 Elsevier Ltd. All rights reserved.
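As an illustration of what "compositional and sparse" means in practice, the sketch below applies a centered log-ratio (CLR) transform to a small count table. CLR is one commonly used way of handling compositional data; the abstract does not say which mitigation approaches the review evaluates, so the choice of CLR, the function name `clr_transform`, the pseudocount of 0.5 used to handle zeros, and the example counts are all assumptions made for illustration.

```python
import numpy as np


def clr_transform(counts, pseudocount=0.5):
    """Centered log-ratio transform of a samples-by-taxa count table.

    The pseudocount is an assumed, simple way to avoid log(0) in sparse
    tables; other zero-handling strategies exist.
    """
    counts = np.asarray(counts, dtype=float)
    shifted = counts + pseudocount
    # Convert each sample (row) to relative abundances.
    proportions = shifted / shifted.sum(axis=1, keepdims=True)
    log_p = np.log(proportions)
    # Subtract each sample's mean log-abundance (the log geometric mean).
    return log_p - log_p.mean(axis=1, keepdims=True)


# Hypothetical table: 2 samples x 5 taxa, with zeros (sparsity) and
# different sequencing depths (only relative abundances are informative).
example_counts = np.array([
    [120, 0, 30, 0, 850],
    [12, 0, 3, 0, 85],
])

clr_values = clr_transform(example_counts)
print(clr_values.round(3))
```

Each row of the result sums to zero, and for samples without zeros (with `pseudocount=0`) the output does not change when the whole sample is multiplied by a constant, which is the sense in which the transform removes the dependence on sequencing depth.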
