Multi-trait genome prediction of new environments with partial least squares

Genomic selection (GS), proposed over 20 years ago by Meuwissen et al. (Genetics, 2001), has revolutionized plant breeding. GS trains statistical machine learning algorithms on the phenotypic and genotypic data of a reference population and then makes predictions for genotyped candidate lines, which saves significant resources when selecting candidate individuals. However, its practical implementation remains challenging when the plant breeder wants to predict future seasons or new locations and/or environments, a setting known as the “leave one environment out” problem. Because the distributions of the training and testing sets do not match in this setting, most statistical machine learning methods struggle to reach even moderate prediction accuracy. The main objective of this study was therefore to explore multi-trait partial least squares (MT-PLS) regression for this task, benchmarking it against the Bayesian multi-trait genomic best linear unbiased predictor (MT-GBLUP) method. The benchmark used five real data sets. In all data sets, MT-PLS outperformed the popular MT-GBLUP method across traits: by 349.8% under predictor E + G, by 484.4% under predictor E + G + GE, and by 15.9% under predictor G + GE, where E denotes environments, G genotypes, and GE the genotype-by-environment interaction. These results provide empirical evidence of the power of MT-PLS for predicting future seasons or new environments. In the comparison of single-trait (univariate-trait, UT) and multi-trait (MT) models, prediction accuracy increased for MT-GBLUP over UT-GBLUP, but not for MT-PLS over UT-PLS.
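
The “leave one environment out” setting described in the abstract can be illustrated with a minimal sketch. It shows only how records might be partitioned so that each environment is held out in turn. The function name `leave_one_environment_out`, the environment labels, and the toy layout are assumptions made for illustration; they are not taken from the study, and no prediction model is fitted here.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple


def leave_one_environment_out(
    environments: Sequence[str],
) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out_env, train_idx, test_idx) once per distinct environment.

    Hypothetical helper: `environments` holds one environment label per record.
    """
    # Group row indices by environment, preserving first-seen order.
    rows_by_env: Dict[str, List[int]] = defaultdict(list)
    for row, env in enumerate(environments):
        rows_by_env[env].append(row)

    for held_out, test_idx in rows_by_env.items():
        # Train on every record that does not belong to the held-out environment.
        train_idx = [i for i, env in enumerate(environments) if env != held_out]
        yield held_out, train_idx, test_idx


# Hypothetical toy layout: three lines observed in each of three environments.
envs = ["E1", "E1", "E1", "E2", "E2", "E2", "E3", "E3", "E3"]
folds = list(leave_one_environment_out(envs))
for held_out, train_idx, test_idx in folds:
    print(held_out, "train rows:", train_idx, "test rows:", test_idx)
```

In each fold, a model would be trained on `train_idx` and evaluated only on the rows of the held-out environment, which is what makes the training and testing distributions differ.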

Bibliographic Details
Main Authors: Montesinos-Lopez, O.A., Montesinos-Lopez, A., Bernal Sandoval, D.A., Mosqueda-Gonzalez, B.A., Valenzo-Jiménez, M.A., Crossa, J.
Format: Article biblioteca
Language: English
Published: Frontiers 2022
Subjects: AGRICULTURAL SCIENCES AND BIOTECHNOLOGY, Genomic Prediction, Multi-Trait Partial Least Squares, Single-Trait Partial Least Squares, Prediction of One Complete Environment, GENOTYPES, GENOTYPE ENVIRONMENT INTERACTION, MACHINE LEARNING, FORECASTING, MARKER-ASSISTED SELECTION
Online Access: https://hdl.handle.net/10883/22290
id dig-cimmyt-10883-22290
record_format koha
Record dates: 2023-11-15T14:57:59Z (record); 2022-12-07T20:29:46Z (deposit)
Version: Published Version
DOI: 10.3389/fgene.2022.966775
Journal: Frontiers in Genetics, volume 13, article 966775 (ISSN 1664-8021)
Place of publication: Switzerland
Related links: https://hdl.handle.net/11529/10548705, https://figshare.com/collections/Multi-trait_genome_prediction_of_new_environments_with_partial_least_squares/6181645
Rights: Open Access. CIMMYT manages Intellectual Assets as International Public Goods. The user is free to download, print, store and share this work. In case you want to translate or create any other derivative work and share or distribute such translation/derivative work, please contact CIMMYT-Knowledge-Center@cgiar.org indicating the work you want to use and the kind of use you intend; CIMMYT will contact you with the suitable license for that purpose.
institution CIMMYT
collection DSpace
country México
countrycode MX
component Bibliographic
access Online
databasecode dig-cimmyt
tag biblioteca
region North America
libraryname CIMMYT Library
language English
topic AGRICULTURAL SCIENCES AND BIOTECHNOLOGY
Genomic Prediction
Multi-Trait Partial Least Squares
Single-Trait Partial Least Squares
Prediction of One Complete Environment
GENOTYPES
GENOTYPE ENVIRONMENT INTERACTION
MACHINE LEARNING
FORECASTING
MARKER-ASSISTED SELECTION