Comparing gradient boosting machine and Bayesian threshold BLUP for genome-based prediction of categorical traits in wheat breeding

Genomic selection (GS) is a predictive methodology that is changing plant breeding. It trains a statistical machine-learning model on the available phenotypic and genotypic data and then uses that model to make predictions for individuals that were only genotyped. Several statistical machine-learning methods are already implemented in GS, but new algorithms must continue to be explored in order to improve the selection of new genotypes early in the prediction process. In this paper, we benchmarked the Bayesian threshold genomic best linear unbiased predictor model (TGBLUP; popular in GS) against the gradient boosting machine (GBM). The comparison used four real wheat (Triticum aestivum L.) data sets with categorical traits, and performance was measured in the testing set with two metrics: the proportion of cases correctly classified (PCCC) and the Kappa coefficient. Under 10 random partitions with four testing proportions (20, 40, 60, and 80%), the GBM outperformed the TGBLUP model on both metrics in three of the four data sets. In the larger data sets (Data Sets 3 and 4), the GBM's gain in prediction accuracy was considerable. We therefore encourage further research on the prediction performance of the GBM in GS.
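The evaluation scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes that the Kappa coefficient is Cohen's kappa (the abstract does not say which variant was used), and the function names pccc, cohen_kappa, and random_partitions are hypothetical names chosen for this illustration.

```python
import random
from collections import Counter


def pccc(y_true, y_pred):
    """Proportion of cases correctly classified (plain accuracy)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement corrected for chance agreement.

    Assumption: the record only says "Kappa coefficient"; Cohen's
    kappa is used here as the most common definition.
    """
    n = len(y_true)
    p_observed = pccc(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: sum over classes of the product of marginals.
    p_expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_expected == 1.0:
        # Only one class observed and predicted: kappa is undefined.
        return float("nan")
    return (p_observed - p_expected) / (1.0 - p_expected)


def random_partitions(n, test_fraction, n_partitions=10, seed=0):
    """Yield (train_idx, test_idx) pairs for repeated random splits."""
    rng = random.Random(seed)
    n_test = round(n * test_fraction)
    for _ in range(n_partitions):
        idx = list(range(n))
        rng.shuffle(idx)
        yield idx[n_test:], idx[:n_test]


if __name__ == "__main__":
    # Toy labels, invented for illustration only (not data from the study).
    observed = [1, 1, 2, 2]
    predicted = [1, 1, 2, 1]
    print("PCCC:", pccc(observed, predicted))
    print("Kappa:", cohen_kappa(observed, predicted))
    for fraction in (0.2, 0.4, 0.6, 0.8):
        train, test = next(random_partitions(10, fraction, n_partitions=1))
        print(fraction, "train size:", len(train), "test size:", len(test))
```

In a comparison such as the one described, each model would be fitted on the training indices of every partition and scored on the testing indices, and the two metrics would then be averaged over the 10 partitions for each testing proportion.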

Bibliographic Details
Main Authors: Montesinos-Lopez, O.A., Gonzalez, H.N., Montesinos-Lopez, A., Daza-Torres, M., Lillemo, M., Montesinos-Lopez, J.C., Crossa, J.
Format: Article biblioteca
Language: English
Published: Wiley 2022
Subjects: AGRICULTURAL SCIENCES AND BIOTECHNOLOGY, Best Linear Unbiased Prediction, MARKER-ASSISTED SELECTION, WHEAT, BREEDING, RESEARCH, BEST LINEAR UNBIASED PREDICTOR
Online Access: https://hdl.handle.net/10883/22081
id dig-cimmyt-10883-22081
record_format koha
spelling dig-cimmyt-10883-22081 2023-11-15T15:01:50Z
2022-05-24T00:10:13Z
2022-05-24T00:10:13Z
2022
Article
Published Version
https://hdl.handle.net/10883/22081
10.1002/tpg2.20214
English
https://acsess.onlinelibrary.wiley.com/doi/10.1002/tpg2.20214#support-information-section
https://hdl.handle.net/11529/10548140
CIMMYT manages Intellectual Assets as International Public Goods. The user is free to download, print, store and share this work. In case you want to translate or create any other derivative work and share or distribute such translation/derivative work, please contact CIMMYT-Knowledge-Center@cgiar.org indicating the work you want to use and the kind of use you intend; CIMMYT will contact you with the suitable license for that purpose.
Open Access
Madison, WI (USA)
Wiley
3 15 20214 1940-3372 Plant Genome
institution CIMMYT
collection DSpace
country México
countrycode MX
component Bibliographic
access Online
databasecode dig-cimmyt
tag biblioteca
region North America
libraryname CIMMYT Library
language English
topic AGRICULTURAL SCIENCES AND BIOTECHNOLOGY
Best Linear Unbiased Prediction
MARKER-ASSISTED SELECTION
WHEAT
BREEDING
RESEARCH
BEST LINEAR UNBIASED PREDICTOR
format Article
author Montesinos-Lopez, O.A.
Gonzalez, H.N.
Montesinos-Lopez, A.
Daza-Torres, M.
Lillemo, M.
Montesinos-Lopez, J.C.
Crossa, J.
author_sort Montesinos-Lopez, O.A.
title Comparing gradient boosting machine and Bayesian threshold BLUP for genome-based prediction of categorical traits in wheat breeding
publisher Wiley
publishDate 2022
url https://hdl.handle.net/10883/22081