An informational view of accession rarity and allele specificity in germplasm banks for management and conservation

Germplasm banks are growing in importance, number of accessions, and amount of characterization data, with a strong emphasis on molecular genetic markers. In this work, we offer an integrated view of accessions and marker data within an information-theory framework. The basis of this development is the mutual information between accessions and allele frequencies at molecular marker loci, which can be decomposed into allele specificities as well as into the rarity and divergence of accessions. Formulas are provided to calculate three quantities: the specificity of each marker allele with reference to its distribution across accessions; accession rarity, defined as the weighted average of the specificity of its alleles; and accession divergence, defined by the Kullback-Leibler formula. Although rarity and divergence are different measures, their averages are shown to be equal for any collection. These parameters can contribute to knowledge of the structure of a germplasm collection and inform decisions about the preservation of rare variants. The concepts developed here served as the basis for a core subset selection strategy called HCore, implemented in a publicly available R script. As a proof of concept, the mathematical view and tools developed in this research were applied to a large collection of Mexican wheat accessions extensively characterized with SNP markers. The most specific alleles were found to be private to a single accession, and the distribution of specificity had its highest frequencies at low levels. Accession rarity and divergence had largely symmetrical distributions and a positive, though not strictly linear, relationship. Compared with three state-of-the-art methods for core subset selection, HCore was superior in average divergence and rarity, mean genetic distance, and diversity. The proposed approach can be used for knowledge extraction and decision making in germplasm collections of diploid, inbred or outbred species.
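
The quantities named in the abstract can be illustrated with a small numerical sketch. The abstract does not reproduce the formulas, so the definitions below are assumptions: a single locus, equally weighted accessions, base-2 logarithms, allele specificity taken as the across-accession mean of (p_ij / p_i) log2(p_ij / p_i), rarity as the frequency-weighted mean of specificities, and divergence as the Kullback-Leibler divergence of an accession from the collection mean. The function name `informational_measures` is hypothetical, and this is not the HCore script.

```python
import numpy as np

def informational_measures(freqs):
    """Sketch for one locus; freqs is accessions x alleles, rows sum to 1.

    Assumed definitions (not taken verbatim from the article):
      specificity_i = mean_j (p_ij / pbar_i) * log2(p_ij / pbar_i)
      rarity_j      = sum_i p_ij * specificity_i
      divergence_j  = sum_i p_ij * log2(p_ij / pbar_i)   (Kullback-Leibler)
    """
    p = np.asarray(freqs, dtype=float)
    p_bar = p.mean(axis=0)  # mean allele frequency over accessions

    # Ratio p_ij / pbar_i, left at 0 where the allele is absent everywhere.
    ratio = np.divide(p, p_bar, out=np.zeros_like(p), where=p_bar > 0)

    # log2 of the ratio, with the convention 0 * log2(0) = 0.
    log_ratio = np.zeros_like(p)
    np.log2(ratio, out=log_ratio, where=ratio > 0)

    specificity = (ratio * log_ratio).mean(axis=0)  # one value per allele
    rarity = p @ specificity                        # one value per accession
    divergence = (p * log_ratio).sum(axis=1)        # one value per accession
    return specificity, rarity, divergence

# Toy collection: two accessions fixed for allele A, one heterogeneous
# accession, and one fixed for allele B (values are made up).
freqs = np.array([
    [1.0, 0.0],
    [1.0, 0.0],
    [0.5, 0.5],
    [0.0, 1.0],
])
specificity, rarity, divergence = informational_measures(freqs)
print("specificity:", specificity)
print("rarity:     ", rarity)
print("divergence: ", divergence)
print("mean rarity == mean divergence:",
      np.isclose(rarity.mean(), divergence.mean()))
```

Under these assumed definitions, rarity and divergence can differ for an individual accession (the heterogeneous one here), while their collection-wide means coincide, which matches the equality stated in the abstract. An allele carried by a single accession out of t reaches a specificity of log2(t), the maximum in this sketch.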

Bibliographic Details
Main Authors: Reyes-Valdés, M.H., Burgueño, J., Sukhwinder-Singh, Martínez-de la Vega, O., Sansaloni, C.P.
Format: Article
Language: English
Published: Public Library of Science 2018
Subjects: AGRICULTURAL SCIENCES AND BIOTECHNOLOGY, WHEAT, CLIMATE CHANGE, ALLELES, GERMPLASM BANKS, PLANT BREEDING, PLANT GENETICS, GERMPLASM CONSERVATION
Online Access: https://hdl.handle.net/10883/19951
id dig-cimmyt-10883-19951
record_format koha
Journal: PLoS ONE, volume 13, issue 2, e0193346
ESSN: 1932-6203
DOI: 10.1371/journal.pone.0193346
Place of publication: San Francisco, Calif., U.S.
Access: Open Access (PDF)
Record timestamps: 2019-02-09T01:10:11Z; 2023-11-16T16:47:58Z
Related links: http://hdl.handle.net/11529/10013; https://ndownloader.figshare.com/articles/5935111/versions/1; https://hdl.handle.net/11529/10547952
Rights: CIMMYT manages Intellectual Assets as International Public Goods. The user is free to download, print, store and share this work. To translate or create any other derivative work and share or distribute it, please contact CIMMYT-Knowledge-Center@cgiar.org indicating the work you want to use and the kind of use you intend; CIMMYT will contact you with the suitable license for that purpose.
institution CIMMYT
collection DSpace
country México
countrycode MX
component Bibliographic
access Online
databasecode dig-cimmyt
tag biblioteca
region North America
libraryname CIMMYT Library
language English