Improving clustering with metabolic pathway data

Milone, Diego Humberto; Stegmayer, Georgina; Lopez, Mariana Gabriela; Kamenetzky, Laura; Carrari, Fernando

Background: It is common practice in bioinformatics to validate each group returned by a clustering algorithm through manual analysis, according to a priori biological knowledge. This procedure helps to find functionally related patterns and to propose hypotheses about their behavior and the biological processes involved. This knowledge is therefore used only as a second step, after the data have been clustered according to their expression patterns alone. It would thus be very useful to improve the clustering of biological data by incorporating prior knowledge into the cluster formation itself, in order to enhance the biological value of the clusters.
Results: A novel training algorithm for clustering is presented, which evaluates the internal biological connections of the data points while the clusters are being formed. Within this training algorithm, the calculation of distances between data points and neuron centroids includes a new term based on information from well-known metabolic pathways. Standard self-organizing map (SOM) training and the biologically inspired SOM (bSOM) training were tested on two real data sets of transcripts and metabolites from Solanum lycopersicum and Arabidopsis thaliana. Classical data mining validation measures were used to evaluate the clustering solutions obtained by both algorithms. In addition, a new measure that takes into account the biological connectivity of the clusters was applied. The bSOM results show important improvements in convergence and performance over standard SOM training, in particular from the application point of view.
Conclusions: Analyses of the clusters obtained with bSOM indicate that including biological information during training can increase the biological value of the clusters found. It is worth highlighting that this has effectively improved the results, which can simplify their further analysis.
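
Illustrative sketch (not part of the catalogued record): the abstract states that the distance between data points and neuron centroids includes an extra term based on metabolic pathway information, but it does not give the formula. The Python below is one hypothetical way such a term could enter the search for the best matching unit. The Jaccard-based penalty, the weight `alpha`, and all function names are assumptions made for illustration only; they are not the authors' bSOM definition.

```python
import math


def euclidean(x, w):
    """Plain Euclidean distance between a data point and a neuron centroid."""
    return math.sqrt(sum((xi - wi) ** 2 for xi, wi in zip(x, w)))


def pathway_penalty(point_pathways, neuron_pathways):
    """Hypothetical biological term: 1 - Jaccard similarity of two pathway sets.

    Returns 0.0 when the sets are identical and 1.0 when they share nothing
    (or when both are empty, i.e. no pathway evidence is available).
    """
    union = point_pathways | neuron_pathways
    if not union:
        return 1.0
    return 1.0 - len(point_pathways & neuron_pathways) / len(union)


def combined_distance(x, w, point_pathways, neuron_pathways, alpha=0.5):
    """Expression distance plus an assumed pathway term weighted by alpha."""
    return euclidean(x, w) + alpha * pathway_penalty(point_pathways, neuron_pathways)


def best_matching_unit(x, point_pathways, centroids, neuron_pathway_sets, alpha=0.5):
    """Index of the neuron with the smallest combined distance to x."""
    distances = [
        combined_distance(x, w, point_pathways, p, alpha)
        for w, p in zip(centroids, neuron_pathway_sets)
    ]
    return min(range(len(distances)), key=distances.__getitem__)


# Toy example: two neurons, each already associated with one pathway.
centroids = [[0.0, 0.0], [1.0, 0.0]]
neuron_pathway_sets = [{"glycolysis"}, {"TCA cycle"}]
x = [0.4, 0.0]
x_pathways = {"TCA cycle"}

bmu_expression_only = best_matching_unit(x, x_pathways, centroids, neuron_pathway_sets, alpha=0.0)
bmu_with_pathways = best_matching_unit(x, x_pathways, centroids, neuron_pathway_sets, alpha=0.5)
```

In this toy case the point is closer in expression space to the first neuron, but it shares a pathway with the second, so a nonzero `alpha` moves it to the second neuron. Setting `alpha` to zero recovers the standard nearest-centroid search.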

Bibliographic Details
Main Authors:	Milone, Diego Humberto, Stegmayer, Georgina, Lopez, Mariana Gabriela, Kamenetzky, Laura, Carrari, Fernando
Format:	info:ar-repo/semantics/artículo biblioteca
Language:	eng
Published:	BMC 2014-04
Subjects:	Bioinformática, Datos, Bioinformatics, Data, Agrupamiento, Clustering
Online Access:	https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-15-101 http://hdl.handle.net/20.500.12123/4292 https://doi.org/10.1186/1471-2105-15-101

id	oai:localhost:20.500.12123-4292
record_format	koha
spelling	oai:localhost:20.500.12123-4292 2019-01-18T12:51:31Z
Instituto de Biotecnología
Affiliation: Milone, Diego Humberto. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Santa Fe. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional. Universidad Nacional del Litoral. Facultad de Ingeniería y Ciencias Hídricas. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional; Argentina
Affiliation: Stegmayer, Georgina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Santa Fe. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional. Universidad Nacional del Litoral. Facultad de Ingeniería y Ciencias Hídricas. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional; Argentina
Affiliation: Lopez, Mariana Gabriela. Instituto Nacional de Tecnología Agropecuaria (INTA). Instituto de Biotecnología; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina.
Affiliation: Kamenetzky, Laura. Instituto Nacional de Tecnología Agropecuaria (INTA). Instituto de Biotecnología; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina.
Affiliation: Carrari, Fernando Oscar. Instituto Nacional de Tecnología Agropecuaria (INTA). Instituto de Biotecnología; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina.
2019-01-18T12:45:32Z 2019-01-18T12:45:32Z 2014-04 info:ar-repo/semantics/artículo info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion 1471-2105 eng info:eu-repo/semantics/openAccess application/pdf BMC BMC Bioinformatics 15 : 101 (2014)
institution	INTA AR
collection	DSpace
country	Argentina
countrycode	AR
component	Bibliographic
access	Online
databasecode	dig-inta-ar
tag	biblioteca
region	South America
libraryname	Biblioteca Central del INTA Argentina
language	eng
topic	Bioinformática Datos Bioinformatics Data Agrupamiento Clustering
format	info:ar-repo/semantics/artículo
topic_facet	Bioinformática Datos Bioinformatics Data Agrupamiento Clustering
author	Milone, Diego Humberto Stegmayer, Georgina Lopez, Mariana Gabriela Kamenetzky, Laura Carrari, Fernando
author_sort	Milone, Diego Humberto
title	Improving clustering with metabolic pathway data
publisher	BMC
publishDate	2014-04
url	https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-15-101 http://hdl.handle.net/20.500.12123/4292 https://doi.org/10.1186/1471-2105-15-101
_version_	1756007367123664896
