Tokenizer Adapted for the Nasa Yuwe Language

Sierra Martínez, Luz Marina; Cobos Lozada, Carlos Alberto; Corrales, Juan Carlos

Abstract. In Colombia, the government regards ethnic and cultural diversity as a social right. This diversity finds expression, among other ways, in a large number of indigenous languages that have been kept alive for centuries. However, efforts to conserve and preserve these languages have generally fallen short. This is the case for Nasa Yuwe, the endangered language spoken by the Nasa, or Páez, indigenous community. In this situation, technology offers a strategic opportunity for the adaptation, ownership, and development of Nasa Yuwe within the social and cultural environment of the Nasa people. This includes computational techniques that allow information to be exchanged through information retrieval (IR) activities, which open new possibilities for the Nasa people to interact in Nasa Yuwe. It has therefore become necessary to adapt the stages of the IR process to this language. This paper presents a process for adapting a tokenizer to texts written in Nasa Yuwe, using the precision-recall curve as the evaluation and comparison measure. The results cover all stages of adapting the standard tokenizer to produce the Nasa version, the output of the Nasa tokenizer on texts written in Nasa Yuwe, and an analysis of the precision-recall curve of the baseline against that of the Nasa tokenizer.
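
The abstract names the precision-recall curve as the evaluation and comparison measure. As an illustration only, and not the authors' implementation, the following minimal Python sketch shows one common way such a curve is computed for a ranked result list. The function names, the document identifiers, and the choice of 11 evenly spaced recall levels are assumptions made for this example; the record does not state how the paper computes its curves.

```python
from typing import List, Set, Tuple


def precision_recall_points(ranked: List[str], relevant: Set[str]) -> List[Tuple[float, float]]:
    """Return (recall, precision) measured after each position of a ranked list."""
    if not relevant:
        raise ValueError("the set of relevant documents must not be empty")
    points = []
    hits = 0
    for k, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
        # recall = relevant found so far / all relevant; precision = relevant found / k
        points.append((hits / len(relevant), hits / k))
    return points


def interpolated_precision(points: List[Tuple[float, float]], levels: int = 11) -> List[float]:
    """Interpolated precision at evenly spaced recall levels in [0, 1].

    At each level, take the highest precision observed at any recall
    greater than or equal to that level (0.0 if that recall is never reached).
    """
    curve = []
    for i in range(levels):
        level = i / (levels - 1)
        candidates = [p for r, p in points if r >= level]
        curve.append(max(candidates) if candidates else 0.0)
    return curve


# Hypothetical ranking returned for one query, and its relevance judgments.
ranking = ["d1", "d2", "d3", "d4"]
judged_relevant = {"d1", "d3"}
curve = interpolated_precision(precision_recall_points(ranking, judged_relevant))
```

Under these assumptions, the same query set would be run once over an index built with the baseline tokenizer and once over an index built with the adapted tokenizer, and the two resulting curves compared level by level.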

Bibliographic Details
Main Authors:	Sierra Martínez, Luz Marina; Cobos Lozada, Carlos Alberto; Corrales, Juan Carlos
Format:	Digital journal
Language:	English
Published:	Instituto Politécnico Nacional, Centro de Investigación en Computación 2016
Online Access:	http://www.scielo.org.mx/scielo.php?script=sci_arttext&pid=S1405-55462016000300355

id	oai:scielo:S1405-55462016000300355
record_format	ojs
spelling	oai:scielo:S1405-55462016000300355; 2017-10-23; Tokenizer Adapted for the Nasa Yuwe Language; Sierra Martínez, Luz Marina; Cobos Lozada, Carlos Alberto; Corrales, Juan Carlos; Nasa indigenous community; Nasa Yuwe language; tokenizer for Nasa Yuwe; information retrieval for texts written in Nasa Yuwe; info:eu-repo/semantics/openAccess; Instituto Politécnico Nacional, Centro de Investigación en Computación; Computación y Sistemas v.20 n.3 2016; 2016-09-01; info:eu-repo/semantics/article; text/html; http://www.scielo.org.mx/scielo.php?script=sci_arttext&pid=S1405-55462016000300355; en; 10.13053/cys-20-3-2455
institution	SCIELO
collection	OJS
country	México
countrycode	MX
component	Journal
access	Online
databasecode	rev-scielo-mx
tag	revista
region	North America
libraryname	SciELO
language	English
format	Digital
author	Sierra Martínez, Luz Marina; Cobos Lozada, Carlos Alberto; Corrales, Juan Carlos
spellingShingle	Sierra Martínez, Luz Marina; Cobos Lozada, Carlos Alberto; Corrales, Juan Carlos; Tokenizer Adapted for the Nasa Yuwe Language
author_facet	Sierra Martínez, Luz Marina; Cobos Lozada, Carlos Alberto; Corrales, Juan Carlos
author_sort	Sierra Martínez, Luz Marina
title	Tokenizer Adapted for the Nasa Yuwe Language
title_short	Tokenizer Adapted for the Nasa Yuwe Language
title_full	Tokenizer Adapted for the Nasa Yuwe Language
title_fullStr	Tokenizer Adapted for the Nasa Yuwe Language
title_full_unstemmed	Tokenizer Adapted for the Nasa Yuwe Language
title_sort	tokenizer adapted for the nasa yuwe language
publisher	Instituto Politécnico Nacional, Centro de Investigación en Computación
publishDate	2016
url	http://www.scielo.org.mx/scielo.php?script=sci_arttext&pid=S1405-55462016000300355
work_keys_str_mv	AT sierramartinezluzmarina tokenizeradaptedforthenasayuwelanguage AT coboslozadacarlosalberto tokenizeradaptedforthenasayuwelanguage AT corralesjuancarlos tokenizeradaptedforthenasayuwelanguage
_version_	1756225770292772864