Discovering the spatial coverage of the documents through the SpatialCIM Methodology.

This paper presents the SpatialCIM methodology for identifying the spatial coverage of documents within the Brazilian geographic area. The methodology uses a linguistic tool to assist in entity recognition; the tool classifies recognized entities as person, organization, time, and location, among others. Location entities are checked against a geographic information system (GIS) to extract the geographic paths of the Brazilian entities. If a single entity has multiple geographic paths, a disambiguation process is carried out, which attempts to find the best geographic path for the entity by considering all the geographic entities in the text. A second objective of the paper is to show that disambiguation improves the geographic classification of documents based on the obtained geographic paths. For validation, a set of news items previously labeled by an expert is compared with the results obtained from the disambiguated and non-disambiguated geographic paths. The results show that classification with disambiguation outperforms classification without it.

Keywords: Ambiguity problem resolution, spatial coverage identification, toponym resolution.
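
The disambiguation step described in the abstract can be illustrated with a minimal sketch. It is not the SpatialCIM implementation: the record does not say how the best path is scored, so the sketch assumes a simple heuristic (prefer the candidate path whose state is supported by the other geographic entities in the same text). The names `GAZETTEER` and `disambiguate` are hypothetical, and the toy gazetteer stands in for the GIS lookup.

```python
from collections import Counter

# Hypothetical toy gazetteer standing in for the GIS lookup:
# place name -> candidate geographic paths (country, state, municipality).
# Illustrative only, not exhaustive.
GAZETTEER = {
    "Bonito": [
        ("Brasil", "Mato Grosso do Sul", "Bonito"),
        ("Brasil", "Pernambuco", "Bonito"),
        ("Brasil", "Bahia", "Bonito"),
        ("Brasil", "Pará", "Bonito"),
    ],
    "Campo Grande": [("Brasil", "Mato Grosso do Sul", "Campo Grande")],
    "Mato Grosso do Sul": [("Brasil", "Mato Grosso do Sul")],
    "Recife": [("Brasil", "Pernambuco", "Recife")],
}


def disambiguate(entities, gazetteer):
    """Pick one geographic path per location entity.

    Assumed heuristic: a candidate path scores one point for every other
    entity in the text that has a candidate path in the same state.
    Ties fall back to the first candidate listed in the gazetteer.
    """
    # Count, per state, how many entities have at least one candidate there.
    context = Counter()
    for name in entities:
        states = {p[1] for p in gazetteer.get(name, []) if len(p) > 1}
        context.update(states)

    resolved = {}
    for name in entities:
        candidates = gazetteer.get(name, [])
        if not candidates:
            continue  # entity not found in the gazetteer: leave unresolved

        def score(path):
            # Subtract 1 so an entity does not count as its own support.
            return context[path[1]] - 1 if len(path) > 1 else 0

        resolved[name] = max(candidates, key=score)
    return resolved


if __name__ == "__main__":
    text_entities = ["Bonito", "Campo Grande", "Mato Grosso do Sul"]
    for name, path in disambiguate(text_entities, GAZETTEER).items():
        print(name, "->", " / ".join(path))
```

In this sketch the ambiguous name "Bonito" resolves to the Mato Grosso do Sul path when the text also mentions Campo Grande, and to the Pernambuco path when the only other entity is Recife. A non-disambiguated baseline would keep every candidate path instead, which is the comparison the abstract describes.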

Bibliographic Details
Main Authors: VARGAS, R. N. P., REZENDE, S. de O., MOURA, M. F., SPERANZA, E. A., RODRIGUEZ, E.
Other Authors: ROSA NATHALIE PORTUGAL VARGAS, ICMC/USP; SOLANGE DE OLIVEIRA REZENDE, ICMC/USP; MARIA FERNANDA MOURA, CNPTIA; EDUARDO ANTONIO SPERANZA, CNPTIA; ERCILIA RODRIGUEZ.
Format: Conference annals and proceedings
Language: English
Published: 2013-02-06
Subjects: Cobertura espacial, Ambiguidade, Ferramenta lingüística, Spatial coverage identification, Ambiguity problem resolution, Toponym resolution
Online Access: http://www.alice.cnptia.embrapa.br/alice/handle/doc/948445