Learning Document Similarity Using Natural Language Processing
DOI: https://doi.org/10.13092/lo.17.788
Abstract
The recent considerable growth in the amount of easily available on-line text has brought to the foreground the need for large-scale natural language processing tools for text data mining. In this paper we address the problems of organizing documents into meaningful groups according to their content and of visualizing a text collection, providing an overview of the range of documents and of their relationships, so that they can be browsed more easily. We use Self-Organizing Maps (SOMs) (Kohonen 1984). Creating these maps poses considerable efficiency challenges. We study linguistically motivated ways of reducing the representation of a document to increase efficiency, and ways of disambiguating the words in the documents.
Published
2003-12-31
Issue
Vol. 17 No. 5 (2003)
Section
Articles
License
Copyright (c) 2003 Paola Merlo, James Henderson, Gerold Schneider, Eric Wehrli
This work is licensed under a Creative Commons Attribution 4.0 International License.
Suggested citation
Merlo, P., Henderson, J., Schneider, G., & Wehrli, E. (2003). Learning Document Similarity Using Natural Language Processing. Linguistik Online, 17(5). https://doi.org/10.13092/lo.17.788